Closed: bradfitz closed this issue 4 years ago.
It's worth considering whether embedglob
should support a complete file tree, perhaps using the **
syntax supported by some Unix shells.
Some people would need the ability to serve the embedded assets with HTTP using the http.FileServer.
I personally use either mjibson/esc (which does that) or in some cases my own file embedding implementation, which renames files to create unique paths and adds a map from the original paths to the new ones, e.g. "/js/bootstrap.min.js": "/js/bootstrap.min.827ccb0eea8a706c4c34a16891f84e7b.js". Then you can use this map in the templates like this: href="{{ static_path "/css/bootstrap.min.css" }}".
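A minimal sketch of how such a map can be wired into html/template. The map contents mirror the example above; the static_path helper and its name are hypothetical, not part of any existing library:

```go
package main

import (
	"html/template"
	"os"
)

// staticPaths maps original asset paths to their content-hashed names, as a
// hypothetical embedding step might produce them.
var staticPaths = map[string]string{
	"/css/bootstrap.min.css": "/css/bootstrap.min.827ccb0eea8a706c4c34a16891f84e7b.css",
}

// staticPath looks up the hashed name for an asset, falling back to the
// original path if it was not rewritten.
func staticPath(p string) string {
	if hashed, ok := staticPaths[p]; ok {
		return hashed
	}
	return p
}

func main() {
	// Funcs must be registered before Parse so the template can use them.
	tmpl := template.Must(template.New("page").
		Funcs(template.FuncMap{"static_path": staticPath}).
		Parse(`<link rel="stylesheet" href="{{ static_path "/css/bootstrap.min.css" }}">`))
	tmpl.Execute(os.Stdout, nil)
}
```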
I think a consequence of this would be that it would be nontrivial to figure out what files are necessary to build a program.
The //go:embed approach introduces another level of complexity too. You'd have to parse the magic comments in order to even typecheck the code. The "embed package" approach seems friendlier to static analysis.
(Just musing out loud here.)
@opennota,
would need the ability to serve the embedded assets with HTTP using the http.FileServer.
Yes, the first link above is a package I wrote (in 2011, before Go 1) and still use, and it supports using http.FileServer: https://godoc.org/perkeep.org/pkg/fileembed#Files.Open
@cespare,
The //go:embed approach introduces another level of complexity too. You'd have to parse the magic comments in order to even typecheck the code. The "embed package" approach seems friendlier to static analysis.
Yes, good point. That's a very strong argument for using a package. It also makes it more readable & documentable, since we can document it all with regular godoc, rather than deep in cmd/go's docs.
@bradfitz - Do you want to close this https://github.com/golang/go/issues/3035 ?
@agnivade, thanks for finding that! I thought I remembered that but couldn't find it. Let's leave it open for now and see what others think.
If we go with the magic package, we could use the unexported type trick to ensure that callers pass compile-time constants as arguments: https://play.golang.org/p/RtHlKjhXcda.
(This is the strategy referenced here: https://groups.google.com/forum/#!topic/golang-nuts/RDA9Hag8RZw/discussion)
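The trick in that playground link can be shown in a few lines. The idea: an unexported parameter type is unnameable outside the package, yet untyped string constants still convert to it implicitly, so only compile-time constants can be passed. The names below (constString, File) are illustrative, not a proposed API:

```go
package main

import "fmt"

// constString is unexported, so code outside this package cannot declare a
// value of this type. Untyped string constants convert to it implicitly,
// but string variables do not, so File effectively demands a constant.
type constString string

// File stands in for an embed-style API that must see a constant path.
func File(path constString) string {
	return fmt.Sprintf("embedded: %s", string(path))
}

func main() {
	fmt.Println(File("logo.jpg")) // OK: untyped constant converts

	// var p = "logo.jpg"
	// File(p) // compile error: cannot use p (type string) as type constString
}
```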
One concern I have is how it would handle individual or all assets being too big to fit into memory, and whether there would maybe be a build tag or per-file access option to choose between prioritizing access time vs. memory footprint, or some middle-ground implementation.
the way i've solved that problem (because of course i also have my own implementation :) ) is to provide an http.FileSystem implementation that serves all embedded assets. That way, you don't have to rely on magic comments in order to appease the typechecker, the assets can easily be served by http, a fallback implementation can be provided for development purposes (http.Dir) without changing the code, and the final implementation is quite versatile, as http.FileSystem covers quite a bit, not only in reading files, but listing directories as well.
One can still use magic comments or whatever to specify what needs to be embedded, though it's probably easier to specify all the globs via a plain text file.
@AlexRouSg This proposal would only be for files which are appropriate to include directly in the final executable. It would not be appropriate to use this for files that are too big to fit in memory. There's no reason to complicate this tool to handle that case; for that case, just don't use this tool.
@ianlancetaylor, I think the distinction @AlexRouSg was making was between having the files provided as global []bytes (unpageable, potentially writable memory) vs providing a read-only, on-demand view of an ELF section that can normally live on disk (in the executable), like via an Open call that returns an *io.SectionReader. (I don't want to bake http.File or http.FileSystem into cmd/go or the runtime... net/http can provide an adapter.)
@bradfitz http.File itself is an interface with no technical dependencies on the http package. It might be a good idea for any Open method to provide an implementation that conforms to that interface, because both the Stat and Readdir methods are quite useful for such assets.
@urandom, it couldn't implement http.FileSystem, though, without referring to the "http.File" name (https://play.golang.org/p/-r3KjG1Gp-8).
@robpike and I talked through a proposal for doing this years ago (before there was a proposal process) and never got back to doing anything. It's been bugging me for years that we never finished doing that. The idea as I remember it was to just have a special directory name like "static" containing the static data and automatically make them available through an API, with no annotations needed.
I'm not convinced about the complexity of a "compressed vs not" knob. If we do that, then people will want us to add control over which compression, compression level, and so on. All we should need to add is the ability to embed a file of plain bytes. If users want to store compressed data in that file, great, the details are up to them and there's no API needed on Go's side at all.
A couple thoughts:
- Embedded files should probably be restricted to the package directory (if //go:embed comments are used) or a specific subdirectory (if static is used). This makes it a lot easier to understand the relationship between packages and embedded files.
- Either way, this blocks embedding /etc/shadow or .git. Neither can be included in a module zip.
In general, I'm worried about expanding the scope of the go command too much. However, the fact that there are so many solutions to this problem means there probably ought to be one official solution.
I'm familiar with go_embed_data and go-bindata (of which there are several forks), and this seems to cover those use cases. Are there any important problems the others solve that this doesn't cover?
Blocking certain files shouldn't be too hard, especially if you use a static or embed directory. Symlinks might complicate that a bit, but you can just prevent it from embedding anything outside of the current module or, if you're on GOPATH, outside of the package containing the directory.
I'm not particularly a fan of a comment that compiles to code, but I also find the pseudo-package that affects compilation to be a bit strange as well. If the directory approach isn't used, maybe it might make a bit more sense to have some kind of embed top-level declaration actually built into the language. It would work similarly to import, but would only support local paths and would require a name for it to be assigned to. For example,
	embed ui "./ui/build"

	func main() {
		file, err := ui.Open("version.txt")
		if err != nil {
			panic(err)
		}
		version, err := ioutil.ReadAll(file)
		if err != nil {
			panic(err)
		}
		file.Close()

		log.Printf("UI Version: %s\n", bytes.TrimSpace(version))
		http.ListenAndServe(":8080", http.EmbeddedDir(ui))
	}
Edit: You beat me to it, @jayconrod.
To expand on https://github.com/golang/go/issues/35950#issuecomment-561703346, there is a puzzle about the exposed API. The obvious ways to expose the data are []byte, string, and Read-ish interfaces.
The typical case is that you want the embedded data to be immutable. However, all interfaces exposing []byte (which includes io.Reader, io.SectionReader, etc.) must either (1) make a copy, (2) allow mutability, or (3) be immutable despite being a []byte. Exposing the data as strings solves that, but at the cost of an API that will often end up requiring copying anyway, since lots of code that consumes embedded files eventually requires byte slices one way or another.
I'd suggest route (3): be immutable despite being a []byte. You can enforce this cheaply by using a readonly symbol for the backing array. This also lets you safely expose the same data as a []byte and a string; attempts to mutate the data will fail. The compiler can't take advantage of the immutability, but that's not too great of a loss. This is something that toolchain support can bring to the table that (as far as I know) none of the existing codegen packages do.
(A third party codegen package could do this by generating a generic assembly file containing DATA symbols that are marked as readonly, and then short arch-specific assembly files exposing those symbols in the form of strings and []bytes. I wrote CL 163747 specifically with this use case in mind, but never got around to integrating it into any codegen packages.)
I'm unsure what you're talking about in terms of immutability. io.Reader already enforces immutability. That's the entire point. When you call Read(buf), it copies data into the buffer that you provided. Changing buf after that has zero effect on the internals of the io.Reader.
I agree with @DeedleFake. I don't want to play games with magic []byte array backings. It's okay to copy from the binary into user-provided buffers.
Just another wrinkle here -- I have a different project which uses DTrace source code (embedded). This is sensitive to differences between \n and \r\n. (We can argue whether this is a dumb thing in DTrace or not -- that's beside the point and it is the situation today.)
It's super useful that backticked strings treat both as \n regardless of how they appear in source, and I rely on this with a go-generate to embed the DTrace.
So if there is an embed file added to the go command, I would gently suggest that options to change the handling of CR/CRLF might come in very handy, particularly for folks who might be developing on different systems where the default line endings can be a gotcha.
Like with compression, I'd really like to stop at "copy the file bytes into the binary". CR/CRLF normalization, Unicode normalization, gofmt'ing, all that belongs elsewhere. Check in the files containing the exact bytes you want. (If your version control can't leave them alone, maybe check in gzipped content and gunzip them at runtime.) There are many file munging knobs we could imagine adding. Let's stop at 0.
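The "check in gzipped content and gunzip at runtime" route needs nothing beyond the standard library. A sketch, where gzippedAsset stands in for a pre-compressed blob that the embedding step copied byte-for-byte into the binary:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io/ioutil"
)

// gzippedAsset simulates a checked-in, pre-compressed file; in a real
// project the compressed bytes would be embedded verbatim.
func gzippedAsset() []byte {
	var b bytes.Buffer
	w := gzip.NewWriter(&b)
	w.Write([]byte("<html>hello</html>"))
	w.Close()
	return b.Bytes()
}

// loadAsset gunzips an embedded blob at runtime, keeping the toolchain's
// job to "copy the file bytes into the binary" and nothing more.
func loadAsset(blob []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(blob))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return ioutil.ReadAll(zr)
}

func main() {
	data, err := loadAsset(gzippedAsset())
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))
}
```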
It may be too late to introduce a new reserved directory name, as much as I'd like to. (It wasn't too late back in 2014, but it's probably too late now.) So some kind of opt-in comment may be necessary.
Suppose we define a type runtime.Files. Then you could imagine writing:

	//go:embed *.html (or static/* etc)
	var files runtime.Files

And then at runtime you just call files.Open to get back an interface { io.ReadSeeker; io.ReaderAt } with the data. Note that the var is unexported, so one package can't go around grubbing in another package's embedded files.
Names TBD, but as far as the mechanism goes it seems like that should be enough, and I don't see how to make it simpler. (Simplifications welcome of course!)
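A rough sketch of that surface. runtime.Files and the //go:embed directive are hypothetical here, so the type below is faked with an ordinary map purely to make the Open/File shape concrete; *strings.Reader happens to satisfy both io.ReadSeeker and io.ReaderAt:

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// File is the sketched return type: a read-seekable, random-access view of
// one embedded blob.
type File interface {
	io.ReadSeeker
	io.ReaderAt
}

// Files stands in for the hypothetical runtime.Files type; the compiler
// would fill in the real contents from the //go:embed pattern.
type Files struct {
	contents map[string]string
}

// Open returns a File for the named embedded file.
func (f *Files) Open(name string) (File, error) {
	data, ok := f.contents[name]
	if !ok {
		return nil, fmt.Errorf("embedded file %q not found", name)
	}
	return strings.NewReader(data), nil
}

func main() {
	files := &Files{contents: map[string]string{"index.html": "<html></html>"}}
	f, err := files.Open("index.html")
	if err != nil {
		panic(err)
	}
	b := make([]byte, 6)
	f.ReadAt(b, 0)
	fmt.Println(string(b))
}
```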
Whatever we do, it needs to be possible to support with Bazel and Gazelle too. That would mean having Gazelle recognize the comment and write out a Bazel rule saying the globs, and then we'd need to expose a tool (go tool embedgen or whatever) to generate the extra file to include in the build (the go command would do this automatically and never actually show the extra file). That seems straightforward enough.
If various munging won't do the trick, then that's an argument against using this new facility. It's not a stopper for me -- I can use go generate like I've been doing, but it means I cannot benefit from the new feature.
With respect to munging in general -- I can imagine a solution where someone provides an implementation of an interface (something like a Reader() on one side, and something to receive the file on the other -- maybe instantiated with an io.Reader from the file itself) -- which the go cmd would build and run to prefilter the file before embedding. Then folks can provide whatever filter they want. I imagine some folks would provide quasi-standard filters like a dos2unix implementation, compression, etc. (Maybe they should be chainable even.)
I guess there'd have to be an assumption that whatever the embedded processor is, it must be compilable on ~every build system, as go would be building a temporary native tool for this purpose.
It may be too late to introduce a new reserved directory name, as much as I'd like to. [...] some kind of opt-in comment may be necessary.
If the files are only accessible through a special package, say runtime/embed, then importing that package could be the opt-in signal.
The io.Read approach seems like it could add significant overhead (in terms of both copying and memory footprint) for conceptually-simple linear operations like strings.Contains (such as in cmd/go/internal/cfg) or, critically, template.Parse.
For those use-cases, it seems ideal to allow the caller to choose whether to treat the whole blob as a (presumably memory-mapped) string or an io.ReaderAt.
That seems compatible with the general runtime.Files approach, though: the thing returned from runtime.Files.Open could have a ReadString() string method that returns the memory-mapped representation.
some kind of opt-in comment may be necessary.
We could do that with the go version in the go.mod file. Before 1.15 (or whatever) the static subdirectory would contain a package, and at 1.15 or higher it would contain embedded assets.
(That doesn't really help in GOPATH mode, though.)
I'm not convinced about the complexity of a "compressed vs not" knob. If we do that, then people will want us to add control over which compression, compression level, and so on. All we should need to add is the ability to embed a file of plain bytes.
While i appreciate the drive for simplicity, we should also make sure we're meeting users' needs.
12 out of 14 of the tools listed at https://tech.townsourced.com/post/embedding-static-files-in-go/#comparison support compression, which suggests that it is a pretty common requirement.
It's true that one could do the compression as a pre-build step outside go, but that would still require 1) a tool to do the compression, 2) checking some kind of assets.zip blob into vcs, and 3) probably a utility library around the embed api to undo the compression. At which point it is unclear what the benefit is at all.
Three of the goals listed in the initial proposal were:
If we read the second of these as "don't require a separate tool for embedding" then not supporting compressed files directly or indirectly fails to meet all three of these goals.
Does this need to be package level? Module level seems a better granularity since most likely one module = one project.
Since this directory wouldn't contain Go code†, could it be something like _static?
† or, if it is, it would be treated as arbitrary bytes whose name happens to end in ".go" instead of as Go code to be compiled
If it's one special directory, the logic could just be to slurp up anything and everything in that directory tree. The magic embed package could let you do something like embed.Open("img/logo.svg") to open a file in a subdirectory of the asset tree.
Strings seem good enough. They can easily be copied into []byte or converted into a Reader. Code generation or libraries could be used to provide fancier APIs and handle things during init. That could include decompression or creating an http.FileSystem.
Doesn't Windows have a special format for embedding assets? Should that be used when building a Windows executable? If so, does that have any implications for the kinds of operations that can be provided?
Don't forget gitfs!
Is there a reason it couldn't be part of go build / link... e.g. go build -embed example=./path/example.txt, and some package that exposes access to it (e.g. embed.File("example")), instead of using go:embed?
you need a stub for that in your code though
@egonelbre the problem with go build -embed is that all users would need to use it properly. This needs to be fully transparent and automatic; existing go install or go get commands can't stop doing the right thing.
@bradfitz I would recommend https://github.com/markbates/pkger over Packr. It uses the standard library API for working with files.
	func run() error {
		f, err := pkger.Open("/public/index.html")
		if err != nil {
			return err
		}
		defer f.Close()

		info, err := f.Stat()
		if err != nil {
			return err
		}

		fmt.Println("Name: ", info.Name())
		fmt.Println("Size: ", info.Size())
		fmt.Println("Mode: ", info.Mode())
		fmt.Println("ModTime: ", info.ModTime())

		if _, err := io.Copy(os.Stdout, f); err != nil {
			return err
		}
		return nil
	}
Or maybe certain build tags or flags could make it fall back to doing things at runtime instead. Perkeep (linked above) has such a mode, which is nice to speed up incremental development where you don't care about linking one big binary.
mjibson/esc does this as well, and it is a big quality-of-life improvement when developing a webapp; you not only save linking time but also avoid having to restart the application, which can take substantial time and/or require repeating extra steps to test your changes, depending on the implementation of the webapp.
Problems with the current situation:
- Using a go:generate-based solution bloats the git history with a second (and slightly larger) copy of each file.
Goals:
- don't check in generated files
Well, this part is easily solvable by just adding the generated files to the .gitignore file or equivalent. I always did that...
So, alternatively, Go could just have its own "official" embed tool that runs by default on go build and ask people to ignore these files as a convention. That would be the least magical solution available (and backward compatible with existing Go versions).
I'm just brainstorming / thinking aloud here... but I actually like the proposed idea overall.
Also, since //go:generate directives don't run automatically on go build, the behavior of go build may seem a bit inconsistent: //go:embed will work automatically, but for //go:generate you have to run go generate manually. (//go:generate can already break the go get flow if it generates .go files needed for the build.)
//go:generate can already break the go get flow if it generates .go files needed for the build
I think the usual flow for that, and the one that I've generally used, although it took a bit of getting used to, is to use go generate entirely as a development-end tool and just commit the files that it generates.
@bradfitz it doesn't need to implement http.FileSystem itself. If the implementation provides a type that implements http.File, then it would be trivial for anyone, including the stdlib http package, to provide a wrapper around the Open function, converting the type to http.File in order to conform to http.FileSystem.
@andreynering //go:generate and //go:embed are very different, though. This mechanism can happen seamlessly at build time because it won't run arbitrary code. I believe that makes it similar to how cgo can generate code as part of go build.
I'm not convinced about the complexity of a "compressed vs not" knob. If we do that, then people will want us to add control over which compression, compression level, and so on. All we should need to add is the ability to embed a file of plain bytes.
While i appreciate the drive for simplicity, we should also make sure we're meeting users' needs.
12 out of 14 of the tools listed at https://tech.townsourced.com/post/embedding-static-files-in-go/#comparison support compression, which suggests that it is a pretty common requirement.
I'm not sure I agree with this reasoning.
The compression done by the other libraries is different from adding it to this proposal: they will not reduce performance on subsequent builds, since the alternatives are, generally speaking, generated ahead of the build rather than at build time.
Low build times are a clear added value of Go over other languages, and compression trades CPU time for a reduced storage/transfer footprint. If a lot of Go packages start running compression on go build, we're going to add even more build time than the time added by simply copying assets during builds. I'm skeptical of adding compression just because others do it. As long as the initial design doesn't by design prevent a future extension which adds support for e.g. compression, putting it in there because it might be something that could benefit some seems like unnecessary hedging.
It's not like file embedding would be useless without compression, compression is a nice-to-have to reduce the binary size from maybe 100MB to 50MB, which is great, but also not a clear dealbreaker for the functionality for most applications I can think of. Especially not if most of the "heavier" assets are files such as JPEGs or PNGs which are already pretty well compressed.
What about keeping compression out for now and adding it in if it's actually missed by a lot of people? (and can be done without undue costs)
To add to @sakjur's comment above: compression seems orthogonal to me. I generally want to compress an entire binary or release archive, and not just the assets. Particularly when Go binaries can easily get into the tens of megabytes without any assets.
@mvdan I guess one of my concerns is that quite often when I've seen embedding, it is together with some other pre-processing: minification, typescript compilation, data compression, image crunching, image resizing, sprite-sheets. The only exception being websites that only use html/template. So, in the end, you might end up using some sort of "Makefile" anyways, or uploading the pre-processed content. In that sense, I would think a command-line flag would work nicer with other tools than comments.
I guess one of my concerns is that quite often when I've seen embedding is together with some other pre-processing: minification, typescript compilation, data compression, image crunching, image resizing, sprite-sheets. The only exception being websites that only use html/template.
Thanks, that's a useful data point. Perhaps the need for compression is not as common as it looked. If that's the case, i agree that it makes sense to leave it out.
It's not like file embedding would be useless without compression, compression is a nice-to-have to reduce the binary size from maybe 100MB to 50MB, which is great, but also not a clear dealbreaker for the functionality for most applications I can think of.
Binary size is a big deal for many go developers (https://github.com/golang/go/issues/6853). Go compresses DWARF debug info specifically to reduce binary size, even though this comes at a cost to link time (https://github.com/golang/go/issues/11799, https://github.com/golang/go/issues/26074). If there were an easy way to cut binary size in half i think the developers would leap at that opportunity (although i doubt the gains here would be nearly that significant).
That doesn't really help in GOPATH mode, though
Maybe, if you're in GOPATH mode, this feature simply doesn't apply since I imagine the Go team doesn't plan on doing feature parity for GOPATH forever? There are already features that are not supported in GOPATH (such as security w/ checksum db, downloading dependencies through a proxy server, and semantic import versioning)
As @bcmills mentioned, having the static directory name in a go.mod file is a great way of introducing this feature in Go 1.15 since the feature can be automatically turned off in go.mod files that have a <=go1.14 clause.
That said, this also means users have to manually write what the static directory path is.
I think the vendor directory and the _test.go conventions are great examples of how they made working with Go and those two features a lot easier.
I don't recall many people requesting the option to customize the vendor directory name or having the ability to change the _test.go convention to something else. But if Go had never introduced the _test.go feature, then testing in Go would look a lot different today.
Therefore, maybe a name less generic than static gives better chances of non-collision, and so having a conventional directory (similar to vendor and _test.go) could be a better user experience compared to magical comments.
Examples of potentially low-collision names:
- _embed - follows the _test.go convention
- go_binary_assets
- .gobin - follows the .git convention
- runtime_files - so that it matches the runtime.Files struct

Lastly, the vendor directory was added in Go 1.5. Sooo, maybe it's not that bad to add a new convention now?
I think it should expose a mmap-readonly []byte. Just raw access to pages from the executable, paged in by the OS as needed. Everything else can be provided on top of that, with just bytes.NewReader.
If this is for some reason unacceptable, please provide ReaderAt, not just ReadSeeker; the latter is trivial to construct from the former, but the other way isn't as good: it would need a mutex to guard the single offset, and ruin performance.
It's not like file embedding would be useless without compression, compression is a nice-to-have to reduce the binary size from maybe 100MB to 50MB, which is great, but also not a clear dealbreaker for the functionality for most applications I can think of.
Binary size is a big deal for many go developers (#6853). Go compresses DWARF debug info specifically to reduce binary size, even though this comes at a cost to link time (#11799, #26074). If there were an easy way to cut binary size in half i think the developers would leap at that opportunity (although i doubt the gains here would be nearly that significant).
That is definitely a fair point, and I can see how my argument could be read as an argument in favor of carelessness with regard to file sizes. That was not my intention. My point is more in line with shipping this feature without compression, which would still be useful for some, and they could provide useful feedback and insights as to how to properly add compression in a way that feels right long-term. The assets might swell in a way that the debug info is unlikely to, and it's easy for developers of packages which are installed/imported by others to reduce build performance needlessly if the implementation makes that easy to do.
Another option would be to make compression of assets a build-flag and leave the compromise between build size and time to the builder rather than the developer. That would move the decision closer to the end-user of the binary who could make a decision on whether the compression is worthwhile. Otoh, this would risk creating an increased surface area for differences between development and production, so it isn't a clear cut better method than anything else and it's not something I feel like I'd want to advocate for.
My current asset embedding tool loads content from the asset files when built with -tags dev. Some convention like that would probably be useful here too; it shortens the development cycle significantly when e.g. fiddling with HTML or a template.
If not, the caller will have to wrap this lower-level mechanism with some *_dev.go and *_nodev.go wrappers and implement non-embedded loading for the dev scenario. Not even hard, but that road will just lead to a similar explosion of tools that the first comment on this issue describes. Those tools will have to do less than today, but they'll still multiply.
I think -tags dev failing to work when run outside the Go module would be reasonable (it can't figure out where to load the assets from).
There are many tools to embed static asset files into binaries:
Actually, https://tech.townsourced.com/post/embedding-static-files-in-go/ lists more:
Proposal
I think it's time to do this well once & reduce duplication, adding official support for embedding file resources into the cmd/go tool.
Problems with the current situation:
- Using a go:generate-based solution bloats the git history with a second (and slightly larger) copy of each file.
- not being go install-able, or making people write their own Makefiles, etc.

Goals:
- don't check in generated files
- have go install / go build do the embedding automatically
- support the access patterns people already use (func() io.Reader, io.ReaderAt, etc.)
- maybe support compression (yielding only an io.Reader)? (edit: but probably not; see comments below)
- go build or go install can not run arbitrary code, just like go:generate doesn't run automatically at install time.

The two main implementation approaches are
- a magic comment/directive (//go:embed Logo logo.jpg), or
- a well-known package (var Logo = embed.File("logo.jpg")).

go:embed approach
For a go:embed approach, one might say that any go/build-selected *.go file can contain something like:

Which, say, compiles to:

(adding a dependency on the io package)

Or:

compiling to, say:

Obviously this isn't fully fleshed out. There'd need to be something for compressed files too that yield only an io.Reader.

embed package approach
The other high-level approach is to not have a magic //go:embed syntax and instead just let users write Go code in some new "embed" or "golang.org/x/foo/embed" package:

Then have cmd/go recognize the calls to embed.Foo("foo/*.js") etc. and do the glob work in cmd/go, rather than at runtime. Or maybe certain build tags or flags could make it fall back to doing things at runtime instead. Perkeep (linked above) has such a mode, which is nice to speed up incremental development where you don't care about linking one big binary.
Concerns
- Security: make sure it's not possible to embed paths like ../../../../../../../../../../etc/shadow
- probably need to block .git too