golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
proposal: cmd/go: support embedding static assets (files) in binaries #35950

Closed bradfitz closed 4 years ago

bradfitz commented 4 years ago

There are many tools to embed static asset files into binaries:

Actually, https://tech.townsourced.com/post/embedding-static-files-in-go/ lists more:

Proposal

I think it's time to do this well once & reduce duplication, adding official support for embedding file resources into the cmd/go tool.

Problems with the current situation:

Goals:

The two main implementation approaches are //go:embed Logo logo.jpg or a well-known package (var Logo = embed.File("logo.jpg")).

go:embed approach

For a go:embed approach, one might say that any go/build-selected *.go file can contain something like:

//go:embed Logo logo.jpg

Which, say, compiles to:

func Logo() *io.SectionReader

(adding a dependency to the io package)

Or:

//go:embedglob Assets assets/*.css assets/*.js

compiling to, say:

var Assets interface{
     Files() []string
     Open(name string) *io.SectionReader
} = runtime.EmbedAsset(123)

Obviously this isn't fully fleshed out. There'd need to be something for compressed files too that yield only an io.Reader.
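To make the Assets shape above concrete, here is a minimal sketch of what such a value could look like if written by hand. Everything in it is an assumption for illustration: the assets and span types, the css/js contents, and the use of a Go string as a stand-in for data the linker would place in the binary. There is no runtime.EmbedAsset; the sketch only shows that Files and Open can be served from one blob plus an offset table.

```go
package main

import (
	"fmt"
	"io"
	"sort"
	"strings"
)

// css and js stand in for file contents; blob stands in for the read-only
// data the toolchain would place in the binary.
const (
	css  = "body{margin:0}"
	js   = "console.log('hi')"
	blob = css + js
)

// span records where one file lives inside blob.
type span struct{ off, n int64 }

// assets is a hypothetical hand-written version of the Assets shape.
type assets struct {
	index map[string]span
}

// Files returns the embedded file names in sorted order.
func (a assets) Files() []string {
	names := make([]string, 0, len(a.index))
	for name := range a.index {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// Open returns a reader over one file's bytes, or nil if the name is unknown.
func (a assets) Open(name string) *io.SectionReader {
	s, ok := a.index[name]
	if !ok {
		return nil
	}
	return io.NewSectionReader(strings.NewReader(blob), s.off, s.n)
}

var Assets = assets{index: map[string]span{
	"assets/site.css": {off: 0, n: int64(len(css))},
	"assets/site.js":  {off: int64(len(css)), n: int64(len(js))},
}}

func main() {
	for _, name := range Assets.Files() {
		data, err := io.ReadAll(Assets.Open(name))
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %s\n", name, data)
	}
}
```

Each Open call hands back an independent reader with its own offset, so callers never share or mutate the underlying data.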

embed package approach

The other high-level approach is to not have a magic //go:embed syntax and instead just let users write Go code in some new "embed" or "golang.org/x/foo/embed" package:

var Static = embed.Dir("static")
var Logo = embed.File("images/logo.jpg")
var Words = embed.CompressedReader("dict/words")

Then have cmd/go recognize the calls to embed.Foo("foo/*.js") etc. and do the glob work in cmd/go, rather than at runtime. Or maybe certain build tags or flags could make it fall back to doing things at runtime instead. Perkeep (linked above) has such a mode, which is nice to speed up incremental development where you don't care about linking one big binary.

Concerns

ianlancetaylor commented 4 years ago

It's worth considering whether embedglob should support a complete file tree, perhaps using the ** syntax supported by some Unix shells.
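As a rough illustration of why a plain glob is not enough for a whole tree: in Go's path.Match and fs.Glob, a single * never crosses a / boundary, so nested files need a walk. The sketch below uses an in-memory fstest.MapFS as a stand-in for a source directory; the collect helper and the file names are made up for the example and are not part of any proposal.

```go
package main

import (
	"fmt"
	"io/fs"
	"path"
	"testing/fstest"
)

// collect returns every non-directory entry under root whose base name
// matches pattern. It walks the whole tree, which is roughly the work a
// "**" glob would have to do.
func collect(fsys fs.FS, root, pattern string) ([]string, error) {
	var out []string
	err := fs.WalkDir(fsys, root, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		ok, err := path.Match(pattern, path.Base(p))
		if err != nil {
			return err
		}
		if ok {
			out = append(out, p)
		}
		return nil
	})
	return out, err
}

// tree is an in-memory stand-in for a source directory.
var tree = fstest.MapFS{
	"assets/site.css":       {Data: []byte("a")},
	"assets/theme/dark.css": {Data: []byte("b")},
	"assets/app.js":         {Data: []byte("c")},
}

func main() {
	// The flat glob only sees files directly inside assets/.
	flat, err := fs.Glob(tree, "assets/*.css")
	if err != nil {
		panic(err)
	}
	fmt.Println("glob:", flat)

	// The walk also finds assets/theme/dark.css.
	deep, err := collect(tree, "assets", "*.css")
	if err != nil {
		panic(err)
	}
	fmt.Println("walk:", deep)
}
```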

ghost commented 4 years ago

Some people would need the ability to serve the embedded assets with HTTP using the http.FileServer.

I personally use either mjibson/esc (which does that) or in some cases my own file embedding implementation which renames files to create unique paths and adds a map from the original paths to the new ones, e.g. "/js/bootstrap.min.js": "/js/bootstrap.min.827ccb0eea8a706c4c34a16891f84e7b.js". Then you can use this map in the templates like this: href="{{ static_path "/css/bootstrap.min.css" }}".
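The renaming scheme described above could be sketched roughly as follows. This is not the commenter's implementation: the hashedPath and buildPathMap helpers are hypothetical, MD5 is assumed only because the example path looks like a 32-character hex digest, and the file contents are placeholders.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"os"
	"path"
	"strings"
	"text/template"
)

// hashedPath inserts the MD5 of content before the file extension,
// so "/js/app.js" becomes "/js/app.<hex digest>.js".
func hashedPath(name string, content []byte) string {
	sum := md5.Sum(content)
	ext := path.Ext(name)
	return strings.TrimSuffix(name, ext) + "." + hex.EncodeToString(sum[:]) + ext
}

// buildPathMap maps each original path to its content-addressed path.
func buildPathMap(files map[string][]byte) map[string]string {
	m := make(map[string]string, len(files))
	for name, content := range files {
		m[name] = hashedPath(name, content)
	}
	return m
}

func main() {
	// Stand-in for the files an embedding tool would read at build time.
	files := map[string][]byte{
		"/css/bootstrap.min.css": []byte("body{}"),
	}
	paths := buildPathMap(files)

	funcs := template.FuncMap{
		// static_path falls back to the original path for unknown names.
		"static_path": func(name string) string {
			if p, ok := paths[name]; ok {
				return p
			}
			return name
		},
	}
	tmpl := template.Must(template.New("page").Funcs(funcs).Parse(
		`<link rel="stylesheet" href="{{ static_path "/css/bootstrap.min.css" }}">` + "\n"))
	if err := tmpl.Execute(os.Stdout, nil); err != nil {
		panic(err)
	}
}
```

Because the digest changes whenever the content changes, the hashed URLs can be served with long cache lifetimes.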

cespare commented 4 years ago

I think a consequence of this would be that it would be nontrivial to figure out what files are necessary to build a program.

The //go:embed approach introduces another level of complexity too. You'd have to parse the magic comments in order to even typecheck the code. The "embed package" approach seems friendlier to static analysis.

(Just musing out loud here.)

bradfitz commented 4 years ago

@opennota,

would need the ability to serve the embedded assets with HTTP using the http.FileServer.

Yes, the first link above is a package I wrote (in 2011, before Go 1) and still use, and it supports using http.FileServer: https://godoc.org/perkeep.org/pkg/fileembed#Files.Open

bradfitz commented 4 years ago

@cespare,

The //go:embed approach introduces another level of complexity too. You'd have to parse the magic comments in order to even typecheck the code. The "embed package" approach seems friendlier to static analysis.

Yes, good point. That's a very strong argument for using a package. It also makes it more readable & documentable, since we can document it all with regular godoc, rather than deep in cmd/go's docs.

agnivade commented 4 years ago

@bradfitz - Do you want to close this https://github.com/golang/go/issues/3035 ?

bradfitz commented 4 years ago

@agnivade, thanks for finding that! I thought I remembered that but couldn't find it. Let's leave it open for now and see what others think.

balasanjay commented 4 years ago

If we go with the magic package, we could use the unexported type trick to ensure that callers pass compile-time constants as arguments: https://play.golang.org/p/RtHlKjhXcda.

(This is the strategy referenced here: https://groups.google.com/forum/#!topic/golang-nuts/RDA9Hag8RZw/discussion)
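For readers who have not seen the unexported type trick, a minimal sketch follows. The names constPath and File are hypothetical. The trick only bites across package boundaries, so this single-file version shows the accepted calls and describes the rejected ones in comments.

```go
package main

import "fmt"

// constPath is unexported, so code outside this package cannot name it,
// cannot declare variables of it, and cannot convert a runtime string to it.
type constPath string

// File is what a hypothetical embed package might export. From another
// package, the only string values assignable to constPath are untyped
// string constants such as the literal "logo.jpg".
func File(name constPath) string {
	return "embedded:" + string(name)
}

// logoName is an untyped constant, so it is accepted by File.
const logoName = "logo.jpg"

func main() {
	fmt.Println(File("images/logo.jpg"))
	fmt.Println(File(logoName))

	// The following does not compile, because s has type string:
	//
	//	s := os.Args[0]
	//	File(s)
	//
	// Inside this package File(constPath(s)) would still work, but an
	// importing package has no way to spell that conversion.
}
```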

AlexRouSg commented 4 years ago

One concern I have is how it would handle individual assets, or all assets together, being too big to fit into memory, and whether there would be a build tag or per-file access option to choose between prioritizing access time vs. memory footprint, or some middle-ground implementation.

urandom commented 4 years ago

The way I've solved that problem (because of course I also have my own implementation :) ) is to provide an http.FileSystem implementation that serves all embedded assets. That way, you don't have to rely on magic comments in order to appease the typechecker, the assets can easily be served by http, a fallback implementation can be provided for development purposes (http.Dir) without changing the code, and the final implementation is quite versatile, as http.FileSystem covers quite a bit, not only in reading files, but listing directories as well.

One can still use magic comments or whatever to specify what needs to be embedded, though it's probably easier to specify all the globs via a plain text file.

ianlancetaylor commented 4 years ago

@AlexRouSg This proposal would only be for files which are appropriate to include directly in the final executable. It would not be appropriate to use this for files that are too big to fit in memory. There's no reason to complicate this tool to handle that case; for that case, just don't use this tool.

bradfitz commented 4 years ago

@ianlancetaylor, I think the distinction @AlexRouSg was making was between having the files provided as global []bytes (unpageable, potentially writable memory) vs providing a read-only, on-demand view of an ELF section that can normally live on disk (in the executable), like via an Open call that returns an *io.SectionReader. (I don't want to bake in http.File or http.FileSystem into cmd/go or runtime... net/http can provide an adapter.)

urandom commented 4 years ago

@bradfitz http.File itself is an interface with no technical dependencies on the http package. It might be a good idea for any Open method to provide an implementation that conforms to that interface, because both the Stat and Readdir methods are quite useful for such assets.

bradfitz commented 4 years ago

@urandom, it couldn't implement http.FileSystem, though, without referring to the "http.File" name (https://play.golang.org/p/-r3KjG1Gp-8).

rsc commented 4 years ago

@robpike and I talked through a proposal for doing this years ago (before there was a proposal process) and never got back to doing anything. It's been bugging me for years that we never finished doing that. The idea as I remember it was to just have a special directory name like "static" containing the static data and automatically make them available through an API, with no annotations needed.

I'm not convinced about the complexity of a "compressed vs not" knob. If we do that, then people will want us to add control over which compression, compression level, and so on. All we should need to add is the ability to embed a file of plain bytes. If users want to store compressed data in that file, great, the details are up to them and there's no API needed on Go's side at all.

jayconrod commented 4 years ago

A couple thoughts:

Either way, this blocks embedding /etc/shadow or .git. Neither can be included in a module zip.

In general, I'm worried about expanding the scope of the go command too much. However, the fact that there are so many solutions to this problem means there probably ought to be one official solution.

I'm familiar with go_embed_data and go-bindata (of which there are several forks), and this seems to cover those use cases. Are there any important problems the others solve that this doesn't cover?

DeedleFake commented 4 years ago

Blocking certain files shouldn't be too hard, especially if you use a static or embed directory. Symlinks might complicate that a bit, but you can just prevent it from embedding anything outside of the current module or, if you're on GOPATH, outside of the package containing the directory.

I'm not particularly a fan of a comment that compiles to code, but I also find the pseudo-package that affects compilation to be a bit strange as well. If the directory approach isn't used, maybe it might make a bit more sense to have some kind of embed top-level declaration actually built into the language. It would work similarly to import, but would only support local paths and would require a name for it to be assigned to. For example,

embed ui "./ui/build"

func main() {
  file, err := ui.Open("version.txt")
  if err != nil {
    panic(err)
  }
  version, err := ioutil.ReadAll(file)
  if err != nil {
    panic(err)
  }
  file.Close()

  log.Printf("UI Version: %s\n", bytes.TrimSpace(version))
  http.ListenAndServe(":8080", http.EmbeddedDir(ui))
}

Edit: You beat me to it, @jayconrod.

josharian commented 4 years ago

To expand on https://github.com/golang/go/issues/35950#issuecomment-561703346, there is a puzzle about the exposed API. The obvious ways to expose the data are []byte, string, and Read-ish interfaces.

The typical case is that you want the embedded data to be immutable. However, all interfaces exposing []byte (which includes io.Reader, io.SectionReader, etc.) must either (1) make a copy, (2) allow mutability, or (3) be immutable despite being a []byte. Exposing the data as strings solves that, but at the cost of an API that will often end up requiring copying anyway, since lots of code that consumes embedded files eventually requires byte slices one way or another.

I'd suggest route (3): be immutable despite being a []byte. You can enforce this cheaply by using a readonly symbol for the backing array. This also lets you safely expose the same data as a []byte and a string; attempts to mutate the data will fail. The compiler can't take advantage of the immutability, but that's not too great of a loss. This is something that toolchain support can bring to the table that (as far as I know) none of the existing codegen packages do.

(A third party codegen package could do this by generating a generic assembly file containing DATA symbols that are marked as readonly, and then short arch-specific assembly files exposing those symbols in the form of strings and []bytes. I wrote CL 163747 specifically with this use case in mind, but never got around to integrating it into any codegen packages.)

DeedleFake commented 4 years ago

I'm unsure what you're talking about in terms of immutability. io.Reader already enforces immutability. That's the entire point. When you call Read(buf), it copies data into the buffer that you provided. Changing buf after that has zero effect on the internals of the io.Reader.
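The copying behaviour described here can be checked with a small sketch. The embedded constant stands in for asset data, and readAndScribble is a made-up helper; the only claim is that overwriting the caller's buffer after Read leaves the source untouched.

```go
package main

import (
	"fmt"
	"strings"
)

// embedded stands in for read-only asset data; Go strings are immutable.
const embedded = "hello, embed"

// readAndScribble reads the asset into a caller-owned buffer, overwrites
// that buffer, then reads the asset again through a fresh reader.
func readAndScribble() (first, second string) {
	buf := make([]byte, len(embedded))
	n, _ := strings.NewReader(embedded).Read(buf)
	first = string(buf[:n])

	// Mutating the caller's copy...
	for i := range buf {
		buf[i] = 'X'
	}

	// ...does not change what later readers see.
	buf2 := make([]byte, len(embedded))
	n, _ = strings.NewReader(embedded).Read(buf2)
	second = string(buf2[:n])
	return first, second
}

func main() {
	first, second := readAndScribble()
	fmt.Println(first == second) // prints true
}
```

The cost is the copy itself, which is the trade-off the surrounding comments are debating.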

bradfitz commented 4 years ago

I agree with @DeedleFake. I don't want to play games with magic []byte array backings. It's okay to copy from the binary into user-provided buffers.

gdamore commented 4 years ago

Just another wrinkle here -- I have a different project which uses DTrace source code (embedded). This is sensitive to differences between \n and \r\n. (We can argue whether this is a dumb thing in DTrace or not -- that's beside the point and it is the situation today.)

It's super useful that backticked strings treat both as \n regardless of how they appear in source, and I rely on this with a go-generate to embed the DTrace.

So if there is an embed file added to the go command, I would gently suggest that options to change the handling of CR/CRLF might come in very handy, particularly for folks who might be developing on different systems where the default line endings can be a gotcha.

rsc commented 4 years ago

Like with compression, I'd really like to stop at "copy the file bytes into the binary". CR/CRLF normalization, Unicode normalization, gofmt'ing, all that belongs elsewhere. Check in the files containing the exact bytes you want. (If your version control can't leave them alone, maybe check in gzipped content and gunzip them at runtime.) There are many file munging knobs we could imagine adding. Let's stop at 0.
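The "check in gzipped content and gunzip at runtime" suggestion needs only the standard library on the Go side. In the sketch below, gzipBytes exists only to produce test input inside one runnable file; in a real project that step would happen before check-in. Both helper names are illustrative.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// gzipBytes compresses data. It simulates the pre-check-in step so the
// example is self-contained.
func gzipBytes(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// gunzipBytes is the runtime half: it expands bytes that were embedded as-is.
func gunzipBytes(compressed []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	// CRLF line endings must survive untouched.
	original := []byte("line one\r\nline two\r\n")

	stored, err := gzipBytes(original)
	if err != nil {
		panic(err)
	}
	restored, err := gunzipBytes(stored)
	if err != nil {
		panic(err)
	}
	fmt.Println(bytes.Equal(original, restored))
}
```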

rsc commented 4 years ago

It may be too late to introduce a new reserved directory name, as much as I'd like to. (It wasn't too late back in 2014, but it's probably too late now.) So some kind of opt-in comment may be necessary.

Suppose we define a type runtime.Files. Then you could imagine writing:

//go:embed *.html (or static/* etc)
var files runtime.Files

And then at runtime you just call files.Open to get back an interface { io.ReadSeeker; io.ReaderAt } with the data. Note that the var is unexported, so one package can't go around grubbing in another package's embedded files.

Names TBD but as far as the mechanism it seems like that should be enough, and I don't see how to make it simpler. (Simplifications welcome of course!)

rsc commented 4 years ago

Whatever we do, it needs to be possible to support with Bazel and Gazelle too. That would mean having Gazelle recognize the comment and write out a Bazel rule saying the globs, and then we'd need to expose a tool (go tool embedgen or whatever) to generate the extra file to include in the build (the go command would do this automatically and never actually show the extra file). That seems straightforward enough.

gdamore commented 4 years ago

If various munging won't do the trick, then that's an argument against using this new facility. It's not a stopper for me -- I can use go generate like I've been doing, but it means I cannot benefit from the new feature.

With respect to munging in general -- I can imagine a solution where someone provides an implementation of an interface (something like a Reader() on one side, and something to receive the file on the other -- maybe instantiated with an io.Reader from the file itself) -- which the go cmd would build and run to prefilter the file before embedding. Then folks can provide whatever filter they want. I imagine some folks would provide quasi-standard filters like a dos2unix implementation, compression, etc. (Maybe they should be chainable even.)

I guess there'd have to be an assumption that whatever the embedded processor is, it must be compilable on ~every build system, as go would be building a temporary native tool for this purpose.
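A chainable prefilter of the kind imagined above might look like the sketch below. Nothing here is a proposed API: the Filter type, Chain, Dos2Unix and TrimTrailingSpace are all hypothetical names, and the sketch works on whole byte slices rather than streams to stay short.

```go
package main

import (
	"bytes"
	"fmt"
)

// Filter is a hypothetical pre-embedding step: bytes in, bytes out.
type Filter func([]byte) ([]byte, error)

// Chain runs filters left to right and stops at the first error.
func Chain(filters ...Filter) Filter {
	return func(data []byte) ([]byte, error) {
		var err error
		for _, f := range filters {
			if data, err = f(data); err != nil {
				return nil, err
			}
		}
		return data, nil
	}
}

// Dos2Unix rewrites CRLF line endings as LF.
func Dos2Unix(data []byte) ([]byte, error) {
	return bytes.ReplaceAll(data, []byte("\r\n"), []byte("\n")), nil
}

// TrimTrailingSpace removes spaces, tabs and newlines at the end of the file.
func TrimTrailingSpace(data []byte) ([]byte, error) {
	return bytes.TrimRight(data, " \t\n"), nil
}

func main() {
	prepare := Chain(Dos2Unix, TrimTrailingSpace)
	out, err := prepare([]byte("BEGIN {\r\n}\r\n"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", out)
}
```

Order matters in such a chain: trimming before the CRLF conversion would leave a stray carriage return at the end.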

magical commented 4 years ago

It may be too late to introduce a new reserved directory name, as much as I'd like to. [...] some kind of opt-in comment may be necessary.

If the files are only accessible through a special package, say runtime/embed, then importing that package could be the opt-in signal.

bcmills commented 4 years ago

The io.Reader approach seems like it could add significant overhead (in terms of both copying and memory footprint) for conceptually-simple linear operations like strings.Contains (such as in cmd/go/internal/cfg) or, critically, template.Parse.

For those use-cases, it seems ideal to allow the caller to choose whether to treat the whole blob as a (presumably memory-mapped) string or an io.ReaderAt.

That seems compatible with the general runtime.Files approach, though: the thing returned from runtime.Files.Open could have a ReadString() string method that returns the memory-mapped representation.

bcmills commented 4 years ago

some kind of opt-in comment may be necessary.

We could do that with the go version in the go.mod file. Before 1.15 (or whatever) the static subdirectory would contain a package, and at 1.15 or higher it would contain embedded assets.

(That doesn't really help in GOPATH mode, though.)

magical commented 4 years ago

I'm not convinced about the complexity of a "compressed vs not" knob. If we do that, then people will want us to add control over which compression, compression level, and so on. All we should need to add is the ability to embed a file of plain bytes.

While i appreciate the drive for simplicity, we should also make sure we're meeting users' needs.

12 out of 14 of the tools listed at https://tech.townsourced.com/post/embedding-static-files-in-go/#comparison support compression, which suggests that it is a pretty common requirement.

It's true that one could do the compression as a pre-build step outside go, but that would still require 1) a tool to do the compression 2) checking some kind of assets.zip blob into vcs 3) probably a utility library around the embed api to undo the compression. At which point it is unclear what the benefit is at all.

Three of the goals listed in the initial proposal were:

If we read the second of these as "don't require a separate tool for embedding" then not supporting compressed files directly or indirectly fails to meet all three of these goals.

jimmyfrasche commented 4 years ago

Does this need to be package level? Module level seems a better granularity since most likely one module = one project.

Since this directory wouldn't contain Go code† could it be something like _static?

† or, if it is, it would be treated as arbitrary bytes whose name happens to end in ".go" instead of as Go code to be compiled

If it's one special directory, the logic could just be slurp up anything and everything in that directory tree. The magic embed package could let you do something like embed.Open("img/logo.svg") to open a file in a subdirectory of the asset tree.

Strings seem good enough. They can easily be copied into []byte or converted into a Reader. Code generation or libraries could be used to provide fancier APIs and handle things during init. That could include decompression or creating an http.FileSystem.

Doesn't Windows have a special format for embedding assets? Should that be used when building a Windows executable? If so, does that have any implications for the kinds of operations that can be provided?

zellyn commented 4 years ago

Don't forget gitfs 😂

egonelbre commented 4 years ago

Is there a reason it couldn't be part of go build / link... e.g. go build -embed example=./path/example.txt and some package that exposes access to it (e.g. embed.File("example")), instead of using go:embed?

chewxy commented 4 years ago

you need a stub for that in your code though

mvdan commented 4 years ago

@egonelbre the problem with go build -embed is that all users would need to use it properly. This needs to be fully transparent and automatic; existing go install or go get commands can't stop doing the right thing.

markbates commented 4 years ago

@bradfitz I would recommend https://github.com/markbates/pkger over Packr. It uses the standard library API for working with files.

func run() error {
    f, err := pkger.Open("/public/index.html")
    if err != nil {
        return err
    }
    defer f.Close()

    info, err := f.Stat()
    if err != nil {
        return err
    }

    fmt.Println("Name: ", info.Name())
    fmt.Println("Size: ", info.Size())
    fmt.Println("Mode: ", info.Mode())
    fmt.Println("ModTime: ", info.ModTime())

    if _, err := io.Copy(os.Stdout, f); err != nil {
        return err
    }
    return nil
}

hundt commented 4 years ago

Or maybe certain build tags or flags could make it fall back to doing things at runtime instead. Perkeep (linked above) has such a mode, which is nice to speed up incremental development where you don't care about linking one big binary.

mjibson/esc does this as well, and it is a big quality-of-life improvement when developing a webapp; you not only save linking time but also avoid having to restart the application, which can take substantial time and/or require repeating extra steps to test your changes, depending on the implementation of the webapp.

andreynering commented 4 years ago

Problems with the current situation:

  • Using a go:generate-based solution bloats the git history with a second (and slightly larger) copy of each file.

Goals:

  • don't check in generated files

Well, this part is easily solvable by just adding the generated files to the .gitignore file or equivalent. I always did that...

So, alternatively Go could just have its own "official" embed tool that runs by default on go build and ask people to ignore these files as a convention. That would be the least magical solution available (and backward compatible with existing Go versions).

I'm just brainstorming / thinking aloud here... but I actually like the proposed idea overall. 🙂

andreynering commented 4 years ago

Also, since //go:generate directives don't run automatically on go build the behavior of go build may seem a bit inconsistent: //go:embed will work automatically but for //go:generate you have to run go generate manually. (//go:generate can already break the go get flow if it generates .go files needed for the build).

DeedleFake commented 4 years ago

//go:generate can already break the go get flow if it generates .go files needed for the build

I think the usual flow for that, and the one that I've generally used, although it took a bit of getting used to, is to use go generate entirely as a development-end tool and just commit the files that it generates.

urandom commented 4 years ago

@bradfitz it doesn't need to implement http.FileSystem itself. If the implementation provides a type that implements http.File, then it would be trivial for anyone, including the stdlib http package to provide a wrapper around the Open function, converting the type to http.File in order to conform to http.FileSystem
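The wrapper being discussed can be sketched as follows. The memFile, memInfo, store and httpFS types are hypothetical stand-ins: store plays the embed API whose Open returns a concrete type, and httpFS is the small adapter that net/http (or anyone else) could supply. The adapter is needed because an Open method satisfies http.FileSystem only if its result type is exactly http.File.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"os"
	"path"
	"time"
)

// memFile has the full method set of http.File, but defining it would not
// require importing net/http.
type memFile struct {
	*bytes.Reader
	name string
}

func (f *memFile) Close() error                             { return nil }
func (f *memFile) Readdir(count int) ([]os.FileInfo, error) { return nil, nil }
func (f *memFile) Stat() (os.FileInfo, error)               { return memInfo{f.name, f.Size()}, nil }

// memInfo is a minimal os.FileInfo for a read-only in-memory file.
type memInfo struct {
	name string
	size int64
}

func (i memInfo) Name() string       { return i.name }
func (i memInfo) Size() int64        { return i.size }
func (i memInfo) Mode() os.FileMode  { return 0o444 }
func (i memInfo) ModTime() time.Time { return time.Time{} }
func (i memInfo) IsDir() bool        { return false }
func (i memInfo) Sys() interface{}   { return nil }

// store plays the role of the embed API: Open returns the concrete type.
type store map[string]string

func (s store) Open(name string) (*memFile, error) {
	data, ok := s[name]
	if !ok {
		return nil, os.ErrNotExist
	}
	return &memFile{bytes.NewReader([]byte(data)), path.Base(name)}, nil
}

// httpFS adapts store to http.FileSystem.
type httpFS struct{ s store }

func (h httpFS) Open(name string) (http.File, error) {
	f, err := h.s.Open(name)
	if err != nil {
		return nil, err // return a true nil interface, not a nil *memFile
	}
	return f, nil
}

// Compile-time check that the adapter satisfies the interface.
var _ http.FileSystem = httpFS{}

func main() {
	var fsys http.FileSystem = httpFS{store{"/index.html": "<h1>hi</h1>"}}
	f, err := fsys.Open("/index.html")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	body, err := io.ReadAll(f)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", body)
}
```

A value like httpFS could then be passed to http.FileServer; directory listing is stubbed out here, so a fuller version would need a real Readdir.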

mvdan commented 4 years ago

@andreynering //go:generate and //go:embed are very different, though. This mechanism can happen seamlessly at build time because it won't run arbitrary code. I believe that makes it similar to how cgo can generate code as part of go build.

sakjur commented 4 years ago

I'm not convinced about the complexity of a "compressed vs not" knob. If we do that, then people will want us to add control over which compression, compression level, and so on. All we should need to add is the ability to embed a file of plain bytes.

While i appreciate the drive for simplicity, we should also make sure we're meeting users' needs.

12 out of 14 of the tools listed at https://tech.townsourced.com/post/embedding-static-files-in-go/#comparison support compression, which suggests that it is a pretty common requirement.

I'm not sure I agree with this reasoning.

The compression done by the other libraries is different from adding it to this proposal in that they will not reduce performance on subsequent builds since the alternatives are generally speaking generated ahead of build rather than during build time.

Low build times are a clear added value of Go over other languages, and compression trades CPU time for a reduced storage/transfer footprint. If a lot of Go packages start running compression on go build, we're going to add even more build time than the time added by simply copying assets during builds. I'm skeptical of adding compression because others are doing it. As long as the initial design doesn't prevent a future extension that adds support for e.g. compression, putting it in there because it might benefit some seems like unnecessary hedging.

It's not like file embedding would be useless without compression, compression is a nice-to-have to reduce the binary size from maybe 100MB to 50MB, which is great, but also not a clear dealbreaker for the functionality for most applications I can think of. Especially not if most of the "heavier" assets are files such as JPEGs or PNGs which are already pretty well compressed.

What about keeping compression out for now and adding it in if it's actually missed by a lot of people? (and can be done without undue costs)

mvdan commented 4 years ago

To add to @sakjur's comment above: compression seems orthogonal to me. I generally want to compress an entire binary or release archive, and not just the assets. Particularly when Go binaries can easily get into the tens of megabytes without any assets.

egonelbre commented 4 years ago

@mvdan I guess one of my concerns is that quite often when I've seen embedding, it comes together with some other pre-processing: minification, typescript compilation, data compression, image crunching, image resizing, sprite-sheets. The only exception being websites that only use html/template. So, in the end, you might end up using some sort of "Makefile" anyways or uploading the pre-processed content. In that sense, I would think a command-line flag would work nicer with other tools than comments.

magical commented 4 years ago

I guess one of my concerns is that quite often when I've seen embedding, it comes together with some other pre-processing: minification, typescript compilation, data compression, image crunching, image resizing, sprite-sheets. The only exception being websites that only use html/template.

Thanks, that's a useful data point. Perhaps the need for compression is not as common as it looked. If that's the case, i agree that it makes sense to leave it out.

magical commented 4 years ago

It's not like file embedding would be useless without compression, compression is a nice-to-have to reduce the binary size from maybe 100MB to 50MB, which is great, but also not a clear dealbreaker for the functionality for most applications I can think of.

Binary size is a big deal for many go developers (https://github.com/golang/go/issues/6853). Go compresses DWARF debug info specifically to reduce binary size, even though this comes at a cost to link time (https://github.com/golang/go/issues/11799, https://github.com/golang/go/issues/26074). If there were an easy way to cut binary size in half i think the developers would leap at that opportunity (although i doubt the gains here would be nearly that significant).

marwan-at-work commented 4 years ago

That doesn't really help in GOPATH mode, though

Maybe, if you're in GOPATH mode, this feature simply doesn't apply since I imagine the Go team doesn't plan on doing feature parity for GOPATH forever? There are already features that are not supported in GOPATH (such as security w/ checksum db, downloading dependencies through a proxy server, and semantic import versioning)

As @bcmills mentioned, having the static directory name in a go.mod file is a great way of introducing this feature in Go 1.15 since the feature can be automatically turned off in go.mod files that have a <=go1.14 clause.

That said, this also means users have to manually write what the static directory path is.

I think the vendor directory and the _test.go conventions are great examples of how they made working with Go and those two features a lot easier.

I don't recall many people requesting the option to customize the vendor directory name or having the ability to change the _test.go convention to something else. But if Go had never introduced the _test.go feature, then testing in Go would look a lot different today.

Therefore, maybe a name less generic than static gives better chances of non-collision and so having a conventional directory (similar to vendor and _test.go) could be a better user experience compared to magical comments.

Examples of potentially low-collision names:

Lastly, the vendor directory was added in Go 1.5. Sooo, maybe it's not that bad to add a new convention now? 😅

tv42 commented 4 years ago

I think it should expose a mmap-readonly []byte. Just raw access to pages from the executable, paged in by the OS as needed. Everything else can be provided on top of that, with just bytes.NewReader.

If this is for some reason unacceptable, please provide ReaderAt not just ReadSeeker; the latter is trivial to construct from the former, but the other way isn't as good: it would need a mutex to guard the single offset, and ruin performance.
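The point about ReaderAt versus ReadSeeker can be illustrated with a short sketch. The asset constant and the countBytes helper are made up; strings.Reader stands in for whatever would expose the embedded bytes. Each goroutine wraps the one shared io.ReaderAt in its own io.SectionReader, which carries a private offset, so no lock around a shared position is needed.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"sync"
)

// asset stands in for a large embedded file.
const asset = "pretend this is a large embedded file"

// shared is the single io.ReaderAt that all readers sit on top of.
var shared io.ReaderAt = strings.NewReader(asset)

// countBytes gives every goroutine its own ReadSeeker over src and returns
// how many bytes each one read.
func countBytes(src io.ReaderAt, size int64, workers int) []int64 {
	counts := make([]int64, workers)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// A fresh SectionReader per goroutine: private offset, shared data.
			var rs io.ReadSeeker = io.NewSectionReader(src, 0, size)
			n, err := io.Copy(io.Discard, rs)
			if err != nil {
				return
			}
			counts[i] = n // each goroutine writes only its own slot
		}(i)
	}
	wg.Wait()
	return counts
}

func main() {
	fmt.Println(countBytes(shared, int64(len(asset)), 4))
}
```

Going the other way, from a single ReadSeeker to concurrent random access, would require serializing every Seek and Read pair, which is the mutex the comment warns about.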

sakjur commented 4 years ago

It's not like file embedding would be useless without compression, compression is a nice-to-have to reduce the binary size from maybe 100MB to 50MB, which is great, but also not a clear dealbreaker for the functionality for most applications I can think of.

Binary size is a big deal for many go developers (#6853). Go compresses DWARF debug info specifically to reduce binary size, even though this comes at a cost to link time (#11799, #26074). If there were an easy way to cut binary size in half i think the developers would leap at that opportunity (although i doubt the gains here would be nearly that significant).

That is definitely a fair point and I can see how my argument can be seen as an argument in favor of carelessness with regards to filesizes. That was not my intention. My point is more in line with shipping this feature without compression which would still be useful for some, and they could provide useful feedback and insights as to how to properly add compression in a way that feels right long-term. The assets might swell in a way that the debug info is unlikely to do and it's easier for developers of packages which are installed/imported by others to reduce build performance needlessly if the implementation makes it easy to do so.

Another option would be to make compression of assets a build-flag and leave the compromise between build size and time to the builder rather than the developer. That would move the decision closer to the end-user of the binary who could make a decision on whether the compression is worthwhile. Otoh, this would risk creating an increased surface area for differences between development and production, so it isn't a clear cut better method than anything else and it's not something I feel like I'd want to advocate for.

tv42 commented 4 years ago

My current asset embedding tool loads content from the asset files when built with -tags dev. Some convention like that would probably be useful here too; it shortens the development cycle significantly when e.g. fiddling with HTML or a template.

If not, the caller will have to wrap this lower-level mechanism with some *_dev.go and *_nodev.go wrappers and implement non-embedded loading for the dev scenario. Not even hard, but that road will just lead to a similar explosion of tools that the first comment on this issue describes. Those tools will have to do less than today, but they'll still multiply.

I think -tags dev failing to work when run outside the Go module would be reasonable (can't figure out where to load the assets from).
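A build-tag setup needs at least two files (the *_dev.go and *_nodev.go wrappers mentioned above), so it does not fit in one runnable snippet. The sketch below shows the same idea as a runtime switch instead, which is a substitute technique and not what the comment describes. The assetFS function, the bundled variable and the ASSET_DEV_DIR environment variable are all invented for illustration, and fstest.MapFS stands in for data compiled into the binary.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"testing/fstest"
)

// bundled stands in for assets compiled into the binary.
var bundled = fstest.MapFS{
	"index.html": {Data: []byte("<h1>embedded</h1>")},
}

// assetFS picks the asset source. With dev set it reads from dir on disk, so
// edits show up without rebuilding; otherwise it serves the bundled copy.
// A build-tag version would make this choice at compile time instead.
func assetFS(dev bool, dir string) fs.FS {
	if dev {
		return os.DirFS(dir)
	}
	return bundled
}

func main() {
	// ASSET_DEV_DIR is a made-up environment variable for this sketch.
	dir := os.Getenv("ASSET_DEV_DIR")
	fsys := assetFS(dir != "", dir)

	data, err := fs.ReadFile(fsys, "index.html")
	if err != nil {
		fmt.Println("load failed:", err)
		return
	}
	fmt.Printf("%s\n", data)
}
```

Code that consumes the assets only sees an fs.FS, so the dev and release paths share everything except the one function that chooses the source.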