golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.12k stars 17.68k forks source link

embed, cmd/go: add support for embedded files #41191

Closed rsc closed 3 years ago

rsc commented 4 years ago

In July, @bradfitz and I posted a draft design for embedded files. The doc links to a video, prototype code, and a Reddit discussion.

The feedback on that design has been overwhelmingly positive.

I propose to adopt the embedded files draft design for Go 1.16, with one addition, suggested in the discussion, to simplify the case of direct access to the bytes in a single embedded file.

As long as a file imports "embed" (import _ "embed" if necessary), it will be permitted to use //go:embed naming a single file (no glob patterns or directory matching allowed) to initialize a plain string or []byte variable:

//go:embed gopher.png
var gopherPNG []byte

The import is required to flag the file as containing //go:embed lines and needing processing. Goimports (and gopls etc) can be taught this rule and automatically add the import in any file with a //go:embed as needed.

The embedded files design depends on the file system interface draft design, which I've also proposed to adopt in #41190.

This issue is only about adopting the embedded files design, under the assumption that the file system interface design is also adopted. If this proposal is accepted before the file system interface design is, we'd simply wait for the file system interface design before starting to land changes.

jimmyfrasche commented 4 years ago

Would it be an error to have a //go:embed directive without importing embed?

rsc commented 4 years ago

@jimmyfrasche Yes, fifth-to-last bullet in list at https://go.googlesource.com/proposal/+/master/design/draft-embed.md#go_embed-directives.

pierrec commented 4 years ago

@rsc Maybe I have missed it in the draft, but I don't see the ability to embed a single file that you mention in your comment. Also, would you be able to embed a single file as a const string as well? Thanks for this great proposal.

rsc commented 4 years ago

@pierrec It's not in the draft doc (the "one addition" is the text in the comment above). Const strings can end up playing a role in deciding whether a program type checks, which would mean all type checkers would need to understand //go:embed'ed consts. In contrast, if we stick to vars, type checkers are none the wiser and can be left alone. Seems like we should probably stick to vars.

Is there a particular reason you wanted a const instead of a var? Using them should be about the same as far as efficiency. (References to const strings end up compiling to what amount to references to hidden vars anyway.)

pierrec commented 4 years ago

Thanks for the explanation. I tend to embed static assets as const strings at the moment this is why I asked. I am fine with vars too!

earthboundkid commented 4 years ago

Interesting, so I could do something like:


//go:embed version.txt
var Version string

And potentially even have a //go:generate comment to generate version.txt. That would cut out a large use case for makefiles/ldflags.

earthboundkid commented 4 years ago

Is it an error if the file isn't found? If so, where technically is the error considered to occur? Link time?

docmerlin commented 4 years ago

Can we make sure that go:embed runs after go:generate so we can do things like easily generate versions, etc?

tooolbox commented 4 years ago

Can we make sure that go:embed runs after go:generate so we can do things like easily generate versions, etc?

From my understanding,go:generate will occur with go generate while go:embed would happen with go build.

rsc commented 4 years ago

@carlmjohnson Yes, it is always an error to say //go:embed foo where foo does not exist. The error happens when compiling the source file containing that line. (If you were to remove foo after compiling that source file, you still wouldn't get to a link step - the go command would notice that the package needs rebuilding because foo was deleted.)

tv42 commented 4 years ago

I think that this proposal is not complete without saying something about ETag. https://old.reddit.com/r/golang/comments/hv96ny/qa_goembed_draft_design/fzi0pok/

rsc commented 4 years ago

@tv42, yes, we will make ETag work. I'm not sure what the shape of that is but we will. (Also affirmed on https://github.com/golang/go/issues/35950#issuecomment-685845173.)

andig commented 4 years ago

TwoThree things I've noticed from working with mjibson/esc:

Afterthought: I guess the second point could be remedied in conjunction with the io/fs proposal where I would either use the embedded or the live filesystem for inclusion? Implement the path stripping as io/fs middleware?

Merovius commented 4 years ago

@andig You can already strip prefixes when serving a filesystem over HTTP. I agree that the live-reloading can be done by a third party library wrapping an io/fs.

andig commented 4 years ago

One more thing: if I understand correctly, embed will consider files locally to the package and forbids ... My current design has /assets and /server/ where the latter contains the server‘s code and today hosts the generated files. With this proposal the embed would need to move to the root folder as assets wouldn‘t be accessible from server. This imposes different accessibility constraints from normal imports. I was wondering if this is necessary for security reasons or if module-local embeds should be generally allowed.

thomasf commented 4 years ago

One more thing: if I understand correctly, embed will consider files locally to the package and forbids ... My current design has /assets and /server/ where the latter contains the server‘s code and today hosts the generated files. With this proposal the embed would need to move to the root folder as assets wouldn‘t be accessible from server. This imposes different accessibility constraints from normal imports. I was wondering if this is necessary for security reasons or if module-local embeds should be generally allowed.

You can create a emed.go file in your assets directory and make the assets available as it's own package to the rest of your program.

tristanfisher commented 4 years ago

Another explicit goal is to avoid a language change. To us, embedding static assets seems like a tooling issue, not a language issue.

Agreed. In my opinion, adding syntactical sugar in the language to support this tooling change is a language change. I'm sure this is obvious to others, but this is effectively comment-as-code.

I strongly feel that magic/sugar detracts from the simplicity and readability of the language; it is very easy to miss a magical comment that embeds a file. While a response to this could easily be "okay, then don't use it", this change means that a reviewer still has to be vigilant for others using this feature and has to remember that comments around variable declarations can break builds or fail at compile-time.

I believe this is going to add confusion, detract from language usability, and will result in opaque, large binaries without clear benefit (regarding the lattermost concern, this will even lead to an anti-pattern of re-building binaries due to plain file changes). If go mod allowed for a --withNonGoCodeAssets, I believe this would solve the needs of most developers that don't want to write more complex build pipelines (I assume end-user distribution is a smaller subset of the problem for users).

rsc commented 4 years ago

@tristanfisher, I understand your point about language vs tooling change. It's certainly near the line. The reason that I consider it more a tooling change is that the language spec is unaffected - whether a program is valid does not change, the type checking process does not change. All that changes is the initial value of that variable following the comment. In this way it is a bit like the linker's -X flag, which can set the initial value of a top-level var of type string. It's fine for us to disagree; I just wanted to make my definition clear and explain the distinction I'm making.

As for bloat, I guess we'll have to see, but I don't anticipate programs getting much larger than they already are. People already run tools that turn arbitrary files into Go code, check them into their repos, and make the compiler build them. The design removes some overhead from this process but does not enable anything new. Maybe people will abuse it now that it's easier to do, but on balance I don't expect that to be much of a problem. (And if some dependency embeds something so big that it bloats your binaries, you can always choose not to use that dependency.)

As for rebuilds due to plain file changes, the only files that can trigger rebuilds are the ones in your own top-level module, since dependencies are immutable. If you found rebuilds happening more often than you'd like, the only explanation is (1) you are embedding files and (2) you are modifying those files. You would be in complete control of doing something about either cause. (It would be another thing entirely if a dependency's choice of what to use was somehow forcing extra rebuilds or other expense on you. But that's not the case here.)

tristanfisher commented 4 years ago

@rsc I agree that it's okay for us to disagree and I appreciate your response. My feeling is that if it's included by default in the standard tooling and comments can lead to an implicit initialization of a variable, then it's a language change. Outside of that debate, I guess my icky feeling is around more directives as "magic" comments that need to be memorized by (human) code readers. This could be taken to the absurd conclusion of adding new features via block comments that get handled at build time.

That said, if this gets added to the ecosystem, I will be thankful that importing embed is required -- that's better than nothing as a "hey, head's up" when auditing code. I think go mod allowing for non .go would solve a majority of use cases (I imagine most people are going to glob files for webservers) and would also live entirely in tooling.

I think your point regarding the linker is a good one. It also helps explain my feelings on this: if the very end user (e.g. not someone that simply imports a package) is making the decision, there's no way to be surprised by blobs of non-code coming along for the ride. My concerns are born out of reviewing/pairing-on others' work and "tech-leady" responsibilities, which is why I felt the need to respond.

I think "we'll have to see" sums it up well (I'm more cynical about bloat/misuse).

aykevl commented 4 years ago

I will read through the draft design tonight, so far it looks good from the TinyGo perspective.

I just wanted to clarify one thing:

On the other hand, projects like TinyGo and U-root target systems with more RAM than disk or flash. For those projects, compressing assets and using incremental decompression at runtime could provide significant savings.

I don't know about U-root, but for TinyGo the main targets are microcontrollers which normally have far more flash than RAM (usually a factor of 8 or 16). A quick look at the draft design seems to suggest the idea is to keep the files in read-only memory, which would work fine for these targets: the embedded files can be read directly from flash. It would most likely not be desirable for TinyGo targets to decompress files at runtime.

networkimprov commented 4 years ago

The io/fs proposal on which this depends looks to be blocked on Readdir/FileInfo problems, under discussion in #41188 and previously #40352.

I've drafted an API to replace them in https://github.com/golang/go/issues/41188#issuecomment-686283661

tv42 commented 4 years ago

@andig

One thing I didn't find in the proposal but would need is the ability to live-reload the embedded files during development cycles.

embed.Files implements fs.FS, hence all you need to do is to use dev vs !dev build tag to switch a variable between embed.Files and the real FS.

networkimprov commented 4 years ago

I filed #41265. It offers a new ReadDir() API for io/fs.

jonbodner commented 4 years ago

I have similar concerns as @tristanfisher . Go has been using magic comments as compiler directives for a long time (since the beginning?), but they are meant for corner cases and it is rare for them to appear in code. Given the popularity of embedding static content in Go binaries, //go:embed is likely to be more common. Maybe it's time to consider a different syntax for compiler directives?

mvdan commented 4 years ago

Just a reminder that changing the Go syntax has a very high cost. Pretty much every Go tool out there would need to be updated and/or fixed to support the new syntax, for example.

iand commented 4 years ago

I don't consider them magic comments. Lines starting with //go: are directives and could be defined as such in the spec. There's not a lot of semantic difference between //go:embed, @embed, [[embed]] or any other number of syntax variations, except the //go: prefix is already treated as non-code by Go tools. (my editor highlights those lines differently for example)

jonbodner commented 4 years ago

@mvdan If this proposal happens, Go syntax has changed. It's just changed in a way that doesn't break existing tooling. Maybe that seems pedantic.

@iand I'm not fussy about the specific syntax for compiler directives. I just think that it needs to be formalized at some point and the rules specified.

I think this proposal is a good idea. It solves a common problem. My concern is that the cost of adopting it should be made a little more explicit.

ianlancetaylor commented 4 years ago

@jonbodner I share your concerns about magic comments. But to some extent the rules are specified by #37974.

rsc commented 4 years ago

@networkimprov, this is not the io/fs proposal. Please stop commenting about ReadDir here.

rsc commented 4 years ago

@jonbodner

I'm not fussy about the specific syntax for compiler directives. I just think that it needs to be formalized at some point and the rules specified.

I would just point out that we made the decision to use //go: to mark Go toolchain directives when we added (the limited use) //go:nointerface annotation back in 2012. We added //go:noescape for assembly authors in 2013. We added //go:generate in 2014. We are likely adding //go:build in 2020-2021 as well. There are others; that's just the highlights. You can think of //go: as meaning #pragma from C if it helps.

At this point, the convention is very well established. We chose that syntax back in 2012 because (1) it's obviously not a comment for a person; (2) tools that don't know about the comments will ignore them because they're comments; and (3) it generalizes to other tools (s/go/yourtool/).

And as Ian said, #37974 formalized the exact generalized comment syntax, for what that's worth.

rsc commented 4 years ago

Based on the discussion above, this seems like a likely accept. (Again, assuming but separate from the FS proposal.)

rsc commented 4 years ago

No change in consensus, so accepted.

andig commented 4 years ago

I'm eager to get my hands on embed- can this already be tested on master or are there any plans to ship it as experiment during the 1.15 cycle?

rsc commented 4 years ago

@andig, Go 1.15 is out already. I still hope this will be in Go 1.16 and landing in the development branch this month.

septs commented 4 years ago

@rsc 1.16 available?

rsc commented 4 years ago

@septs, no we are still working on Go 1.16. The code freeze is Oct 31, with a target release date of Feb 1.

septs commented 4 years ago

fastest 2021Q1 or 2021Q2 release?

mvdan commented 4 years ago

@septs please stop asking questions about Go releases in this thread. Over twenty people follow it and get notified. See https://golang.org/wiki/Questions and https://github.com/golang/go/wiki/Go-Release-Cycle.

gopherbot commented 4 years ago

Change https://golang.org/cl/243941 mentions this issue: go/build: recognize and report //go:embed lines

gopherbot commented 4 years ago

Change https://golang.org/cl/243940 mentions this issue: go/build: refactor per-file info & reader

gopherbot commented 4 years ago

Change https://golang.org/cl/243942 mentions this issue: embed: implement Files

gopherbot commented 4 years ago

Change https://golang.org/cl/243944 mentions this issue: cmd/compile: add //go:embed support

gopherbot commented 4 years ago

Change https://golang.org/cl/243945 mentions this issue: cmd/go: add //go:embed support

rsc commented 4 years ago

One detail that came up in the implementation review is that "Files" as a singular noun is pretty awkward ("A Files holds ...").

The choice of embed.Files for the name predated both the io/fs proposal and also the support for string and []byte. Given both of those developments, one seemingly sensible way to resolve the "A Files holds" problem is to call it an FS instead of a Files.

Then the three ways to embed & print data are:

import "embed"

//go:embed hello.txt
var s string
print(s)

//go:embed hello.txt
var b []byte
print(string(b))

//go:embed hello.txt
var f embed.FS
data, _ := f.ReadFile("hello.txt")
print(string(data))

That seems clearer about what you get: a string, a []byte, or an FS. That is, most of the functionality of the embed.F* comes from it being an fs.FS, and calling it FS makes that clearer than calling it Files.

I've made that change in my latest draft of the CL implementing package embed, but I wanted to circle back here and see if there are any objections to the name change.

(A more radical change would be to do var f fs.FS instead of var f embed.FS, but that would preclude ever having any method on f other than Open. For example, above, having ReadFile is convenient and would not be possible. And in general we've learned that using a concrete type for something that might want to add methods later is good future-proofing compared to using an interface type directly.)

Merovius commented 4 years ago

I think the rename is a good change.

In regards to the more radical change:

Overall, I still agree using the concrete embed.FS is better - if nothing else, then for documentation purposes.

chabad360 commented 4 years ago

Now that you've mentioned it, I don't think I got this clear: we can embed directories right?

earthboundkid commented 4 years ago

Yes as an embed.FS which implements fs.FS.

rsc commented 4 years ago

@Merovius, embed.FS uses value receivers. embed.FS is a one-word struct containing a single pointer, so there's no actual overhead to doing so, but it means you can assign them around and use them without worrying about *s and \&s everywhere.

@chabad360, yes, you can embed directories.

burik666 commented 4 years ago

What about symlinks?

rsc commented 4 years ago

@burik666, see https://golang.org/s/draft-embed-design for details, but no, you cannot embed a symlink.