Closed Merovius closed 3 years ago
Thanks for filing this issue. In my comment in the original proposal I did suggest filing a new issue from the point of view of a bug report, so I agree with you that we should begin with the problem and not multiple potential solutions. I hope this doesn't mean I get more thumbs down reactions :)
I fully agree that this needs a decision before 1.16, even if the decision is that we're okay with the current semantics. I also tend to agree that excluding files by accident is less harmful than including them by accident, because the former is easy to spot but the latter could go unnoticed for a long time.
I'm still uneasy about introducing the notion of "hidden files" in the Go toolchain, but they do have a sort of precedent:
$ go help packages
[...]
Directory and file names that begin with "." or "_" are ignored
by the go tool, as are directories named "testdata".
Perhaps we could copy the same rule here. *
is about files and not packages, but since builds happen inside package directories, I think it could make sense to be consistent. And the notion of "ignored filenames" by the toolchain would remain easy to remember.
I think that would be a fine approach (as long as explicit listing or a .*
-glob would still allow you to include them). Just to be clear though:
*
is about files and not packages
I want to make sure we are in agreement that this isn't just about globs specifically. As I said, I think //go:embed assets
also shouldn't embed assets/.DS_Store
(for example), even though it doesn't contain a glob.
I think this issue has made a very strongly case for "fail safe" instead of "fail unsafe but document/provide an escape hatch" semantics.
One additional thought: if we get the semantics "wrong" in 1.16, would it be possible to revise them in 1.17 with a go.mod directive? If so, that to me suggests doing something "conservative" in 1.16 and revising to be "liberal" in later versions of Go if the conservative thing was found to be too conservative.
CC @rsc.
@Merovius agreed. The rule would limit both recursing into directories, and the *
glob. I think any other glob shouldn't be affected, so that one may use globs such as .*
or _*
.
Icon\r
isn’t dotfile, but it is hidden. Should Go include it?
In a similar vein, any file can be marked hidden on NTFS. Should //go:embed read filesystem attributes, even on a Linux host compiling code on an NTFS volume, to ignore thumbs.db
?
For sake of completeness: we need the ability to exclude files on purpose, thunk assets.go. Might also solve excluding dotfiles.
The proposal there was to add a magic comment for negative globs, like //go:embed-exclude *.go
. The objection was that it would lead to a lot of OS/IDE specific exclusions being listed and still miss some things that should filtered when encountering a new environment.
My current feeling is that a combination of approaches should be taken:
@andig I don't agree that that's a necessity for the usecase you mention. There is no need to have assets.go
in the same directory, you can have it in a directory further up and use a StripPrefixFS (if it's not in the stdlib, it can always be provided by a third party).
A well tested approach, see https://docs.docker.com/engine/reference/builder/#dockerignore-file , is providing something like a .goembedignore file. The .gitignore file is another such example.
Those are well understood semantics, easy to share via dotfile collection, easy to configure in template repositories, easy to add to UIs for repository creation like github.com or gitlab.com as well as repository compliance checkers often found in enterprises can easily verify its existence, syntax and content.
The extensions star-star-match, match-exceptions as well as the ability to add comments mentioned in the .dockerignore documentation also proved critical to practical use.
Adding such information to each mention of go:embed using wildcards seems unpractical to me and too easy to forget.
This problem has been solved pretty well before and I don't think Go needs yet another way to be special here.
Followup questions now are:
Interesting idea. Maybe the default is to exclude dot files and .go files and then tweaking the go.embedignore file for a module could change the parameters to something else. It’s an interesting idea that could wait for Go 1.17 if the window for making it into 1.16 is too narrow now.
@rsc / @dmitshur For Go 1.16 I would suggest to revert the ability to add wildcard based trees to go:embed until this issue is solved so we can explore the solution space without releasing a feature with a potential security risk given such precedents in Docker and git before each had a way to ignore certain file patterns recursively.
Even without a wildcard pattern, the problem of including a directory that has an unexpected file will be there.
Personally, I'm opposed to .goembedignore
and go:embed-exclude
in general, for a similar reason why we rejected a .goignore
in https://github.com/golang/go/issues/30058. Here, you have two options (besides my proposal in https://github.com/golang/go/issues/42328#issuecomment-720169922):
A) To only include a subset of the files, use go:embed
on a subdirectory and place them there.
B) To exclulde some sub-directory if option A isn't desirable, you could always drop a go.mod
in there to exclude the directory from the current module.
The files referenced by //go:embed
are effectively part of the build, just like *.go
, *.c
, etc.. Ideally it would be similarly easy to understand which embedded files are included. The rule to determine which files are included should be simple and avoid surprising results.
Excluding all files starting with "." seems simplest and least prone to confusion. This avoids including files which typically aren't visible in the directory, and entries which typically related to other system purposes (eg, .DS_Store
). build.Context.Import
also excludes _
, so it might be reasonable to exclude it here as well.
Perhaps this rule could be extended to include dot-files when the pattern/filenames explicitly starts with ".". Another option would be including explicitly referenced files. Eg: //go:embed data data/.mysecrets
. However, I suspect few codebases would be motivated to use this option.
I think it's counterintuitive to exclude certain files, especially when I specify a directory, I expect everything inside it. Furthermore, most (all?) of the existing tools do not have such behaviour.
From the original issue description, I would argue it is fine for in-development builds to contain extra files such as .DS_Store
, building from a clean repo for release is something we should promote and is more or less equivalent to what already happens for go get
/ go install
I really dislike the inclusion of hidden files. However, this is what http.Dir
for example says:
Note that Dir could expose sensitive files and directories. Dir will follow symlinks pointing out of the directory tree, which can be especially dangerous if serving from a directory in which users are able to create arbitrary symlinks. Dir will also allow access to files and directories starting with a period, which could expose sensitive directories like .git or sensitive files like .htpasswd. To exclude files with a leading period, remove the files/directories from the server or create a custom FileSystem implementation.
@seankhliao What’s an example of a dotfile you would want to embed? I can’t think of any case where I would ever want it.
.well-known
is almost a usecase, but I’m pretty sure all of those need interactive responses, not just a fixed file. And you could just mount well-known
at .well-known
.
@andig I've used https://github.com/shurcooL/vfsgen which is one of those file embedders, it works by taking http.FileSystem and generating code that embeds all the data for that http.FileSystem, and it's a pain to avoid embedding hidden files etc with it. Go should aim higher than pain; I don't think http.Dir is a great model to follow there, and new APIs should not have newbie traps in them.
@carlmjohnson There a lots of .well-known responses that are entirely static. security.txt and mta-sts.txt are two that I use in static sites without issue, and there are others as well.
Yes, I stand corrected, not a few of the .well-known
responses are static (although not all, including of course the ACME challenge and password redirect).
In any event, it can be handled by doing //go:embed .well-known/*
explicitly. Are there are dot files (as opposed to dot directories) that someone would need embedded?
As of 8f2db14, the docs for embed say, "Patterns must not match files outside the package's module, such as ‘.git/*’ or symbolic links," but if you build
//go:embed example
var files embed.FS
http.Handle("/", http.FileServer(http.FS(files)))
http.ListenAndServe(":3434", nil)
And then init a .git
directory inside the example
directory, it shows up at localhost:3434/example/.git/. AFAIK, this .git directory can't actually be served as a Go module, so it's confusing that this would seem to work locally and then fail when you try to distribute your module.
I think it's counterintuitive to exclude certain files, especially when I specify a directory, I expect everything inside it.
I agree that it's counterintuitive. But the question is, if you run into a problem with it, how long will it take you to discover it, figure out the cause and fix the problem? How long will it take you if you didn't realize a dot-file was included, but shouldn't?
Furthermore, most (all?) of the existing tools do not have such behaviour.
The precedent of the go tool ignoring directories starting with .
, _
or called testdata
was mentioned above.
Are there are dot files (as opposed to dot directories) that someone would need embedded?
As far as I'm concerned, the idea on the table is to allow dot-files as well, as long as they are explicitly named or the pattern starts with a dot. So it doesn't really matter if it's a pattern or a directory - you can always, if you actually want it included, include it manually.
@dmitshur - Should this issue have release-blocker
and NeedsDecision
labels? If Go 1.16 includes hidden assets it will be uncomfortable to fix in future since any change will build differently. Ideally this issue would be explicitly considered before the beta/release (even if the project decides to include hidden files).
@rsc - perhaps it would be possible to randomly sample repos which use existing asset tooling to see whether dotfiles are commonly included? I suspect it is rare.
Somewhat related, shell globs do not match dot-files either (eg, bash, zsh, tcsh,..). There is precedence for excluding hidden files to avoid surprise and disapointment :slightly_smiling_face:.
Assuming the need to embed dotfiles is rare, most developers/repos will only benefit from this change (less surprise/data leaks). In rare cases where embedding dot-files are desired they could be explicitly specified. In some ways this is better, since there will be a visible indication that hidden files are included and it "fails safe" (no hidden file data leak).
Personally, I'd like to embed a "static" directory, without having to worry about a ".private.swp" Vim file, or other system data being erroneously included. There is a lot of tooling that create dotfiles (cd assistants, sync tooling, temporary files, folder metadata,...). I know I will get caught out sometime - and I'm aware of the issue.
Overall, requiring developers to explicitly include dot-files in rare cases, but preventing developers getting burnt by builds with unknown/unexpected dotfiles seems like a good trade-off? The alternative - relying on developers to "just be more disciplined" seems bound to cause problems.
I think we should avoid making the thread longer and let the authors of the embed proposal read the thread and chime in. As far as I can tell, all major points/opinions have been made already.
One thing I don't see with the proposals to exclude hidden files is how I would include all of them in a directory tree if I wanted to since the **
glob is not available for //go:embed
static/
|- .dot-one
`- b/
`- .dot-two
@seankhliao That's why I'd like examples of the usecases for dotfile embeds. Is there a time when you'd actually need that? What is it? I still can't think of anything besides .well-known
, which has a good workaround. If we have concrete usecases in mind, we can judge the severity of the problem. Bear in mind also that you can do multiple globs, e.g. //go:embed .* b/.* c/.*
if there are a fixed number of top level directories.
@seankhliao You'd have to use //go:embed static/.dot-one static/b/.dot-two
. There wouldn't be a way to recursively embed all hidden files in a subtree. This is probably going to seem annoying if it happens, but the question is how often it would and whether the advantages outweigh that downside.
@mvdan I agree. I've been trying to keep quiet, waiting for feedback from them :) So let's try to cc @rsc again and add @bradfitz this time :)
We very much want the patterns passed to //go:embed to be evaluated with exactly filepath.Glob, not "Glob with special cases". So *
should keep matching .foo
, since it does in Glob.
But when you name a directory explicitly, as in //go:embed dir
, probably the code that walks over the directory collecting files should ignore .*
files and _*
files, the same as the go command does for deciding what to build.
What do people think about that middle ground solution?
I think it's a fine middle ground. I would expect foo/*
etc. to be a relatively rare case because foo
(on the surface) is more direct and globs to be more specific, like foo/*.html
. In which case the concern of accidental inclusion is much lower. And even if people do, I guess it will be sufficiently simple to establish the caveat culturally.
I'm all for it.
I think that if we were to add the pattern, many more embeds could be of the more explicit glob include form rather than pulling in an entire directory. Things like `//go:embed mydir//*.html` could become the common pattern and be safer and without any magic that you have to understand.
Sorry if this is already answered, but why does it matter if .DS_Store gets embedded?
.DS_Store is mostly just a nuisance rather than a security risk for the most part, but if you were publishing a static set of files but didn't want them indexed (e.g. there are files with an 'unguessable' name which contain valuable information), accessing the .DS_Store would allow an attacker to get the list of all file names. See here (and there's even a reader written in Go).
We very much want the patterns passed to //go:embed to be evaluated with exactly filepath.Glob, not "Glob with special cases". So * should keep matching .foo, since it does in Glob.
But when you name a directory explicitly, as in //go:embed dir, probably the code that walks over the directory collecting files should ignore . files and _ files, the same as the go command does for deciding what to build.
To be honest, I think the inconsistency of having *
include everything, while including a directory would not include everything in it sounds like it would lead to some unexpected, frustrating results.
@dcormier I think that has been discussed. The question isn't if it might be frustrating, but whether the expected total frustration is higher with one choice or the other. IMO, wanting hidden files is less common and specifying a .*
is less frustrating than having to list all non-hidden files explicitly, so the expected total frustration with these semantics are much lower.
@hherman1 IMO even talking about security or whatever misses the point. The downside is that files are included that you don't want included. You could also ask what's the downside of printing random noise to stderr in random intervals. The answer is simply "I don't want it to". It's an accidental observable effect.
@ianthehat FWIW I looked into **
with an eye toward adding it to filepath.Glob, and it is not nearly as well-specified as you might think. The many different systems that support it all seem to give it subtly different meanings.
Based on the discussion, this seems like a likely accept.
To clarify I think that what is being accepted is specifically https://github.com/golang/go/issues/42328#issuecomment-725579848, which is not one of the options listed in the original proposal comment.
To clarify, will hidden files be skipped when recursing below directories matched via a glob? If so, this seems like a fairly simple approach which should minimise the surprise.
Patterns like *.jpg
are less likely to include unwanted files. However, using static-*
to match directories will have all the problems outlined above if it also includes hidden files.
To clarify, will hidden files be skipped when recursing below directories matched via a glob?
I've thought for a while that we haven't really defined what "matched via glob" means. There's the usual problem with having an intuitive sense for plain meaning versus actually telling a computer what to do in excruciating detail.
I think it should mean:
So if you have files dir/.dot/1
and dir/.dot/.sub/2
then do //go:embed dir/*
, it will include 1 but not 2.
@carlmjohnson That is my reading as well. The glob semantics are exactly the same as filepath.Glob
, but when walking sub-directories to include all their contents, then the rule to ignore leading _
/.
and testdata
applies. The documentation should be very clear about this, because it's a bit subtle.
While I agree with having safe defaults, I still think it's unfortunate that this doesn't have an escape hatch of "everything exactly as on disk"
While I agree with having safe defaults, I still think it's unfortunate that this doesn't have an escape hatch of "everything exactly as on disk"
It's unfortunate, but OTOH, I haven't heard of any cases where someone would need all dotfiles recursively. I am a little more worried about underscore files though TBH.
FWIW, it's not quite as convenient, but you can always use go generate
to generate a file with a //go:embed
directive listing all files according to whatever heuristic you want. Maybe that's enough for edge-cases.
I'm genuinely surprised by the concerns listed, TBH. I really can't see myself do anything but //go:embed <dir>
with exactly the semantics [edit]likely (sorry)[/edit] accepted or //go:embed <file>
with a single file put into a []byte
. Not saying that other use-cases don't exist, but it feels a bit YAGNI to, for now.
I was going to suggest what @Merovius just did - that some basic code generation would pretty easily allow you to embed all files under a set of directories, by explicitly listing them with glob patterns. Here is an example ready to copy-paste:
$ mkdir -p static/d1 static/d2 static/d3/d4
$ cat embed-gen.sh
pkg=$1
name=$2
shift 2
{
printf "// Code generated by an embed script. DO NOT EDIT.\n"
printf "package $pkg\n\n"
printf "//go:embed"
for dir in $(find "$@" -type d); do
printf " $dir/*"
done
printf "\nvar $name embed.Files\n"
} >embed.go
$ bash embed-gen.sh foo everything static
$ cat embed.go
// Code generated by an embed script. DO NOT EDIT.
package foo
//go:embed static/* static/d1/* static/d2/* static/d3/* static/d3/d4/*
var everything embed.Files
That destroys all the elegance of embed.
Unless I've misunderstood, doesn't https://github.com/golang/go/issues/42328#issuecomment-725579848 imply that if I write:
//go:embed static/*
Then I will get the "ignored" files, recursively, and then only if I write:
//go:embed static
Does static
have those special files ignored? If so, then this code gen isn't needed.
I was originally for this proposal, but I realized later that I've been using pkger.Dir
which doesn't do any of this, just passes straight through as-is and leaves dotfiles and such in.
That destroys all the elegance of embed.
Always including all files is elegant, but clearly not what people expect.
Edit by rsc, Dec 1 2020: Please note that the "likely accepted" answer here is https://github.com/golang/go/issues/42328#issuecomment-725579848, specifically:
patterns passed to //go:embed are evaluated exactly with filepath.Glob, not "Glob with special cases". So matching * keeps matching .foo, because it does in Glob.
when you name a directory explicitly, as in //go:embed dir, the code that walks that directory collecting all the files to embed will ignore names beginning with dot or underscore, the same as the go command does for deciding what to build.
This means that
//go:embed static/*
will embed .DS_Store if you really want to, while//go:embed static
will not.This is forked off from #41191 to talk about a specific issue with the design as-accepted. #42325 and #42321 talk about the same problem, but they are very solution-focused and I think it is more appropriate to discuss the actual problem first.
Namely: @carlmjohnson has pointed out that the
//go:embed
directive includes.DS_Store
files under MacOS. This is certainly working-as-designed, but I think it should be discussed whether that's actually the semantic we want. I want to talk about the.DS_Store
example specifically, but there are other dot-files with similar properties..DS_Store
does illustrate something significant though, because it is a directory that is (AIUI, I'm not a Mac-user myself) created non-interactively by a third-party software in every subdirectory. So it is not explicitly created by the user and it is permanent (and will be re-created at some point, if deleted).There have been several suggestions made so far, which IMO are deficient in one way or another:
go build
or live with inclusion of any such files. IMO this puts unreasonable expectations on users. No matter how much we feel that in an ideal world, this should be what happens - I just don't think it is what will happen, in practice.git clean
beforego build
. This is certainly useful in CI/CD, but during regular development, it would also delete progress made. It's certainly not something I'd want to run before everygo test
orgo build
. In CI/CD it is useful, but it will probably still be forgotten by many people - though it's also far less of a problem, because the chance of pollution is low..DS_Store
is something that ~always should be excluded and as others have pointed out, there are many other patterns of files that should be excluded and would have to be listed. In effect, this would mean that ~every//go:embed
directive would have to list a non-canonical, long list of exclusions, to cover any tools used by people. It also still suffers from the problem that people have to know about the problem in the first place. So IMO it would still end up with accidental inclusions of.DS_Store
and similar files.*
not match dot-files. This isn't really helpful either, as even then, if a directory is named (either directly as//go:embed assets
or indirectly via a glob), we would still recursively include all subdirectories, including.DS_Store
.There are probably other approaches that can be discussed. It would also be a valid answer to close this as WAI. But if we release go 1.16 with embedding, we get locked into the semantics we implemented, so IMO this should be closed one way or another before releasing it into the wild.
Discussion summary (as of 2020-11-21, 10:00 UTC)
This is my best attempt at a fair and unbiased discussion-summary. It is necessarily subjective, though, so apologies if I left something out or misrepresented someone.
The resulting proposal from the discussion which is currently marked as "likely accept" is to have globs match as it currently does, but to exclude
.*
and_*
files (editors remark: "as the go command" probably impliestestdata
as well) from recursive directory walks when a directory is given explicitly.There are a couple of open questions:
_*
andtestdata
really be skipped (comment by @mpx)? The case for them comes down to consistency withgo build
and seems significantly weaker than for.*
.There where some alternatives suggested:
**
matching operator tofilepath.Glob
- don't leave out hidden files when walking directories (comment by @ianthehat). @rsc remarks that this might be unexpectedly unspecified. Also, the presence of**
would also make it easier to specifically match all hidden files, if they are desired (comment by @Merovius) so can be argued in favor of this proposal as well.There where some counter arguments:
*
but not in directory walks (comment by @dcormier). The proponents response is that we agree, but it might still be better than the current alternative. We need to make a tradeoff (comment by @Merovius). @SamWhited posted examples of where they embedded dot-files, which might be useful to inform that tredoff.