golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.72k stars 17.62k forks source link

cmd/go: avoid surprising inclusion of "hidden" files when using //go:embed #42328

Closed Merovius closed 3 years ago

Merovius commented 3 years ago

Edit by rsc, Dec 1 2020: Please note that the "likely accepted" answer here is https://github.com/golang/go/issues/42328#issuecomment-725579848, specifically:

This means that //go:embed static/* will embed .DS_Store if you really want to, while //go:embed static will not.


This is forked off from #41191 to talk about a specific issue with the design as-accepted. #42325 and #42321 talk about the same problem, but they are very solution-focused and I think it is more appropriate to discuss the actual problem first.

Namely: @carlmjohnson has pointed out that the //go:embed directive includes .DS_Store files under MacOS. This is certainly working-as-designed, but I think it should be discussed whether that's actually the semantic we want. I want to talk about the .DS_Store example specifically, but there are other dot-files with similar properties. .DS_Store does illustrate something significant though, because it is a directory that is (AIUI, I'm not a Mac-user myself) created non-interactively by a third-party software in every subdirectory. So it is not explicitly created by the user and it is permanent (and will be re-created at some point, if deleted).

There have been several suggestions made so far, which IMO are deficient in one way or another:

There are probably other approaches that can be discussed. It would also be a valid answer to close this as WAI. But if we release go 1.16 with embedding, we get locked into the semantics we implemented, so IMO this should be closed one way or another before releasing it into the wild.


Discussion summary (as of 2020-11-21, 10:00 UTC)

This is my best attempt at a fair and unbiased discussion-summary. It is necessarily subjective, though, so apologies if I left something out or misrepresented someone.

The resulting proposal from the discussion which is currently marked as "likely accept" is to have globs match as it currently does, but to exclude .* and _* files (editors remark: "as the go command" probably implies testdata as well) from recursive directory walks when a directory is given explicitly.

There are a couple of open questions:

There where some alternatives suggested:

There where some counter arguments:

mvdan commented 3 years ago

Thanks for filing this issue. In my comment in the original proposal I did suggest filing a new issue from the point of view of a bug report, so I agree with you that we should begin with the problem and not multiple potential solutions. I hope this doesn't mean I get more thumbs down reactions :)

I fully agree that this needs a decision before 1.16, even if the decision is that we're okay with the current semantics. I also tend to agree that excluding files by accident is less harmful than including them by accident, because the former is easy to spot but the latter could go unnoticed for a long time.

I'm still uneasy about introducing the notion of "hidden files" in the Go toolchain, but they do have a sort of precedent:

$ go help packages
[...]

Directory and file names that begin with "." or "_" are ignored
by the go tool, as are directories named "testdata".

Perhaps we could copy the same rule here. * is about files and not packages, but since builds happen inside package directories, I think it could make sense to be consistent. And the notion of "ignored filenames" by the toolchain would remain easy to remember.

Merovius commented 3 years ago

I think that would be a fine approach (as long as explicit listing or a .*-glob would still allow you to include them). Just to be clear though:

* is about files and not packages

I want to make sure we are in agreement that this isn't just about globs specifically. As I said, I think //go:embed assets also shouldn't embed assets/.DS_Store (for example), even though it doesn't contain a glob.

earthboundkid commented 3 years ago

I think this issue has made a very strongly case for "fail safe" instead of "fail unsafe but document/provide an escape hatch" semantics.

One additional thought: if we get the semantics "wrong" in 1.16, would it be possible to revise them in 1.17 with a go.mod directive? If so, that to me suggests doing something "conservative" in 1.16 and revising to be "liberal" in later versions of Go if the conservative thing was found to be too conservative.

dmitshur commented 3 years ago

CC @rsc.

mvdan commented 3 years ago

@Merovius agreed. The rule would limit both recursing into directories, and the * glob. I think any other glob shouldn't be affected, so that one may use globs such as .* or _*.

earthboundkid commented 3 years ago

Icon\r isn’t dotfile, but it is hidden. Should Go include it?

zephyrtronium commented 3 years ago

In a similar vein, any file can be marked hidden on NTFS. Should //go:embed read filesystem attributes, even on a Linux host compiling code on an NTFS volume, to ignore thumbs.db?

andig commented 3 years ago

For sake of completeness: we need the ability to exclude files on purpose, thunk assets.go. Might also solve excluding dotfiles.

earthboundkid commented 3 years ago

42325 was closed to focus discussion here.

The proposal there was to add a magic comment for negative globs, like //go:embed-exclude *.go. The objection was that it would lead to a lot of OS/IDE specific exclusions being listed and still miss some things that should filtered when encountering a new environment.

My current feeling is that a combination of approaches should be taken:

Merovius commented 3 years ago

@andig I don't agree that that's a necessity for the usecase you mention. There is no need to have assets.go in the same directory, you can have it in a directory further up and use a StripPrefixFS (if it's not in the stdlib, it can always be provided by a third party).

nightlyone commented 3 years ago

A well tested approach, see https://docs.docker.com/engine/reference/builder/#dockerignore-file , is providing something like a .goembedignore file. The .gitignore file is another such example.

Those are well understood semantics, easy to share via dotfile collection, easy to configure in template repositories, easy to add to UIs for repository creation like github.com or gitlab.com as well as repository compliance checkers often found in enterprises can easily verify its existence, syntax and content.

The extensions star-star-match, match-exceptions as well as the ability to add comments mentioned in the .dockerignore documentation also proved critical to practical use.

Adding such information to each mention of go:embed using wildcards seems unpractical to me and too easy to forget.

This problem has been solved pretty well before and I don't think Go needs yet another way to be special here.

Followup questions now are:

earthboundkid commented 3 years ago

Interesting idea. Maybe the default is to exclude dot files and .go files and then tweaking the go.embedignore file for a module could change the parameters to something else. It’s an interesting idea that could wait for Go 1.17 if the window for making it into 1.16 is too narrow now.

nightlyone commented 3 years ago

@rsc / @dmitshur For Go 1.16 I would suggest to revert the ability to add wildcard based trees to go:embed until this issue is solved so we can explore the solution space without releasing a feature with a potential security risk given such precedents in Docker and git before each had a way to ignore certain file patterns recursively.

earthboundkid commented 3 years ago

Even without a wildcard pattern, the problem of including a directory that has an unexpected file will be there.

mvdan commented 3 years ago

Personally, I'm opposed to .goembedignore and go:embed-exclude in general, for a similar reason why we rejected a .goignore in https://github.com/golang/go/issues/30058. Here, you have two options (besides my proposal in https://github.com/golang/go/issues/42328#issuecomment-720169922):

A) To only include a subset of the files, use go:embed on a subdirectory and place them there.

B) To exclulde some sub-directory if option A isn't desirable, you could always drop a go.mod in there to exclude the directory from the current module.

mpx commented 3 years ago

The files referenced by //go:embed are effectively part of the build, just like *.go, *.c, etc.. Ideally it would be similarly easy to understand which embedded files are included. The rule to determine which files are included should be simple and avoid surprising results.

Excluding all files starting with "." seems simplest and least prone to confusion. This avoids including files which typically aren't visible in the directory, and entries which typically related to other system purposes (eg, .DS_Store). build.Context.Import also excludes _, so it might be reasonable to exclude it here as well.

Perhaps this rule could be extended to include dot-files when the pattern/filenames explicitly starts with ".". Another option would be including explicitly referenced files. Eg: //go:embed data data/.mysecrets. However, I suspect few codebases would be motivated to use this option.

seankhliao commented 3 years ago

I think it's counterintuitive to exclude certain files, especially when I specify a directory, I expect everything inside it. Furthermore, most (all?) of the existing tools do not have such behaviour.

From the original issue description, I would argue it is fine for in-development builds to contain extra files such as .DS_Store, building from a clean repo for release is something we should promote and is more or less equivalent to what already happens for go get / go install

andig commented 3 years ago

I really dislike the inclusion of hidden files. However, this is what http.Dir for example says:

Note that Dir could expose sensitive files and directories. Dir will follow symlinks pointing out of the directory tree, which can be especially dangerous if serving from a directory in which users are able to create arbitrary symlinks. Dir will also allow access to files and directories starting with a period, which could expose sensitive directories like .git or sensitive files like .htpasswd. To exclude files with a leading period, remove the files/directories from the server or create a custom FileSystem implementation.

earthboundkid commented 3 years ago

@seankhliao What’s an example of a dotfile you would want to embed? I can’t think of any case where I would ever want it.

earthboundkid commented 3 years ago

.well-known is almost a usecase, but I’m pretty sure all of those need interactive responses, not just a fixed file. And you could just mount well-known at .well-known.

tv42 commented 3 years ago

@andig I've used https://github.com/shurcooL/vfsgen which is one of those file embedders, it works by taking http.FileSystem and generating code that embeds all the data for that http.FileSystem, and it's a pain to avoid embedding hidden files etc with it. Go should aim higher than pain; I don't think http.Dir is a great model to follow there, and new APIs should not have newbie traps in them.

tmthrgd commented 3 years ago

@carlmjohnson There a lots of .well-known responses that are entirely static. security.txt and mta-sts.txt are two that I use in static sites without issue, and there are others as well.

earthboundkid commented 3 years ago

Yes, I stand corrected, not a few of the .well-known responses are static (although not all, including of course the ACME challenge and password redirect).

In any event, it can be handled by doing //go:embed .well-known/* explicitly. Are there are dot files (as opposed to dot directories) that someone would need embedded?

earthboundkid commented 3 years ago

As of 8f2db14, the docs for embed say, "Patterns must not match files outside the package's module, such as ‘.git/*’ or symbolic links," but if you build

    //go:embed example
    var files embed.FS
    http.Handle("/", http.FileServer(http.FS(files)))
    http.ListenAndServe(":3434", nil)

And then init a .git directory inside the example directory, it shows up at localhost:3434/example/.git/. AFAIK, this .git directory can't actually be served as a Go module, so it's confusing that this would seem to work locally and then fail when you try to distribute your module.

Merovius commented 3 years ago

I think it's counterintuitive to exclude certain files, especially when I specify a directory, I expect everything inside it.

I agree that it's counterintuitive. But the question is, if you run into a problem with it, how long will it take you to discover it, figure out the cause and fix the problem? How long will it take you if you didn't realize a dot-file was included, but shouldn't?

Furthermore, most (all?) of the existing tools do not have such behaviour.

The precedent of the go tool ignoring directories starting with ., _ or called testdata was mentioned above.

Are there are dot files (as opposed to dot directories) that someone would need embedded?

As far as I'm concerned, the idea on the table is to allow dot-files as well, as long as they are explicitly named or the pattern starts with a dot. So it doesn't really matter if it's a pattern or a directory - you can always, if you actually want it included, include it manually.

mpx commented 3 years ago

@dmitshur - Should this issue have release-blocker and NeedsDecision labels? If Go 1.16 includes hidden assets it will be uncomfortable to fix in future since any change will build differently. Ideally this issue would be explicitly considered before the beta/release (even if the project decides to include hidden files).

@rsc - perhaps it would be possible to randomly sample repos which use existing asset tooling to see whether dotfiles are commonly included? I suspect it is rare.

Somewhat related, shell globs do not match dot-files either (eg, bash, zsh, tcsh,..). There is precedence for excluding hidden files to avoid surprise and disapointment :slightly_smiling_face:.

Assuming the need to embed dotfiles is rare, most developers/repos will only benefit from this change (less surprise/data leaks). In rare cases where embedding dot-files are desired they could be explicitly specified. In some ways this is better, since there will be a visible indication that hidden files are included and it "fails safe" (no hidden file data leak).

Personally, I'd like to embed a "static" directory, without having to worry about a ".private.swp" Vim file, or other system data being erroneously included. There is a lot of tooling that create dotfiles (cd assistants, sync tooling, temporary files, folder metadata,...). I know I will get caught out sometime - and I'm aware of the issue.

Overall, requiring developers to explicitly include dot-files in rare cases, but preventing developers getting burnt by builds with unknown/unexpected dotfiles seems like a good trade-off? The alternative - relying on developers to "just be more disciplined" seems bound to cause problems.

mvdan commented 3 years ago

I think we should avoid making the thread longer and let the authors of the embed proposal read the thread and chime in. As far as I can tell, all major points/opinions have been made already.

seankhliao commented 3 years ago

One thing I don't see with the proposals to exclude hidden files is how I would include all of them in a directory tree if I wanted to since the ** glob is not available for //go:embed

static/
  |- .dot-one
  `- b/
     `- .dot-two
earthboundkid commented 3 years ago

@seankhliao That's why I'd like examples of the usecases for dotfile embeds. Is there a time when you'd actually need that? What is it? I still can't think of anything besides .well-known, which has a good workaround. If we have concrete usecases in mind, we can judge the severity of the problem. Bear in mind also that you can do multiple globs, e.g. //go:embed .* b/.* c/.* if there are a fixed number of top level directories.

Merovius commented 3 years ago

@seankhliao You'd have to use //go:embed static/.dot-one static/b/.dot-two. There wouldn't be a way to recursively embed all hidden files in a subtree. This is probably going to seem annoying if it happens, but the question is how often it would and whether the advantages outweigh that downside.

@mvdan I agree. I've been trying to keep quiet, waiting for feedback from them :) So let's try to cc @rsc again and add @bradfitz this time :)

rsc commented 3 years ago

We very much want the patterns passed to //go:embed to be evaluated with exactly filepath.Glob, not "Glob with special cases". So * should keep matching .foo, since it does in Glob.

But when you name a directory explicitly, as in //go:embed dir, probably the code that walks over the directory collecting files should ignore .* files and _* files, the same as the go command does for deciding what to build.

What do people think about that middle ground solution?

Merovius commented 3 years ago

I think it's a fine middle ground. I would expect foo/* etc. to be a relatively rare case because foo (on the surface) is more direct and globs to be more specific, like foo/*.html. In which case the concern of accidental inclusion is much lower. And even if people do, I guess it will be sufficiently simple to establish the caveat culturally.

I'm all for it.

ianthehat commented 3 years ago

I think that if we were to add the pattern, many more embeds could be of the more explicit glob include form rather than pulling in an entire directory. Things like `//go:embed mydir//*.html` could become the common pattern and be safer and without any magic that you have to understand.

hherman1 commented 3 years ago

Sorry if this is already answered, but why does it matter if .DS_Store gets embedded?

earthboundkid commented 3 years ago

.DS_Store is mostly just a nuisance rather than a security risk for the most part, but if you were publishing a static set of files but didn't want them indexed (e.g. there are files with an 'unguessable' name which contain valuable information), accessing the .DS_Store would allow an attacker to get the list of all file names. See here (and there's even a reader written in Go).

dcormier commented 3 years ago

We very much want the patterns passed to //go:embed to be evaluated with exactly filepath.Glob, not "Glob with special cases". So * should keep matching .foo, since it does in Glob.

But when you name a directory explicitly, as in //go:embed dir, probably the code that walks over the directory collecting files should ignore . files and _ files, the same as the go command does for deciding what to build.

To be honest, I think the inconsistency of having * include everything, while including a directory would not include everything in it sounds like it would lead to some unexpected, frustrating results.

Merovius commented 3 years ago

@dcormier I think that has been discussed. The question isn't if it might be frustrating, but whether the expected total frustration is higher with one choice or the other. IMO, wanting hidden files is less common and specifying a .* is less frustrating than having to list all non-hidden files explicitly, so the expected total frustration with these semantics are much lower.

@hherman1 IMO even talking about security or whatever misses the point. The downside is that files are included that you don't want included. You could also ask what's the downside of printing random noise to stderr in random intervals. The answer is simply "I don't want it to". It's an accidental observable effect.

rsc commented 3 years ago

@ianthehat FWIW I looked into ** with an eye toward adding it to filepath.Glob, and it is not nearly as well-specified as you might think. The many different systems that support it all seem to give it subtly different meanings.

rsc commented 3 years ago

Based on the discussion, this seems like a likely accept.

ianlancetaylor commented 3 years ago

To clarify I think that what is being accepted is specifically https://github.com/golang/go/issues/42328#issuecomment-725579848, which is not one of the options listed in the original proposal comment.

mpx commented 3 years ago

To clarify, will hidden files be skipped when recursing below directories matched via a glob? If so, this seems like a fairly simple approach which should minimise the surprise.

Patterns like *.jpg are less likely to include unwanted files. However, using static-* to match directories will have all the problems outlined above if it also includes hidden files.

earthboundkid commented 3 years ago

To clarify, will hidden files be skipped when recursing below directories matched via a glob?

I've thought for a while that we haven't really defined what "matched via glob" means. There's the usual problem with having an intuitive sense for plain meaning versus actually telling a computer what to do in excruciating detail.

I think it should mean:

So if you have files dir/.dot/1 and dir/.dot/.sub/2 then do //go:embed dir/*, it will include 1 but not 2.

mvdan commented 3 years ago

@carlmjohnson That is my reading as well. The glob semantics are exactly the same as filepath.Glob, but when walking sub-directories to include all their contents, then the rule to ignore leading _/. and testdata applies. The documentation should be very clear about this, because it's a bit subtle.

seankhliao commented 3 years ago

While I agree with having safe defaults, I still think it's unfortunate that this doesn't have an escape hatch of "everything exactly as on disk"

earthboundkid commented 3 years ago

While I agree with having safe defaults, I still think it's unfortunate that this doesn't have an escape hatch of "everything exactly as on disk"

It's unfortunate, but OTOH, I haven't heard of any cases where someone would need all dotfiles recursively. I am a little more worried about underscore files though TBH.

Merovius commented 3 years ago

FWIW, it's not quite as convenient, but you can always use go generate to generate a file with a //go:embed directive listing all files according to whatever heuristic you want. Maybe that's enough for edge-cases.

I'm genuinely surprised by the concerns listed, TBH. I really can't see myself do anything but //go:embed <dir> with exactly the semantics [edit]likely (sorry)[/edit] accepted or //go:embed <file> with a single file put into a []byte. Not saying that other use-cases don't exist, but it feels a bit YAGNI to, for now.

mvdan commented 3 years ago

I was going to suggest what @Merovius just did - that some basic code generation would pretty easily allow you to embed all files under a set of directories, by explicitly listing them with glob patterns. Here is an example ready to copy-paste:

$ mkdir -p static/d1 static/d2 static/d3/d4
$ cat embed-gen.sh 
pkg=$1
name=$2
shift 2

{
    printf "// Code generated by an embed script. DO NOT EDIT.\n"
    printf "package $pkg\n\n"

    printf "//go:embed"
    for dir in $(find "$@" -type d); do
        printf " $dir/*"
    done
    printf "\nvar $name embed.Files\n"
} >embed.go
$ bash embed-gen.sh foo everything static
$ cat embed.go
// Code generated by an embed script. DO NOT EDIT.
package foo

//go:embed static/* static/d1/* static/d2/* static/d3/* static/d3/d4/*
var everything embed.Files
andig commented 3 years ago

That destroys all the elegance of embed.

zikaeroh commented 3 years ago

Unless I've misunderstood, doesn't https://github.com/golang/go/issues/42328#issuecomment-725579848 imply that if I write:

//go:embed static/*

Then I will get the "ignored" files, recursively, and then only if I write:

//go:embed static

Does static have those special files ignored? If so, then this code gen isn't needed.

I was originally for this proposal, but I realized later that I've been using pkger.Dir which doesn't do any of this, just passes straight through as-is and leaves dotfiles and such in.

mvdan commented 3 years ago

That destroys all the elegance of embed.

Always including all files is elegant, but clearly not what people expect.