Open sdboyer opened 9 years ago
Regarding prune command: it would be great to have an option (`--aggressive`?) to remove all *.go files from packages not imported (transitively) in the project's Go sources, not just the unspecified subpackages (in `glide.yaml`). All non-.go files except LICENSE(.md), and empty dirs, would also be removed.
What's the use case for `--aggressive`? I don't think I've ever seen a package manager that behaved that way.
Well, if you want to store go libraries under version control, you find yourself wanting to minimize the amount of vendored code to the bare minimum. Docker does it with a cryptic shell script. I'd like to use glide :)
On Tue, Nov 24, 2015, 20:34 Matt Butcher notifications@github.com wrote:
What's the use case for --aggressive? I don't think I've ever seen a package manager that behaved that way.
to remove all *.go files from packages not imported (transitively) in the project's Go sources
@imikushin i think what you're looking for is unreachable/dead code elimination, yes? I'd say that's probably out of scope for a package manager (though I can understand why, if you were to need that, it would be convenient to attach it to the package manager).
I'm not sure there's really anything in Go-dom that does that level of analysis outside of the compiler itself. The closest thing I can readily find is `go tool vet -unreachable`. But that's really limited static analysis; afaik it is based solely on whether or not an identifier is ever referenced, at all, by anything else in the search scope. What I suspect you're looking for is determining if it's possible to eliminate any Go files based on the specific identifiers that are transitively called from your entry point (the current `main.main()`).
That's a harder and more expensive problem. You'd probably have to run some version of a connected components algorithm to figure out which identifiers actually can be safely eliminated (though, thinking through it right now, it seems like a tree/map could be sufficient...), then see if there are any files comprised entirely of unused identifiers. Worth doing in a compiler if you're already traversing all the code, but...
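The reachability idea described above can be sketched as a plain graph traversal. This is only an illustration under stated assumptions, not something glide or the Go toolchain does: the `refs` graph (identifier to the identifiers it references), the `files` map, and every name below are hypothetical, and a real tool would first have to build that graph from parsed, type-checked source.

```go
package main

import (
	"fmt"
	"sort"
)

// reachable returns every identifier reachable from root by following
// edges in refs (identifier -> identifiers it references).
// Hypothetical helper: a map-backed breadth-first search.
func reachable(refs map[string][]string, root string) map[string]bool {
	seen := map[string]bool{root: true}
	queue := []string{root}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, next := range refs[cur] {
			if !seen[next] {
				seen[next] = true
				queue = append(queue, next)
			}
		}
	}
	return seen
}

// removableFiles lists, in sorted order, the files in which no declared
// identifier is live, i.e. files made up entirely of unused identifiers.
func removableFiles(files map[string][]string, live map[string]bool) []string {
	var out []string
	for file, idents := range files {
		dead := true
		for _, id := range idents {
			if live[id] {
				dead = false
				break
			}
		}
		if dead {
			out = append(out, file)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Made-up reference graph: main.main only ever reaches lib.Parse and lib.lex.
	refs := map[string][]string{
		"main.main": {"lib.Parse"},
		"lib.Parse": {"lib.lex"},
		"lib.Dump":  {"lib.lex"},
	}
	// Made-up mapping from file to the identifiers it declares.
	files := map[string][]string{
		"lib/parse.go": {"lib.Parse", "lib.lex"},
		"lib/dump.go":  {"lib.Dump"},
	}
	live := reachable(refs, "main.main")
	fmt.Println(removableFiles(files, live)) // prints [lib/dump.go]
}
```

Even in this toy form, the result depends on the project's own source, which is exactly why the pruned disk state could no longer be derived from the lock file alone.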
End of the day, though, I'm inclined to think it's not something glide should do because it breaks the basic guarantee that glide leaves the disk state (aka `vendor/`) an exact reflection of the information recorded in the lock file. It makes for at least one additional row in the diagram - disk in "normal" or "aggressively pruned" state. That necessarily introduces a possible gotcha+step in local workflows where, when a developer working off of a dependent package's godoc calls a new function that aggressive pruning had eliminated, glide has to rerun in order to recompute the new aggressively pruned disk state. ...but the developer won't know - all they'll see is a compiler error for a missing function, and scratch their head, because it's right there in the docs.
TBH, I haven't experienced a case myself where this kind of thing would really add much value. Could you provide a link to Docker's "cryptic shell script"?
@sdboyer Thanks for your reply. The proposed `--aggressive` option is an ideal (very) nice to have feature, but definitely not a hard requirement.
A must have, though, is the `prune` command that will remove the unspecified packages (per `glide.yaml`), i.e. *.go files from the unspecified dirs. This is possible without analyzing the project's source code and provides a one-to-one relationship between the disk state and the `glide.yaml` content (independent of the project's source code).
The cryptic shell script I mentioned is this one: https://github.com/docker/docker/blob/master/hack/.vendor-helpers.sh And here is its usage: https://github.com/docker/docker/blob/master/hack/vendor.sh
I failed to adapt it to my project and instead patched glide :)
@imikushin what's the use case? Why prune all the packages that aren't specified at a more detailed level? "As a developer, ...."?
e370601b1e2e44ac90e31bc887eafe3ba01699ca has `glide update` writing a lock file (`glide.lock`).
@imikushin ...tbh i'm still a bit confused about what you're asking for. You say:
*.go files from the unspecified dirs
I'm not sure that's what you actually want? To literally do what you're saying there, it means not removing non-`*.go` files from package directories that are otherwise unused, and even preserving potentially empty directories. This specific phrasing is why I inferred you were looking for dead/unreachable code elimination.
However, from this:
This is possible without projects source code analysis and provides one-to-one relationship between the disk state and glide.yaml content (independent of the projects source code).
And judging from what the docker script does, I think you're actually just asking for a tool that removes whole repositories that are present under `vendor/`, but not specified in `glide.yaml`. If that's the case, that is very much so the plan. There's an argument to be made that it should be done by all commands, being part of the guarantee of the "blessed state" originally described. If it's not, though, then that's exactly what `prune` would do, without any `--aggressive` option. In fact, I can't think of anything `prune` would do other than this, so I'm not sure what work you were imagining `prune` would do WITHOUT `--aggressive`.
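Removing whole repositories that sit under `vendor/` but are not specified in `glide.yaml` boils down to a set difference. The sketch below shows only that comparison. It is not glide's implementation: `staleRepos` and the sample import paths are hypothetical, and a real command would read the two lists from disk and from the parsed config.

```go
package main

import (
	"fmt"
	"sort"
)

// staleRepos returns, in sorted order, the vendored repository roots that do
// not appear in the specified list. Hypothetical helper for illustration.
func staleRepos(vendored, specified []string) []string {
	want := make(map[string]bool, len(specified))
	for _, s := range specified {
		want[s] = true
	}
	var stale []string
	for _, v := range vendored {
		if !want[v] {
			stale = append(stale, v)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	// Made-up example data: what is on disk versus what the config lists.
	vendored := []string{
		"github.com/docker/libcompose",
		"github.com/example/leftover",
	}
	specified := []string{"github.com/docker/libcompose"}
	fmt.Println(staleRepos(vendored, specified)) // prints [github.com/example/leftover]
}
```

Because the inputs are just the config and the directory listing, the result does not depend on the project's own source code.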
Oh, I also meant to note in my original response - carving up stuff under `vendor/` is also really out of scope for glide because that would make for dirty trees from the VCS' perspective, which would considerably complicate the interactions with them. No bueno.
Sorry for not expressing my use case clearly enough. Here it is, in full. Maybe I'm doing something you guys think is stupid. If that's the case, I want to know what it is. Here's the use case:
As a maintainer of a project written in Go programming language, using quite a few of source code dependencies (one of which is docker and is pretty large), I need to manage these dependencies. The project has a policy (imposed by the project owner) to store all source level dependencies under version control.
I don't need to store these dependencies entirely. Storing just the relevant parts is fine, as long as the tool ensures repeatability: given the same dependency specification, the tool should put the same content into the `./vendor` directory.
Now, minimizing dependencies' footprint is important because most of these files are just taking up space and bandwidth, slowing down the project checkouts and builds.
So, I need a tool to retrieve the specified dependencies and strip the unneeded parts.
I'm currently using my own glide fork to work on RancherOS:
- `glide.yaml`
- `~/bin/glide up -u --quick --skip-gopath --cache --delete`. This puts all dependencies to `./vendor` dir. To my surprise `--delete` only works before retrieving dependencies, so I also need to:
- `~/bin/glide del`, which is my custom command, based on a patched version of glide's delete.go
This is good enough, but glide can do better:
- `del` (or call it `prune`) command implemented in upstream glide
- `up` command (`--delete` doesn't work, who'd guess?)
- `prune` remove the packages not explicitly specified in `glide.yaml`. (Ideally, remove the non-imported code, but I agree, that is way too much for the tool to stay simple.)
By removing packages I mean removing all files from the corresponding directories (except LICENSEs and READMEs) and then remove empty dirs as well.
Why not just remove the dirs with `rm -rf`? Because we might need a subpackage, but not its parent package, e.g. like this:
    - package: github.com/docker/libcompose
      version: 0919e089edff3ba95d84119228f46d414882ded1
      subpackages:
        - cli
        - docker
        - logger
        - lookup
        - project
        - utils
In this particular case I'd like the `./vendor/github.com/docker/libcompose` dir to only contain the `cli`, `docker`, `logger`, `lookup`, `project` and `utils` sub-directories and maybe the license notice and a readme, but nothing else.
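The subpackage-level pruning rule described above (keep the listed subpackages plus license and readme files, drop everything else) can be sketched as a per-path predicate. This is an assumption-laden illustration rather than the patched `delete.go`: the `keep` function, its matching rules for LICENSE/README names, and the sample paths are all hypothetical. Removing directories left empty afterwards is not shown.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// keep reports whether rel, a slash-separated path relative to the package
// root, would survive pruning: it must live under one of the listed
// subpackages, or be a LICENSE/README file (matched by name prefix,
// case-insensitively, which is an assumption of this sketch).
func keep(rel string, subpackages []string) bool {
	for _, sub := range subpackages {
		if rel == sub || strings.HasPrefix(rel, sub+"/") {
			return true
		}
	}
	base := strings.ToUpper(path.Base(rel))
	return strings.HasPrefix(base, "LICENSE") || strings.HasPrefix(base, "README")
}

func main() {
	subs := []string{"cli", "docker", "logger", "lookup", "project", "utils"}
	// Made-up file listing for a vendored package.
	for _, p := range []string{
		"LICENSE",
		"README.md",
		"cli/main/main.go",
		"version/version.go",
		"Makefile",
	} {
		fmt.Printf("%-20s keep=%v\n", p, keep(p, subs))
	}
}
```

Matching on `sub+"/"` rather than on a bare prefix avoids keeping a sibling directory whose name merely starts with a listed subpackage name.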
Anyone following along, now is a good time to test the `feat/lockfile` branch. The `init`, `update`, and `install` commands are all in working shape. Docs do still need updating.
This should remain open until `purge` has been implemented. The other elements are ready to be tested.
tl;dr: this is how we think glide can ensure replicable builds and still be easy to use, for all use cases we've been able to think of.
To help ensure we do pinning properly, @mattfarina @technosophos and I spent a bit of time on videochat, and drew a purty thing. I'm putting it up here, along with my understanding of what we discussed, for reference:
The matrix describes the set of possible disk states glide might encounter, against the set of possible commands a user might run. Our goal was to articulate a basic class of behavior for each of those combinations.
For this diagram to make sense, it's important to understand the overall strategy glide is pursuing: glide ALWAYS tries to create the most reproducible build possible. This entails that, for each `main` package and its corresponding `glide.yaml`, all dependencies (direct and transitive) should be pinned to specific, immutable versions. There's no wild-westing, no managing some packages but not others. Glide's sole behavior is to strive for completely deterministic builds; you don't get to turn it off.
Other language package managers have historically been lax in this regard. We think that we can make glide's forced determinism easy and transparent enough that even those who "just want to run HEAD" won't be bothered by it, and will still get all the benefits of deterministic builds.
We think this approach is worthwhile because it makes glide's operation less complex, more predictable, and exposes the user to fewer possibly confusing results. It allows us to say that, for every command that is run, we target exactly one class of output disk states (in the matrix, it's the state with an orange star):
- `glide.yaml` file, comprising (at least) all direct dependencies
- `glide.yaml` file
Because there is only the one class of target output states, it removes ambiguity about how glide should deal with different types of user actions. This is not only because the end goal is clear, but also because it is impossible that glide would have itself intentionally left the disk in anything other than the correct output state - and therefore, any on-disk state glide finds represents user intent.
These rules amount to removing circular dependencies. M. C. Escher is great, but you don't want him in your package manager.
The disk states in the matrix aren't quite exhaustive, but it's pretty good. They are:
The other dimension is the user action (so, the command run by the user). These are pretty self-explanatory, I think?
We used three marks in the matrix. We weren't very rigorous about their definition, but it goes something like this:
`glide get` with a yaml and no lock or vendor will effectively add your new dep to the glide.yaml, then run a full `glide install` in order to write out the entire lockfile and fully populate vendor/.
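As a rough sketch of the disk-state dimension, the snippet below classifies a project directory by which of three artifacts exist, then checks for the one case spelled out above (a yaml but no lock or vendor). It assumes that the states are distinguished by the presence of `glide.yaml`, `glide.lock`, and `vendor/`. That is an assumption of this example, since the full list of states is not reproduced here. `diskState`, `detect`, and `needsFullInstall` are hypothetical names, not glide's API.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// diskState records which of the three artifacts exist in a project directory.
// Hypothetical type: the real matrix may distinguish more states than this.
type diskState struct {
	YAML, Lock, Vendor bool
}

// detect probes dir using exists, which is injected so the logic can be
// exercised without touching the filesystem.
func detect(dir string, exists func(string) bool) diskState {
	return diskState{
		YAML:   exists(filepath.Join(dir, "glide.yaml")),
		Lock:   exists(filepath.Join(dir, "glide.lock")),
		Vendor: exists(filepath.Join(dir, "vendor")),
	}
}

// needsFullInstall covers the single case described in the thread: a yaml
// with no lock and no vendor directory.
func needsFullInstall(s diskState) bool {
	return s.YAML && !s.Lock && !s.Vendor
}

// onDisk reports whether a path exists on the real filesystem.
func onDisk(p string) bool {
	_, err := os.Stat(p)
	return err == nil
}

func main() {
	s := detect(".", onDisk)
	fmt.Printf("%+v full install needed: %v\n", s, needsFullInstall(s))
}
```

Behavior for the other combinations is deliberately left out, because the thread only spells out this one cell of the matrix.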