Masterminds / glide

Package Management for Golang
https://glide.sh

Planning for pinning #143

Open sdboyer opened 9 years ago

sdboyer commented 9 years ago

tl;dr: this is how we think glide can ensure replicable builds and still be easy to use, for all use cases we've been able to think of.

To help ensure we do pinning properly, @mattfarina @technosophos and I spent a bit of time on videochat, and drew a purty thing. I'm putting it up here, along with my understanding of what we discussed, for reference:

[Image: img_20151118_122939, the matrix of disk states against user commands]

The matrix describes the set of possible disk states glide might encounter, against the set of possible commands a user might run. Our goal was to articulate a basic class of behavior for each of those combinations.

For this diagram to make sense, it's important to understand the overall strategy glide is pursuing: glide ALWAYS tries to create the most reproducible build possible. This entails that, for each main package and its corresponding glide.yaml, all dependencies (direct and transitive) should be pinned to specific, immutable versions. There's no wild-westing, no managing some packages but not others. Glide's sole behavior is to strive for completely deterministic builds; you don't get to turn it off.
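As a rough illustration of what "pinned to specific, immutable versions" could mean in practice, here is a minimal sketch that flags dependencies whose version is not a full commit hash. The `Dep` type and both function names are hypothetical, not part of glide, and treating "immutable" as "40-character lowercase hex SHA" is an assumption that only fits git.

```go
package main

import "fmt"

// Dep is a hypothetical, simplified view of one resolved dependency.
type Dep struct {
	Name    string
	Version string
}

// isCommitSHA reports whether v looks like a full 40-character lowercase
// hex commit hash. Assumption: this is what "immutable" means for git.
func isCommitSHA(v string) bool {
	if len(v) != 40 {
		return false
	}
	for _, c := range v {
		switch {
		case c >= '0' && c <= '9', c >= 'a' && c <= 'f':
			// valid hex digit
		default:
			return false
		}
	}
	return true
}

// unpinned returns the names of dependencies whose version is a branch,
// tag, range, or anything else that is not a full commit hash.
func unpinned(deps []Dep) []string {
	var out []string
	for _, d := range deps {
		if !isCommitSHA(d.Version) {
			out = append(out, d.Name)
		}
	}
	return out
}

func main() {
	deps := []Dep{
		{"github.com/docker/libcompose", "0919e089edff3ba95d84119228f46d414882ded1"},
		{"example.com/foo/bar", "master"}, // hypothetical floating dependency
	}
	fmt.Println(unpinned(deps)) // prints [example.com/foo/bar]
}
```

Under this strategy, an empty result would be the precondition for a reproducible build.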

Other language package managers have historically been lax in this regard. We think that we can make glide's forced determinism easy and transparent enough that even those who "just want to run HEAD" won't be bothered by it, and will still get all the benefits of deterministic builds.

We think this approach is worthwhile because it makes glide's operation less complex, more predictable, and exposes the user to fewer possibly confusing results. It allows us to say that, for every command that is run, we target exactly one class of output disk states (in the matrix, it's the state with an orange star):

Because there is only the one class of target output states, it removes ambiguity about how glide should deal with different types of user actions. This is not only because the end goal is clear, but also because it is impossible that glide would have itself intentionally left the disk in anything other than the correct output state - and therefore, any on-disk state glide finds represents user intent.

These rules amount to removing circular dependencies. M. C. Escher is great, but you don't want him in your package manager.

The disk states in the matrix aren't quite exhaustive, but they're pretty good. They are:

The other dimension is the user action (so, the command run by the user). These are pretty self-explanatory, I think?

We used three marks in the matrix. We weren't very rigorous about their definition, but it goes something like this:

imikushin commented 9 years ago

Regarding prune command: it would be great to have an option (--aggressive?) to remove all *.go files from packages not imported (transitively) in the project's Go sources, not just the unspecified subpackages (in glide.yaml). All non-.go files except LICENSE(.md), and empty dirs, would also be removed.

technosophos commented 9 years ago

What's the use case for --aggressive? I don't think I've ever seen a package manager that behaved that way.

imikushin commented 9 years ago

Well, if you want to store go libraries under version control, you find yourself wanting to minimize the amount of vendored code to the bare minimum. Docker does it with a cryptic shell script. I'd like to use glide :)


sdboyer commented 9 years ago

to remove all *.go files from packages not imported (transitively) in the project's Go sources

@imikushin i think what you're looking for is unreachable/dead code elimination, yes? I'd say that's probably out of scope for a package manager (though I can understand why, if you were to need that, it would be convenient to attach it to the package manager).

I'm not sure there's really anything in Go-dom that does that level of analysis outside of the compiler itself. The closest thing I can readily find is go tool vet -unreachable. But that's really limited static analysis; afaik it is based solely on whether or not an identifier is ever referenced, at all, by anything else in the search scope. What I suspect you're looking for is determining if it's possible to eliminate any Go files based on the specific identifiers that are transitively called from your entry point (the current main.main()).

That's a harder and more expensive problem. You'd probably have to run some version of a connected components algorithm to figure out which identifiers actually can be safely eliminated (though, thinking through it right now, it seems like a tree/map could be sufficient...), then see if there are any files comprised entirely of unused identifiers. Worth doing in a compiler if you're already traversing all the code, but...
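The "tree/map could be sufficient" idea can be sketched as a plain breadth-first traversal over a reference graph. This is only an illustration: the `refs` and `files` maps are hypothetical inputs, and building them from real Go source (for example with `go/ast`) is the hard part and is not shown.

```go
package main

import (
	"fmt"
	"sort"
)

// reachable returns every identifier reachable from entry, where refs maps
// an identifier to the identifiers it references.
func reachable(refs map[string][]string, entry string) map[string]bool {
	seen := map[string]bool{entry: true}
	queue := []string{entry}
	for len(queue) > 0 {
		id := queue[0]
		queue = queue[1:]
		for _, next := range refs[id] {
			if !seen[next] {
				seen[next] = true
				queue = append(queue, next)
			}
		}
	}
	return seen
}

// removableFiles returns, sorted, the files in which no declared identifier
// is live. files maps a file path to the identifiers it declares.
func removableFiles(files map[string][]string, live map[string]bool) []string {
	var out []string
	for file, ids := range files {
		used := false
		for _, id := range ids {
			if live[id] {
				used = true
				break
			}
		}
		if !used {
			out = append(out, file)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical reference graph: main.main only ever reaches dep.Parse.
	refs := map[string][]string{
		"main.main": {"dep.Parse"},
		"dep.Parse": {"dep.lex"},
		"dep.Dump":  {"dep.lex"},
	}
	files := map[string][]string{
		"dep/parse.go": {"dep.Parse", "dep.lex"},
		"dep/dump.go":  {"dep.Dump"},
	}
	live := reachable(refs, "main.main")
	fmt.Println(removableFiles(files, live)) // prints [dep/dump.go]
}
```

A real tool would also have to account for things this sketch ignores, such as `init` functions, reflection, and build tags.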

End of the day, though, I'm inclined to think it's not something glide should do because it breaks the basic guarantee that glide leaves the disk state (aka vendor/) an exact reflection of the information in the lock file. It makes for at least one additional row in the diagram - disk in "normal" or "aggressively pruned" state. That necessarily introduces a possible gotcha+step in local workflows where, when a developer working off of a dependent package's godoc calls a new function that aggressive pruning had eliminated, glide has to rerun in order to recompute the new aggressively pruned disk state. ...but the developer won't know - all they'll see is a compiler error for a missing function, and they'll scratch their head, because it's right there in the docs.

TBH, I haven't experienced a case myself where this kind of thing would really add much value. Could you provide a link to Docker's "cryptic shell script"?

imikushin commented 9 years ago

@sdboyer Thanks for your reply. The proposed --aggressive option is an ideal (very) nice to have feature, but definitely not a hard requirement.

A must-have, though, is the prune command that will remove the unspecified packages (per glide.yaml), i.e. *.go files from the unspecified dirs. This is possible without analyzing the project's source code, and it provides a one-to-one relationship between the disk state and the glide.yaml content (independent of the project's source code).

The cryptic shell script I mentioned is this one: https://github.com/docker/docker/blob/master/hack/.vendor-helpers.sh. And here is its usage: https://github.com/docker/docker/blob/master/hack/vendor.sh

I failed to adapt it to my project and instead patched glide :)

mattfarina commented 9 years ago

@imikushin what's the use case? Why prune all the packages that aren't specified at a more detailed level? "As a developer, ...."?

mattfarina commented 9 years ago

e370601b1e2e44ac90e31bc887eafe3ba01699ca has glide update writing a lock file (glide.lock).

sdboyer commented 9 years ago

@imikushin ...tbh i'm still a bit confused about what you're asking for. You say:

*.go files from the unspecified dirs

I'm not sure that's what you actually want? To literally do what you're saying there, it means not removing non-*.go files from package directories that are otherwise unused, and even preserving potentially empty directories. This specific phrasing is why I inferred you were looking for dead/unreachable code elimination.

However, from this:

This is possible without analyzing the project's source code, and it provides a one-to-one relationship between the disk state and the glide.yaml content (independent of the project's source code).

And judging from what the docker script does, I think you're actually just asking for a tool that removes whole repositories that are present under vendor/, but not specified in glide.yaml. If that's the case, that is very much so the plan. There's an argument to be made that it should be done by all commands, being part of the guarantee of the "blessed state" originally described. If it's not, though, then that's exactly what prune would do, without any --aggressive option. In fact, I can't think of anything prune would do other than this, so I'm not sure what work you were imagining prune would do WITHOUT --aggressive.
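The repository-level prune described here boils down to a set difference between what is on disk and what is declared. A minimal sketch, assuming the repository roots under vendor/ and the package names from glide.yaml have already been collected into slices (the function name `staleRepos` and the example paths are hypothetical, and nothing here touches the filesystem):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// staleRepos returns, sorted, the vendored repository roots that no declared
// package refers to. A declared package keeps a root if it is the root
// itself or a path inside it.
func staleRepos(vendored, declared []string) []string {
	var out []string
	for _, root := range vendored {
		keep := false
		for _, pkg := range declared {
			if pkg == root || strings.HasPrefix(pkg, root+"/") {
				keep = true
				break
			}
		}
		if !keep {
			out = append(out, root)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	vendored := []string{
		"github.com/docker/libcompose",
		"example.com/old/unused", // hypothetical leftover repository
	}
	declared := []string{"github.com/docker/libcompose"}
	fmt.Println(staleRepos(vendored, declared)) // prints [example.com/old/unused]
}
```

Whether transitive dependencies recorded only in the lock file should also count as "declared" is left open in this sketch.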

sdboyer commented 9 years ago

Oh, I also meant to note in my original response - carving up stuff under vendor/ is also really out of scope for glide because that would make for dirty trees from the VCS' perspective, which would considerably complicate the interactions with them. No bueno.

imikushin commented 9 years ago

Sorry for not expressing my use case clearly enough. Here it is, in full. Maybe I'm doing something you guys think is stupid. If that's the case, I want to know what it is. Here's the use case:

As a maintainer of a project written in the Go programming language that uses quite a few source code dependencies (one of which is docker, and is pretty large), I need to manage these dependencies. The project has a policy (imposed by the project owner) to store all source-level dependencies under version control.

I don't need to store these dependencies entirely. Storing just the relevant parts is fine, as long as the tool ensures repeatability: given the same dependency specification, the tool should put the same content into the ./vendor directory.

Now, minimizing dependencies' footprint is important because most of these files are just taking up space and bandwidth, slowing down the project checkouts and builds.

So, I need a tool to retrieve the specified dependencies and strip the unneeded parts.

I'm currently using my own glide fork to work on RancherOS:

  1. Edit glide.yaml
  2. Run ~/bin/glide up -u --quick --skip-gopath --cache --delete. This puts all dependencies into the ./vendor dir. To my surprise, --delete only works before retrieving dependencies, so I also need to:
  3. Run ~/bin/glide del, which is my custom command, based on a patched version of glide's delete.go

This is good enough, but glide can do better:

By removing packages I mean removing all files from the corresponding directories (except LICENSEs and READMEs) and then removing empty dirs as well.

Why not just remove the dirs with rm -rf? Because we might need a subpackage but not its parent package, e.g.:

- package: github.com/docker/libcompose
  version: 0919e089edff3ba95d84119228f46d414882ded1
  subpackages:
  - cli
  - docker
  - logger
  - lookup
  - project
  - utils

In this particular case I'd like ./vendor/github.com/docker/libcompose dir to only contain cli, docker, logger, lookup, project and utils sub-directories and maybe the license notice and a readme, but nothing else.
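The keep/remove rule described above can be written as a single predicate over paths relative to the package root. This is a sketch, not glide's behavior: the name `keepPath` is hypothetical, keeping everything beneath a listed subpackage (including nested directories) is an assumption, and so is matching notices by a LICENSE or README name prefix at the top level only.

```go
package main

import (
	"fmt"
	"strings"
)

// keepPath reports whether a file at rel (slash-separated, relative to the
// package root) should survive pruning, given the listed subpackages.
func keepPath(rel string, subpackages []string) bool {
	// Anything at or below a listed subpackage directory is kept.
	for _, sub := range subpackages {
		if rel == sub || strings.HasPrefix(rel, sub+"/") {
			return true
		}
	}
	// Top-level notices such as LICENSE, LICENSE.md, README.md are kept.
	if !strings.Contains(rel, "/") {
		name := strings.ToUpper(rel)
		return strings.HasPrefix(name, "LICENSE") || strings.HasPrefix(name, "README")
	}
	return false
}

func main() {
	subs := []string{"cli", "docker", "logger", "lookup", "project", "utils"}
	// The file names below are made up for illustration.
	for _, p := range []string{
		"cli/main.go",
		"LICENSE",
		"README.md",
		"version/version.go",
		"script/build.sh",
	} {
		fmt.Printf("%-20s keep=%v\n", p, keepPath(p, subs))
	}
}
```

Directories left empty after applying the predicate would then be removed in a second pass.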

mattfarina commented 9 years ago

Anyone following along, now is a good time to test the feat/lockfile branch. The init, update, and install commands are all in working shape. Docs do still need updating.

mattfarina commented 8 years ago

This should remain open until purge has been implemented. The other elements are ready to be tested.