constabulary / gb

gb, the project based build tool for Go
https://getgb.io/
MIT License

[rfc] gb build/test automatically download missing dependencies #536

Open · davecheney opened this issue 8 years ago

davecheney commented 8 years ago

This rather blandly titled issue is a placeholder to discuss the changes to gb for the 0.4 series, which the roadmap hints at with "improve gb vendor".

Backstory

gb cares about reliable builds, a lot. Giving Go developers the tools they need to achieve repeatable builds was the motivation for developing gb, and the main way gb does this is via the $PROJECT/vendor/src directory. A bit later gb-vendor came along when it was clear that users wanted tooling to help them manage their vendor'd code.

But it is also clear that not everyone is comfortable with actually having copies of the source in their tree; they would rather have a manifest file that explains where to get those dependencies on request. This catalyzed around the gb vendor restore command, which I now regret accepting because it confused the message that gb's reliable build story is based on vendoring. See #498.

To be very clear, the answer for how to get the most reliable builds with gb is always to copy your dependencies into $PROJECT/vendor/src. But sometimes there is an argument for trading reliability for convenience, and that trade-off is the motivation for this proposal.

Proposal

I propose that during package resolution (which happens for build and test, but may happen at other stages in the future), once the standard search locations have been exhausted (see below), a per-user cache of source code is consulted; if the package is not found in the cache, we attempt to fetch it using the standard gb vendor fetch lookup rules.

Importantly, this is not a proposal to automatically invoke gb vendor fetch, but a separate mechanism that is invoked once all the per project source locations have been exhausted.
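A rough sketch of that resolution order, purely for illustration (the cache location, the helper names, and the treatment of $PROJECT/src and $PROJECT/vendor/src as "the standard search locations" are placeholders here, not gb API):

    package resolver

    import (
        "fmt"
        "os"
        "path/filepath"
    )

    // resolvePackage sketches the proposed order: the per-project locations
    // ($PROJECT/src, then $PROJECT/vendor/src) are exhausted first, then the
    // per-user cache is consulted, and only then is a fetch attempted before
    // the cache is searched one last time.
    func resolvePackage(project, cache, importPath string) (string, error) {
        candidates := []string{
            filepath.Join(project, "src", importPath),
            filepath.Join(project, "vendor", "src", importPath),
            filepath.Join(cache, importPath),
        }
        for _, dir := range candidates {
            if isDir(dir) {
                return dir, nil
            }
        }
        // Not found anywhere; try to fetch into the cache using the
        // gb vendor fetch lookup rules, then look in the cache again.
        if err := fetchIntoCache(cache, importPath); err != nil {
            return "", err
        }
        if dir := filepath.Join(cache, importPath); isDir(dir) {
            return dir, nil
        }
        return "", fmt.Errorf("package %q not found", importPath)
    }

    func isDir(path string) bool {
        fi, err := os.Stat(path)
        return err == nil && fi.IsDir()
    }

    // fetchIntoCache stands in for the proposed fetch step; the real lookup
    // rules would be those of gb vendor fetch.
    func fetchIntoCache(cache, importPath string) error {
        return fmt.Errorf("fetch of %s not implemented in this sketch", importPath)
    }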

TODO:

Maintaining a cache means establishing rules for when the cache is updated. My proposal is that updating the cache, i.e. fetching from upstream when the entry is already present in the cache, should be off by default.

This means adding a new flag, probably -u, which effectively marks anything in the cache as stale and forces the upstream to be consulted. This could be a very time consuming operation.
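A hypothetical invocation, assuming the flag ends up attached to the build subcommand as described above:

$ gb build -u    # treat every cached entry as stale and consult upstream again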

TODO:

The package cache, or more correctly a cache of source code, is a location where dependencies can be stored and referenced for gb projects. This cache is per user not per project.

TODO:

The search order for packages is currently:

With this change, rather than terminating if a package could not be found, the cache will be searched; if the package is still not found, an attempt is made to fetch it from its upstream (as defined by gb vendor fetch, which itself is defined by go get's rules), and the cache is searched again.

The proposed search order could be considered to be roughly:

The key takeaway is that source code higher up the search order takes precedence. If you need a specific revision of a dependency, you should vendor it into $PROJECT/vendor/src.

One more thing

With this change, I've just reinvented go get, golf clap.

Well, yes, and no. It is correct to say that in the first implementations of this design fetching from upstream would have the same problems as go get has currently. But I believe that gb has all the pieces to improve on this.

In golang/go#12302 I made a proposal for a simple release mechanism for Go packages; use semver tags for package releases.

I propose that gb can use this information to let project owners specify the version (as defined in proposal/design/12302-release-proposal.md) or a version range for a dependent package.
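For illustration, the release mechanism in that proposal amounts to nothing more than the upstream tagging a revision with a semver-style tag, along the lines of the following (the exact tag form is whatever 12302 settles on):

$ git tag v1.9.2
$ git push origin v1.9.2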

Arrgh, Dave, you've introduced a manifest file; twice!

True, to avoid pulling from head you'd need to use a manifest file, but just like gb vendor it's totally optional: if you don't want gb to fetch upstream, make sure the source you need is in $PROJECT/vendor/src.

TODO:

gosuri commented 8 years ago

I like the convenience of downloading missing packages but am a bit confused about repeatability for the team.

Explicitly downloading dependencies to vendor/src is, in my mind, the single most valuable feature that helped us share dependencies reliably with the team on a large project. The thing I'm concerned about is how you communicate when a dependency is cached locally and not vendored.

davecheney commented 8 years ago

@gosuri yup, that's a good point. It sounds like, as a prerequisite to this work, I need to add a command to gb that prints out detailed information about where each package in a project is resolved from.

Additionally this sounds like, as an existing user, you would prefer this feature to default to off.

davecheney commented 8 years ago

@gosuri but don't think of this as "a missing dependency is going to fall back to an unknown pull from upstream's head"; the goal is to drive people towards the design in golang/go#12302. If you could specify in some kind of manifest file the version (see 12302) of a dependency, rather than vendor a specific hash, would that be good enough™?

tianon commented 8 years ago

Since you asked for comments explicitly, I'll add my vote for this fetch/cache to be explicitly opt-in if it's added; I'd be very surprised if I did a "gb build" and it started downloading instead of throwing an error (maybe with suggestions about how to get the dep I'm after).

I can't see myself using the feature described here, and honestly see it as contrary to gb's main value proposition. :innocent:

davecheney commented 8 years ago

@tianon yup, I get that. I think everyone, myself included, is shell-shocked by the belief that pulling from upstream would bring us right back to square one with go get.

But imagine if there was a way to pull a specific release version of a dependency from upstream if it was not already present in the project. I believe all the pieces exist to make that happen, although I admit it won't happen overnight; there will be some arm twisting required.

How about this middle ground: for the initial iterations, while there is no manifest file and no real support for fetching a specific version of a dependency, this feature defaults to off. When support for fetching a missing dependency at a specific version is implemented, the default could be changed.

davecheney commented 8 years ago

@tianon

I can't see myself using the feature described here, and honestly see it as contrary to gb's main value proposition. :innocent:

That's a fair comment, and this is on me for not being clear enough with the message. gb cares about reliable builds, and that's by vendoring into $PROJECT/vendor/src, but I recognise that there is a set of users who object to vendoring for various reasons - and to be honest, after working with gb for six months I count myself among them.

Vendoring is an effective, but crude, hammer. You get repeatable builds, you get independence from upstream foibles, but, well, hammer.

This proposal doesn't take away that method of working; $PROJECT/vendor/src will always be supported in gb. This proposal is an attempt to ease people into the gb project model without the biggest hurdle, which is setting up their vendor/src directory.

gosuri commented 8 years ago

@davecheney We prefer reliability over convenience. Although we use semver very religiously for public projects, there are countless instances where version numbers would be stale for internal projects. We resorted to injecting commit hashes into version commands to be explicit. Example:

$ walker version
walker 0.0.3 (8d5f2f4a65e871e8156fb7617f7c8c282f32f544+CHANGES)
davecheney commented 8 years ago

@gosuri I understand that completely. The gb party line is: if you don't trust the upstream, then you should vendor the dependency.

gosuri commented 8 years ago

@davecheney yeah.. you'd be surprised how many very "prominent" projects don't really understand the <breaking>.<feature>.<fix> model.

davecheney commented 8 years ago

@gosuri I understand the reservation about semver, however this really isn't about semver, this is about trusting your upstream to do a good job. If you don't trust them -- vendor; but really you're forking them because, well, you don't trust them.

gosuri commented 8 years ago

@davecheney Exactly my point. We use semver religiously once the project reaches a publishable state, and we are working towards getting better at the early stages as well.

It is very unfortunate to see several prominent projects (I don't want to single anyone out) ship breaking APIs as "feature" releases. I learned not to trust upstreams by default. My current upgrade flow is something like:

$ git checkout -b upgrade-foo
$ gb vendor delete foo
$ gb vendor fetch foo -tag ...
$ ... # test
$ git checkout master
$ git merge upgrade-foo # if checks out
$ git branch -D upgrade-foo
ChrisHines commented 8 years ago

@davecheney

but I recognise that there is a set of users who object to vendoring for various reasons - and to be honest, after working with gb for six months I count myself among them.

I am interested to hear about this evolution in your thinking.

dahankzter commented 8 years ago

I am also interested in hearing how vendoring came to feel like a hammer for you. I echo most of the objections, but with off by default it might work.

Maybe have it as an "experimental" feature?

Is most of the motivation coming from wanting to promote a standard versioning scheme? I was previously very much in favor of that, but have since warmed to vendoring. I get nostalgic; it reminds me of sending code as patches and tarballs over email. Can this help with #49? It relates to versions of dependencies, at least in part, right? Then why not try it? Or make it more focused and create a special manifest file that specifies this. Please, by all that is holy, make it simpler than poms...

dahankzter commented 8 years ago

I mean there is no need for auto download as a selling point if what is really tested is versioning.

davecheney commented 8 years ago

@ChrisHines

I am interested to hear about this evolution in your thinking.

I've always believed that vendoring code is the least worst option (insert Churchill quote). I think I used those words in April last year when I introduced gb. Vendoring does have some nice properties, which can be divided into two categories

Upstream issues: GitHub going bust, upstream closing the source of their code, the internet breaking when you're trying to build the release.

Go get issues: no way to fetch a specific release version, no way to fetch a specific vcs revision, no way to release a specific revision, no way to say one package depends on a version/revision of another. No way to make sure your coworkers update their local copy when you push a bug fix, ...

gb is a reaction to both of these sets of issues, but focuses on the second set, because we've had all those and more working on Juju at Canonical. It's nice that vendoring also solves the first set of issues, and I want to recognise the core group of users who were attracted to gb a year ago because it gave them a workable method to vendor dependencies. I want to make it clear that $PROJECT/vendor/src/ is not going away.

With that as a background, and with a window seat on the GOVENDOREXPERIMENT debate and the various debates around vendor-specs, I'm concerned that vendoring is being used as an excuse to not address dependency management problems in the Go package ecosystem.

Without rehashing the extremely long debate on proposal 12302, Go packages have no notion of version, only vcs revision, and whether you download your dependencies from the origin, from a central repo, or vendor them into your source tree, I believe that the properties that a release process gives to Go packages are valuable.

Lastly, I've also been told by people who work in more regulated environments that vendoring code constitutes forking and raises the spectre of open source licence fights. I am not a lawyer so I cannot comment on the details, but suffice to say, if they say they cannot vendor code because $LAWYERS, I believe them.

davecheney commented 8 years ago

@dahankzter thanks for taking this discussion off twitter. I'll try to respond to each of your points in order.

I am also interested in hearing how vendoring came to feel as a hammer for you.

Please see my response to @ChrisHines

I echo most of the objections but with off by default it might work.

Thank you. This is just a proposal at this point; I value your feedback and will integrate it into the proposal.

Maybe have it as an "experimental" feature?

Nope, sorry. I'm even more militant than the Go team on options: we either do it or not. In the context of your previous point, this is really a debate about whether, when a package is missing from the project, we fall back to downloading it or not. Which I guess is an option, so, sorry, I'm a hypocrite.

Is most of the motivation coming from wanting to promote a standard versioning scheme?

Please see my response to @ChrisHines

I have previously been very much in favor of it before but have warmed to vendoring. I get nostalgic. It reminds me of sending code as patches and tar balls over email.

I'm not really sure how to respond to this; could you please clarify it for me?

Can this help with #49? It relates to versions of dependencies at least in part right? Then why not try it?

Not directly. I know that there are people who want to use gb to write libraries, not projects, but this proposal is not aimed at addressing that. Maybe it'll help, but that would only be a side effect.

Or make it more focused and make a special manifest file that specifies this.

Could you please clarify what you mean?

Please by all that is holy make it simpler than poms...

I've used maven in the past, and have some experience with poms, but I would really appreciate it if you could be very clear about what you do not want to see, i.e. the bad things about poms, as I'm sure you will not be the only person to mention maven.

Thanks

Dave

davecheney commented 8 years ago

I mean there is no need for auto download as a selling point if what is really tested is versioning.

Could you please expand on this? It sounds like a very important point and I would like to respond to you properly.

ChrisHines commented 8 years ago

@davecheney Thanks for the detailed explanation. Like you, I believe in the value of a release process for Go packages. I also believe that different projects legitimately have different requirements for managing dependencies.

My feedback:

- Definitely opt-in to start, and keep it that way until it feels like most people would want it on nearly all the time.
- To prevent the package cache from feeling too magical I would want tools to list what's in the cache, and gb build should be able to report the location and version of each package included in the build. I can see myself using this so that I don't have to vendor my own internal libraries into each of my internal projects. The churn when a library is co-evolving with the project is the one aspect of vendoring that I find most annoying.
- How do we make builds 100% repeatable when not vendoring all dependencies?

tianon commented 8 years ago

@davecheney yeah, I suppose that makes sense -- looking at it from the perspective of something akin to requirements.txt or Gemfile seems totally logical (and helps me swallow a bit and realize this really does make a lot of sense), and I suppose if we had a reasonable way to fetch specific versions from upstream on the fly, I'd be fine (especially if the solution became popular enough that upstreams didn't play games like renaming/removing tags, changing tags, or swapping repos around willy-nilly because it'd break the usage); you're definitely dead-on that the reason I like the crude hammer that is vendoring is because upstreams are typically unreliable and fetching methods are crude :smile:

calmh commented 8 years ago

Sounds good to me. I'm not personally that interested in the package cache as it pertains to unversioned dependencies (current go get style), but I like it as a way to implement 12302/semver "on the side", with the hope of it becoming the de facto standard in the community.

I'm tagging my Go packages in anticipation. :)

seh commented 8 years ago

We need gb to be able to fetch a specific revision from a VCS. Today gb assumes that the desired revision is the tip or head revision, so we have an implicit label for it, but suppose that we could specify the desired revision instead.

Even without semvers available for most projects, I expect that such semvers will be non-surjective (and possibly non-injective) mappings to VCS revision IDs (in Git, commit hashes). If we have a dependency pinned by semver, we need a mapping from semver to VCS revision. Treatment of semvers is a layer on top of treatment of VCS revision IDs—a layer of indirection—just as Git treats tags as an alias layer on top of commit hashes.
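A minimal sketch of that layer of indirection, with hypothetical types (none of this is gb or gb-vendor API):

    package versions

    // Revision identifies a VCS revision, e.g. a Git commit hash.
    type Revision string

    // Version is a semver-style release identifier, e.g. "1.9.2".
    type Version string

    // Index maps released versions to the revisions they alias, just as Git
    // maps tags to commit hashes. Most revisions carry no version at all
    // (the mapping is non-surjective), and nothing prevents two versions
    // from naming the same revision (non-injective).
    type Index map[Version]Revision

    // Resolve returns the revision a pinned version refers to, if any.
    func (idx Index) Resolve(v Version) (Revision, bool) {
        rev, ok := idx[v]
        return rev, ok
    }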

I'd like to see gb record a manifest of vendored dependencies, in increasing order of specificity:

- With no version (implicitly requesting the tip revision)
- With a VCS revision alias (in Git, a tag)
- With a semver
- With a specific VCS revision (in Git, a commit hash)

Whether the semver or the alias is more specific is hard to say; some VCS aliases float over time, whereas I expect that a semver will not. But it's possible that a semver could point at an indelible VCS alias (a "release tag").

With that in place, running gb vendor fetch would fetch the code available for the tip VCS revision and record that fetched revision in a manifest. At that point, deleting the fetched (cached) code but retaining the manifest would allow one to later fetch that same code again on demand—assuming that the server and resources remain available. We wouldn't need to check the fetched code into our own source code repository unless we didn't trust the upstream server. We'd only be checking in the most precise specification possible (apart from the code itself) of what code to use when needed.

I would also then expect an interface to be able to widen or narrow the desired version of such a vendored dependency:

How gb would map from semver to a VCS revision remains unspecified. However, I think that's fine, because you'd need to build the stuff underneath first before you could even use such a semver mapping.

StabbyCutyou commented 8 years ago

So, I've been thinking for a long time now that one of the biggest issues I have with the way Go handles dependencies is the lack of a mechanism to define and "pin" a library to a version of a dependent library.

I do not believe vendoring solves for this adequately, although it is a start. What if your service X depends on Library Y and Library Z, and Library Y also depends on Z, but has a stricter requirement around what version of Z it uses - but, and this is important, not an incompatible restriction ("diamond" dependency management is still a problem, but one that can be solved by individuals, and not one holistic tool that makes magic happen - imo).

You'd need something to reason about: "Ok, I need ~> 1.5 of Library Z, but the sum of my dependencies need ~> 1.5.6. Therefore, I ultimately depend on ~> 1.5.6"
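Purely as an illustration of that reasoning (this is not code from the PR linked below), pessimistic constraints can be modelled as half-open version ranges and coalesced by intersection:

    package constraints

    import "fmt"

    // version is a semantic version as [major, minor, patch].
    type version [3]int

    // constraint is the half-open range [min, max) of acceptable versions;
    // "~> 1.5" becomes {1.5.0, 2.0.0} and "~> 1.5.6" becomes {1.5.6, 1.6.0}.
    type constraint struct{ min, max version }

    func less(a, b version) bool {
        for i := range a {
            if a[i] != b[i] {
                return a[i] < b[i]
            }
        }
        return false
    }

    // intersect narrows two constraints to the range that satisfies both; an
    // empty result (min >= max) means the dependencies are incompatible.
    func intersect(a, b constraint) (constraint, error) {
        c := a
        if less(c.min, b.min) {
            c.min = b.min
        }
        if less(b.max, c.max) {
            c.max = b.max
        }
        if !less(c.min, c.max) {
            return constraint{}, fmt.Errorf("incompatible constraints %v and %v", a, b)
        }
        return c, nil
    }

Intersecting "~> 1.5" with "~> 1.5.6" this way yields the range [1.5.6, 1.6.0), i.e. "~> 1.5.6", matching the reasoning above; an empty intersection is the incompatibility case mentioned later in this comment.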

As I was unfortunately unable to make it to the consulate in Brussels to have a conversation, per our accord on Twitter, I instead spent a few hours mocking up this approach:

https://github.com/StabbyCutyou/a_conversation_with_dave/pull/1

I'll note the above is A: not complete, and B: not fully thought through; but as I've been thinking about this for a while now, I wanted to at least spend some time trying to implement a partial solution, to try and better understand the intricacies of a full solution. As I noted in the README, my opinion on how to handle version management would be as follows:

Each library or service would define a "pin" package in its own executable that would interact with something like pinner (the package from my PR link). You would register the dependencies by their regular value (so, the GitHub URLs for example), and the version + constraint (~>1.0, <3.0, etc).

The current implementation simply goes and fetches them, trying to find the right version that meets the constraint, but a proper implementation would need to do the following:

  1. Resolve all dependencies, break on any errors
  2. For each of those dependencies, check for a "pin" package, and resolve those
     2a. Continue doing this until you run out of "pin" packages, or until you hit a breaking issue
  3. Build a tree of dependencies, such that you know what each library depends on all the way down
  4. From this tree, coalesce a list of unique dependencies with their version constraints
     4a. If an incompatibility is found, break and report the error
  5. Now, feed this list to the current process, where it will actually retrieve all dependencies, and resolve to find the version tag that best fulfils each constraint.

The files would initially be checked out to a "staging" area, before being actually checked out into the project's GOPATH.

I believe this approach would work well with the gb approach of "each project has its own private GOPATH", but not as well with the vanilla way GOPATH works. I have some notes in the code itself around some ideas to fix that, but they're wholly out of scope for this discussion.

davecheney commented 8 years ago

@ChrisHines

My feedback: Definitely opt-in to start, and keep it that way until it feels like most people would want it on nearly all the time.

Yup, that is crystal clear from the feedback.

To prevent the package cache from feeling too magical I would want tools to list what's in the cache and gb build should be able to report the location and version of each package included in the build. I can see myself using this so that I don't have to vendor my own internal libraries into each of my internal projects. The churn when a library is co-evolving with the project is the one aspect of vendoring that I find most annoying.

Noted. This feels like another facet of #493, which I do want to tackle sooner rather than later, but as my preference is to make this additional information output all the time, finding the right output format will be a complex business. Failing that a replacement of the undocumented gb depset command will be in order.

How do we make builds 100% repeatable when not vendoring all dependencies?

I don't think this is possible. This proposal trades reproducibility for convenience. I don't have enough experience to judge how much of the former would be given up, and what would be the implications of this.

I think this is somewhat mitigated by the general feeling that this should only be opt-in, so project owners can choose to make that tradeoff themselves; it will not be forced on them by gb.

davecheney commented 8 years ago

@seh thanks for your comments

We need gb to be able to fetch a specific revision from a VCS. Today gb assumes that the desired revision is the tip or head revision, so we have an implicit label for it, but suppose that we could specify the desired revision instead.

gb vendor fetch provides most of this today. Can you explain a bit more about what is missing from your point of view.

Even without semvers available for most projects, I expect that such semvers will be non-surjective (and possibly non-injective) mappings to VCS revision IDs (in Git, commit hashes). If we have a dependency pinned by semver, we need a mapping from semver to VCS revision. Treatment of semvers is a layer on top of treatment of VCS revision IDs—a layer of indirection—just as Git treats tags as an alias layer on top of commit hashes.

I'm assuming you meant with semvers available. If so, this sounds like the argument for a lockfile in addition to a file specifying the version or range accepted.

I have two comments in response to that:

  1. If you need a specific revision, you should vendor it and be done.
  2. It's clear that I made a mistake in suggesting that a version number or a range would be acceptable for dependency resolution. I'm going to drop the idea of a range.

I'd like to see gb record a manifest of vendored dependencies, in increasing order of specificity:

Isn't this what gb-vendor does ?

- With no version (implicitly requesting the tip revision)
- With a VCS revision alias (in Git, a tag)
- With a semver
- With a specific VCS revision (in Git, a commit hash)

Whether the semver or the alias is more specific is hard to say; some VCS aliases float over time, whereas I expect that a semver will not. But it's possible that a semver could point at an indelible VCS alias (a "release tag").

gb-vendor does this with the exception of recording the tag. This is an oversight from the version 0 manifest and will be corrected at some point in the future.

With that in place, running gb vendor fetch would fetch the code available for the tip VCS revision and record that fetched revision in a manifest. At that point, deleting the fetched (cached) code but retaining the manifest would allow one to later fetch that same code again on demand—assuming that the server and resources remain available. We wouldn't need to check the fetched code into our own source code repository unless we didn't trust the upstream server. We'd only be checking in the most precise specification possible (apart from the code itself) of what code to use when needed.

You've raised a really important point about the interaction between gb-vendor and this new proposal. I had envisaged that the flow of dependencies would work something like this

a. gb build fetches dep from its tip (not going to do this)
b. user chooses to pin dep to a release version
c. user chooses to vendor dep at a revision

Your suggestion is that the flow of control works the other way around: you'd start with using gb vendor to fetch something, then once you're satisfied with its stability, move to pinning that release version in the depfile and removing it from $PROJECT/vendor/src.

I agree that this is a valid way of working, but I worry that it suffers from a chicken and egg problem -- if everyone vendors random revisions of code, there is no pressure on upstreams to do proper releases, so you can never move to pinning to a specific release.

How gb would map from semver to a VCS revision remains unspecified. However, I think that's fine, because you'd need to build the stuff underneath first before you could even use such a semver mapping.

I'm proposing the mapping would be what I proposed in 12302, tag repositories with a release number.

Thank you for your comments, I'm sorry if I have not answered all of them in detail.

davecheney commented 8 years ago

@StabbyCutyou thanks for your comments. In responding to them, a lot of them do not appear to be directly related to this proposal, so I'm going to skip over them -- sorry for moving the goal posts. I do want to hear your ideas on dependency management for Go projects, but I want this thread to stay on topic.

You mentioned in your comment about libraries -- gb projects are not libraries. I know people want to use gb for writing packages (libraries), but to make the problem space tractable gb is only targeting projects -- code that is not imported by other code.

With that said, while libraries might have a requirement to say "I can work with version 1.x of log15", projects will want to pin their dependency down to at least a specific version, "I want to use 1.9.2 of log15 for this project", and possibly to pin to a specific revision, which is addressed by gb's vendoring support.

The intention is to record in the project the version (as defined in proposal 12302) of a dependency, and that is used to fetch the package when it is missing. I hinted above that there might be range support; I'm walking that back right now, it was a mistake. Versions must be absolute.

So, sorry, I'm not looking to solve the wider problem of dependency management between Go libraries today; it's just too large an elephant to eat in one sitting. The goal here is to provide gb project owners with more tools to build Go applications reliably if they cannot, or choose not to, vendor.

I believe this approach would work well with the gb approach of "Each project has it's own private GOPATH", but not as well with the vanilla way GOPATH works. I have some notes in the code itself around some ideas to fix that, but they're wholly out of scope for this discussion.

The fact that GOPATH forces the user to only have one copy of a dependency in scope at once makes it pretty unworkable to have two projects in a single GOPATH with divergent dependencies. Canonical wrote godep (singular) to add some automation to handle this. I wrote gb to try to do better.

davecheney commented 8 years ago

Thanks to everyone who contributed to the discussion so far. Based on your feedback I'd like to propose the following changes to the proposal:

  1. This feature will default to off. This is pretty much a no-brainer. You expect gb to be deterministic; that has to continue to be my primary concern. Message received loud and clear.
  2. When enabled, this feature should not fall back to fetching an unspecified revision if none is specified; it should only fetch a specific release version.
  3. The notion of the project owner being able to specify a version range, not just a version, was a mistake, as it introduces ambiguity into the dependency resolution. Message received loud and clear; there will be no support for version ranges.
  4. As package resolution is becoming more complex, and there is an operational requirement for the project owner to actively know where a dependency is being satisfied from, several people asked for a tool -- probably a new subcommand -- that will list where each package is coming from. I'm going to take this as a separate issue.

Taking 4. as a separate TODO, I propose the following:

This feature is off by default, and is enabled by the presence of information that maps a dependency to a release version. Or to put that more concretely

a. dependency information is recorded in a file
b. the presence of that file activates this feature; no file, nothing will be fetched
c. dependencies are only fetched if they are present in the file; if they are not present, then nothing will be fetched (this is a partial application of the previous point)
d. dependencies are specified by a release version; version ranges are not supported. If you need more control, then you must vendor.
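A rough sketch of rules (b), (c) and (d), assuming the file has already been parsed into a map (the names here are placeholders; the file format itself is a separate question):

    package depfile

    // Depfile holds the parsed contents of the proposed file: an import path
    // mapped to an exact release version (rule d: no ranges).
    type Depfile map[string]string

    // WantFetch reports whether a missing package should be fetched, and at
    // which release version. A nil Depfile models the absence of the file
    // (rule b); packages not listed in the file are never fetched (rule c).
    func WantFetch(df Depfile, importPath string) (version string, ok bool) {
        if df == nil {
            return "", false
        }
        version, ok = df[importPath]
        return version, ok
    }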

Comments ?

Thanks

Dave

seh commented 8 years ago

Can you explain a bit more about what is missing from your point of view.

I confess that I don't know much about what gb vendor does today. I've used gb vendor fetch a few times and been satisfied with what it did the first time. I haven't had to revisit those dependencies to update them or reconcile them with some other edge in the dependency graph. (Detour: I still don't understand whether gb accommodates a dependency graph, as opposed to a tree, where the same package may be vendored as a dependency of several libraries, each with its own copy.)

I'm assuming you meant with semvers available.

No, I really did mean without. Restating, given the standstill with 12302, I'm assuming that we don't have semvers available today, but that shouldn't keep you (or us) from working on dependency version management. When semvers become available, they can sit on top of what you can build today.

Isn't this what gb-vendor does ?

Perhaps, but I don't know whether gb-vendor allows a user to express a requirement: I want package P at version V. I figured that one can ask gb-vendor to get the latest version of package P, and gb-vendor would remember which version it got so that in the future it could determine if a newer version is available. That's not the same as including such a package-to-version mapping in the user-visible data model.

if everone vendors random revisions of code, there is no pressure on upstreams to do proper releases, so you can never move to a pinning to a specific release.

True. It assumes that the upstream maintainer is not diligent enough to assign versions, but it does accommodate downstream experimentation as a form of QA. That is, if I consume package P, grab its latest version, and find it works acceptably, perhaps I'd pin the version I got. Then I find a bug in it, report it and maybe even contribute a fix that gets accepted, and so I'd figure out which newer upstream VCS revision contains my fix, and tell gb to start using that version instead. I'm not looking to track the incremental upstream progress. In lieu of deliberate semvers or even releases, though, I can still point at particular upstream VCS revisions as being significant points of reference for my project.

ardan-bkennedy commented 8 years ago

I’m really happy to see this idea of a package cache and I think it comes at the right time. I have run into several companies that, because of compliance issues, must manage their own repo of dependencies. They need to make sure that they are compliant with licensing. Developers must get approval to use a package. They also need to manage changes to the dependencies for the entire company. Understanding how widespread the usage of a dependency is, is also important.

Because of this, vendoring is not an option and these dependencies can’t be saved with any project.

From a build perspective we only need to make sure a dependency is properly loaded on disk so the build can take place. I would like to state import path rewriting is never an option.

This is complicated: where are you going to load these dependencies on disk?

Dave is saying this cache will be a location for a user. I come from a perspective of working on different client projects. This will cause me problems because it will result in the cache folder being shared across clients. Maybe an environment variable can be used to help these types of situations. Maybe there is another convention based on how projects are named or a special folder inside the project that can be used as convention to find the cache folder? Something to think about.

StabbyCutyou commented 8 years ago

In responding to them a lot of them do not appear to be directly related to this proposal so I'm going to skip over them -- sorry for moving the goal posts

On the contrary, it's me here who moved the goal posts a bit. My concept here is definitely out of scope. I could respond to some of your points and continue to explain the design a bit, and would gladly have the conversation, but I feel that it's likely a distraction from your actual RFC.

davecheney commented 8 years ago

@seh

Can you explain a bit more about what is missing from your point of view.

I confess that I don't know much about what gb vendor does today. I've used gb vendor fetch a few times and been satisfied with what it did the first time. I haven't had to revisit those dependencies to update them or reconcile them with some other edge in the dependency graph.

That's ok, gb vendor fetch does pretty much what it says on the tin; it checks out some code, and copies it into your project.

(Detour: I still don't understand whether gb accommodates a dependency graph, as opposed to a tree, where the same package may be vendored as a dependency of several libraries, each with its own copy.)

Both, but that probably isn't helpful. Go dependencies are a graph, not a tree; the graph starts from your main package, branches out, then converges on the stdlib and ultimately the runtime package.

If your question is about the diamond dependency problem: gb solves that by the fact that there can only be one copy of a package's source code on disk in a project. There cannot be two different copies of a package's source on disk in the same place at once.

I'm assuming you meant with semvers available.

No, I really did mean without. Restating, given the standstill with 12302, I'm assuming that we don't have semvers available today, but that shouldn't keep you (or us) from working on dependency version management. When semvers become available, they can sit on top of what you can build today.

I don't understand what you mean by "semvers"; it's just a convention, one that projects are actually following anyway (check out CoreOS, Docker, etc). The release process I proposed in 12302 is really the simplest possible: git tag .... The Go team decided to punt on it, but that doesn't mean it cannot be a de facto standard -- and I think it should.

Isn't this what gb-vendor does ?

Perhaps, but I don't know whether gb-vendor allows a user to express a requirement: I want package P at version V. I figured that one can ask gb-vendor to get the latest version of package P, and gb-vendor would remember which version it got so that in the future it could determine if a newer version is available. That's not the same as including such a package-to-version mapping in the user-visible data model.

Sadly there is no way for a Go author to record that information. I mean, it's not physically impossible, but there is no convention or standard to do it, so there is nothing gb vendor could do to recover that information; it simply does not exist.

edit: I may not have understood your question, could you please restate it.

if everone vendors random revisions of code, there is no pressure on upstreams to do proper releases, so you can never move to a pinning to a specific release.

True. It assumes that the upstream maintainer is not diligent enough to assign versions, but it does accommodate downstream experimentation as a form of QA. That is, if I consume package P, grab its latest version, and find it works acceptably, perhaps I'd pin the version I got. Then I find a bug in it, report it and maybe even contribute a fix that gets accepted, and so I'd figure out which newer upstream VCS revision contains my fix, and tell gb to start using that version instead. I'm not looking to track the incremental upstream progress. In lieu of deliberate semvers or even releases, though, I can still point at particular upstream VCS revisions as being significant points of reference for my project.

Yup, that's the gb vendor story, use that.

davecheney commented 8 years ago

@ardan-bkennedy responding to some of your comments.

From a build perspective we only need to make sure a dependency is properly loaded on disk so the build can take place. I would like to state import path rewriting is never an option.

From the perspective of gb, assume that this feature means $PROJECT/vendor/src is a lazily populated cache. If that description doesn't help, please ignore it.

This is complicated because where are you going to load these dependencies on disk?

You mentioned import rewriting; you know how dimly I view that, so assume I'm not going to slip and let that happen. Think of this cache as extra GOPATH entries that are populated lazily.

Dave is saying this cache will be a location for a user. I come from a perspective of working on different client projects. This will cause me problems because it will result in the cache folder being shared across clients. Maybe an environment variable can be used to help these types of situations. Maybe there is another convention based on how projects are named or a special folder inside the project that can be used as convention to find the cache folder? Something to think about.

Two things: the cache is of your project's dependencies, not your projects' code. The names of the items in the cache follow the usual rules of go get, i.e. they look like URLs. I'm not sure there can be an overlap there.

davecheney commented 8 years ago

@lucidlime yup, that's basically how I see it. It's a cache keyed by <importpath,version> tuples.
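Purely hypothetically (no on-disk layout is specified in this thread), a cache keyed that way might end up looking something like this, where $CACHE stands for the per-user cache directory:

$CACHE/github.com/foo/bar/1.9.2/...
$CACHE/github.com/foo/bar/1.9.3/...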

davecheney commented 8 years ago

Thanks to everyone for your feedback. Based on your responses I've updated the proposal, the changes are:

Dependency file

As the unanimous decision is to only use this feature for downloading specific release versions (I keep saying this because I'm drawing a distinction between a release version, which is something the upstream does, and a vcs revision, which is just some old git hash), there needs to be somewhere to store this information.

I am proposing that this information be stored in a file in the root of the $PROJECT. The presence of this file will enable fetching, into the per-user cache, of dependencies matching an import path prefix (this logic will need some tightening up) -- if the file isn't there, or the prefix isn't matched, then nothing will be downloaded and the dependency will be reported as missing.

I've sketched an implementation for the format of this file in #537. I welcome your comments.

seh commented 8 years ago

I don't understand what you mean by "semvers", it's just a convention, one that projects are actually doing anyway (check out core os, docker, etc).

I meant that some projects use semvers in tags (VCS aliases), but not every dependency out in the world does. This is different from the Maven world, where there's no such thing as an artifact that lacks a coordinate in the group-artifact-version triple space.

I may not have understood your question, could you please restate it.

When gb-vendor fetches a revision, surely it must be able to figure out which version it got, even when it just asked for the tip ("HEAD"). It can record this version internally. gb would then know that it has a copy of package P at VCS version V. That it can record that mapping internally is not the same as allowing the user to specify that mapping—perhaps by way of editing a file with a documented format, a la a Maven POM's dependencies element.

But I take your progress here to mean that you are allowing the user to specify that mapping. That's what I was suggesting you add as an evolution of what I think gb-vendor already does, promoting that internal bookkeeping to a user-editable specification.

davecheney commented 8 years ago

I meant that some projects use semvers in tags (VCS aliases), but not every dependency out in the world does. This is different from the Maven world, where there's no such thing as an artifact that lacks a coordinate in the group-artifact-version triple space.

Yup, the fact that there is no release process for Go projects is a big problem at the moment.

When gb-vendor fetches a revision, surely it must be able to figure out which version it got, even when it just asked for the tip ("HEAD").

gb vendor records the revision it cloned from, go packages do not have a version, because there is no release process for Go projects.

It can record this version internally. gb would then know that it has a copy of package P at VCS version V.

gb vendor records the revision, not the version.

That it can record that mapping internally is not the same as allowing the user to specify that mapping—perhaps by way of editing a file with a documented format, a la a Maven POM's dependencies element.

If you mean recording which tag of a package to checkout, then that is what the depfile proposal is. If a simple release process of tagging revisions with a semver like tag is established, then that closes the loop.

But I take your progress here to mean that you are allowing the user to specify that mapping. That's what I was suggesting you add as an evolution of what I think gb-vendor already does, promoting that internal bookkeeping to a user-editable specification.

gb-vendor's manifest file is editable, but it's probably easier to use the tool to do that, and I think that doesn't really answer your question.

Here is how I see this working.

  1. write some code that includes import "github.com/foo/bar"
  2. add a line to your depfile, github.com/foo/bar version=1.9.3
  3. gb now knows which tag to checkout, and will make sure that when the compiler goes to compile github.com/foo/bar as a dependency of your code, the copy of version 1.9.3 is compiled.
jalkanen commented 8 years ago

Tiny request: instead of a generic "depfile", it would be nice if the name of the file had a direct relationship to the tool that should read it. This makes it more obvious to the user which tool they need to build the system and reduces the chances of future naming conflicts. Glide uses "glide.yaml", for example. (This is a pet peeve of mine - e.g. Maven uses pom.xml and Ant uses build.xml - WTF?)

For example, "gb-build" or "gb-deps".

davecheney commented 8 years ago

@jalkanen that's a fair request, nothing is set in stone at the moment, so now is the time to bikeshed. The depfile is only designed to hold the missing dependency information that is not available in the .go source file itself. I know others have plans for a much more comprehensive project metadata file, but I only need these few lines of information -- just the simplest thing that could possibly work.

If this feature is successful, probably one of two things will happen

  1. Other dependency tools will support the depfile
  2. gb will grow a full project metadata file. If so, I don't want to steal the name for that file.

So, with those restrictions, what's a good name ?