golang / go

The Go programming language

https://go.dev

BSD 3-Clause "New" or "Revised" License

124.4k stars 17.71k forks source link

proposal: Vendor specification and experimental repository fetch code #13517

Closed kardianos closed 8 years ago

kardianos commented 8 years ago

Proposal: Vendor specification and experimental repository fetch code

Author(s): Daniel Theophanes

Last updated: 2015-12-06

Abstract

Establish a specification file format that lists dependency revisions and a package in the golang.org/x/exp repository that discovers, reads, and downloads packages at a given revision. Tools may continue to use other formats to generate this file.

Background

Many developers wish to specify revisions of vendor dependencies without copying them into the repository. For a case study I will bring up two:

A) https://github.com/cockroachdb/cockroach

B) https://github.com/gluster/glusterd2

(A) uses github.com/robfig/glock which specifies revisions for each remote repository in file in the project root called "GLOCKFILE". A partial list of the file is:

cmd golang.org/x/tools/cmd/stress
cmd golang.org/x/tools/cmd/stringer
github.com/agtorre/gocolorize f42b554bf7f006936130c9bb4f971afd2d87f671
github.com/biogo/store 3b4c041f52c224ee4a44f5c8b150d003a40643a0
github.com/cockroachdb/c-rocksdb bf15ead80bdc205a19b3d33415b23c156a3cf371
github.com/cockroachdb/c-snappy 5c6d0932e0adaffce4bfca7bdf2ac37f79952ccf
github.com/cockroachdb/yacc 443154b1852a8702b07d675da6cd97cd9177a316
github.com/coreos/etcd a423a55b142c2b9a82811604204cddbccd0a9cf9

(B) uses github.com/Masterminds/glide which specifies revisions for each remote repository in a file in the project root called "glide.yaml". This file contains:

parent: null
package: github.com/gluster/glusterd2
import:
- package: github.com/gorilla/context
  version: 1c83b3eabd45b6d76072b66b746c20815fb2872d
- package: gopkg.in/tylerb/graceful.v1
  version: 48afeb21e2fcbcff0f30bd5ad6b97747b0fae38e
- package: github.com/pborman/uuid
  version: cccd189d45f7ac3368a0d127efb7f4d08ae0b655
- package: github.com/gorilla/mux
  version: ad4d7a5882b961e07e2626045eb995c022ac6664
- package: golang.org/x/net
  version: b4e17d61b15679caf2335da776c614169a1b4643
- package: github.com/docker/libkv
  version: 93099f38de7421e6979983652730a81e2bafd578
- package: github.com/codegangsta/negroni
  version: c7477ad8e330bef55bf1ebe300cf8aa67c492d1b
- package: golang.org/x/sys
  subpackages:
  - /unix
- package: github.com/meatballhat/negroni-logrus
  version: dd89490b0057cca7fe3fa3885f82935dfd430c2e
- package: github.com/Sirupsen/logrus
  version: v0.8.7
- package: github.com/hashicorp/consul
  version: v0.5.2

I would like to point out a few features these tools provide:

Specify commands to fetch.
Specify repositories at a given revision.
Specify repositories at a given version.
Specify a sub-tree of packages in a given repository.

Right now each vendor tool specifies these same properties in different formats. A common tool cannot be built that reads a single file and downloads the needed dependencies. This isn't a huge burden on a dedicated developer, but for a user passing by who just wants to build the source quickly, it is an impediment.

Proposal

I propose specifying a single file format that will describe packages sourced outside the project repository. I also propose adding a packge to the golang.org/x/exp repository that discovers, reads, and optionally downloads third party packages.

Furthermore I propose using the specification found at https://github.com/kardianos/vendor-spec with one addition as the basis for this specification. The addition is:

Package []struct {
    ...

    // Tree indicates that the specified folder, along with all sub-folders
    // are required.
    Tree bool `json:"tree"`

    ...
}

Both the specification and the proposed package will be considered experimental and subject to change or retraction until at least go1.7. This process will be done with an eye to possibly adding this feature to go get.

Rationale

The vendor file format needs to be able to be read and written with standard go packages. This adds to the possibly that go get could fetch packages automatically.

Vendor tools exist today that download packages from a specification. They are just incompatible with each other despite using the same information to download the dependencies. If we can agree on a single format for tools to write to, even if it isn't the primary format for that tool, all tools and possibly go get can download dependencies.

Existing vendor tools and their formats don't always handle corner cases or different approaches. For example current tool file formats can't handle the case of vendoring a patched version of a standard library package (this would have been useful for crypto/tls forks for detecting the heartbleed attack and for accessing MS Azure).

I am proposing a file format that "govendor" uses. I'm not trying to put my own tool as central. Infact, "govendor" was built to validate the "vendor-spec" proposal. The "vendor-spec" has received significant external contributions and as such "govendor" has changed to match the spec (and will continue to do so).

Compatibility

This will be standardization of existing practices. There is no go1 compatibility issues. Existing tools can treat the specification as a write only file.

Implementation

A file format to describe vendor packages should be accepted when this proposal is accepted. Should this proposal be accepted a new package should be added to the "golang.org/x/exp" repository to support reading the vendor file and downloading packages. The author of this proposal offers to create or assist in creating this package. This would be created within 2 months of the proposal being accepted.

Risks

It would be ideal if other vendor tool package authors could agree to at least write to a standard file format informally and collaboratively. Indeed the largest risk is if vendor tools fail to write the common file format. However I think unless there is a tangible benefit (such as go get support) there will continue to not be a reason to collaborate on a standard.

Open issues

The proposed standard file format uses JSON, which might be better then XML, but harder to write by then something like TOML. Tools that want the vendor file to be hand created will be forced to generate this file from a different file.

The file format specifies packages, not repositories. Repositories can be specified by using the root path to the repository and specifying "tree": true, but it isn't the default for the format. Some people may take issue with that as they are used to or desire tools that only work at the repository level. This could be a point of division. From experience I absolutely love vendoring at the package level (this is what github.com/kardianos/govendor does by default).

freeformz commented 8 years ago

I share a bunch of those use cases with you, but I do not want to support the "I want to use a version range" use case. I've been there done that and it's a pita and usually ends up with very very narrow version specs and/or relying on the lock file. Maybe I've just had a lot of bad experiences.

Missing from that list are "I'd like to vendor a related tool (some other repos main package)". I see this mostly with tools like migrate and the like.

I also need to know a little about the user's Go environment (mainly ATM) the go version, so that I can build the code again using the same / similar version of go (think go primary version, minor should not be relevant).

Beyond that and generally speaking though my main use case is I want a tool to populate vendor/ with the code I need to build / run my app. That tool should help me maintain vendor/ over time, but allowing me to diff / list / update / remove / add to it. vendor/ should always be checked in (for various reasons). vendor/ should contain only the packages I need, not entire repos to keep sizes down. vendor/ should optionally contain the tests of the dependencies I place there so that they can be run.

My main response to this thread was a "+1" for a common library for tools to share, not a specific tool implementation. I have mixed feelings / technical motivations for converging on a specific tool. I am "+1" though about converging on a shared library and a vendor spec.

On Tue, Dec 29, 2015 at 3:46 PM, Matt Farina notifications@github.com wrote:

@freeformz https://github.com/freeformz I would be curious to hear your take on the use cases https://github.com/mattfarina/pkg/tree/mast%E2%80%A6 I'd previously worked with others on.

I also want to make it clear that I don't hold anything against Godep. I just know that it does not cover all the use cases developers have and want to see the go tool handler more of them than Godep covers today.

— Reply to this email directly or view it on GitHub https://github.com/golang/go/issues/13517#issuecomment-167901016.

Edward Muller @freeformz

freeformz commented 8 years ago

BTW: WRT version ranges and updates ...

If two packages (a+b) rely on the same separate package (p) and p makes a new release, you need to do integration testing when you upgrade your copy of p. Anything else it just hoping it will work. When doing ruby in the past (and other languages as well), I hated having to update a dependency because it didn't really matter what the released version number was in the end. Yes, the version number gives you a clue / hint wrt compatibility, but that's it.

Because of that I'm +1 wrt version numbers (semver specifically), but in the end it just doesn't matter that package a uses version 2.4.1 of p and package b uses version 2.4.2 of p. 2.4.5 of p was released and it needs to be re-validated to work with both the version of a and b that you have. I've had to patch/upgrade either package a and/or b to work with the new p (which for arguments sake fixes a bug that I'm experiencing) more times than I care to reflect back on.

Also, just because p released 2.4.5 doesn't mean I need to upgrade any code I have using package p to the new version. I may need to (because of the aforementioned bugfix example), but that's on a case by case basis.

After reading this entire thread again I can understand why the use cases call for version ranges. However I still do not believe they are necessary in go, when using tooling like godep/govendor/etc and the vendor/ directory to record and check in your dependent packages. I do not want to inflict this pain onto the entirety of go when we can avoid it.

freeformz commented 8 years ago

Note: using something like govendor + vendor/ you would only have a single copy of "p" in use anyway, so there wouldn't be a state where a was using p @ 2.4.1 and b was using p @ 2.4.2. When you vendored them you would pick a version to record+copy or the tool would error and you would have to resolve it.

sdboyer commented 8 years ago

Yes, the version number gives you a clue / hint wrt compatibility, but that's it.

Yep, that's all they do. Wanting any more from them is a poor expectation in the first place.

That does not make them unuseful. It just means that instead being a tool for the machine to use in making a final decision about what works, they're a tool to help you go through the process of figuring out what works.

you would pick a version to record+copy

Any proposed solution, including one with version ranges, still requires you to make such a choice. The only difference is that, when version ranges are permitted, the machine can help you with that choice; it needn't just make it for you and pretend everything is fine.

That choice is hard. It will always be hard. The real benefit of the ranges is the additional information such ranges can express when you're dependent on a library (or two, or three) where one of these situations arises. If all they have is the commit id that they're pinned to, you (typically) have no idea why they're pinned to that version, and so have to go in and understand their code well enough to figure out whether or not you can move them to a different version.

If, however, they can specify a version range, then you're taking advantage of the fact that their knowledge > your knowledge when it comes to their library. Again, these are difficult decisions - I can't understand why you'd want less information on hand to resolve them. Sure, it's possible that the lib author did a bad job and put in an unhelpful or incorrect version range, but:

It's still probably faster to figure that out and go from there than it is to go in blind with only the commit, and
Once you figure out a saner range for them to use, you can send a PR to fix their poor choice. And then, the open source cherubim break out into song, as you'll have done some work that potentially helps someone else when they're in this same difficult situation in the future.

freeformz commented 8 years ago

@sdboyer I'm not sure how version ranges helps the machine make that choice? Version ranges are not required for the machine to help me make that choice. If I can fetch the current code, or any arbitrary revision after the one I have recorded, via vcs then I have everything I need to determine compatibility (aside from a tool to do it). I do think people need to version their packages / repositories though as it will help a developer make a decision when that tool says that versions (or revisions if version information is missing) 1, 2 and 3 are compatible, but 4 and 5 aren't because the public interfaces / structs / function signatures that the developers code is using have changed.

sdboyer commented 8 years ago

Version ranges are not required via vcs then I have everything I need to determine compatibility

(pulling from the article I'm working on...)

You're right - they are not. Which is why I never said they were "required," or "needed." Necessity is not at issue; the question is whether or not they have supplemental information that makes it easier to determine compatibility.

I do think people need to version their packages / repositories though as it will help a developer make a decision when that tool says that versions (or revisions if version information is missing) 1, 2 and 3 are compatible

...well, this weirds me out, because this is pretty close to what I'm arguing for. Not sure where the communication is going wrong, so let me try to be more concrete. Say you have this dependency graph: diamond-fail

main is your package, A, B, and C are dependencies written by other people that you've pulled in. Because A and B are pointing at different versions of C, compatibility now needs to be worked out. If the authors of A and B haven't specified version ranges, then all the information you have is that they want different versions of C, and you have to go in and figure out a compromise. Once you've found an appropriate compromise - let's say C-1.0.4 - you have to test the integration of it all together for your particular main package.

If, however, A and B do provide version ranges for their dependency on C - because the authors of those packages are good stewards, and have figured out which versions of C they actually can work with - then that's an automated step a tool can take and either present the result to us, or just accept it:

diamond-auto

...at which point, we test. Same as if we figured out the compatibility on our own. The difference is, in making this decision, we get to benefit from the knowledge the authors of A and B have about their own packages' requirements (expressed in the form of those version ranges), which is almost certainly more than we know about it.

Crell commented 8 years ago

Quoting from the YC post that @sdboyer linked:

You don't pin versions in libraries, you pin them in applications. Almost every package in the dependency graph should be using version ranges. That avoids version lock and makes it easier to satisfy shared dependencies. It also means libraries don't all have to bump patches that just change dependency versions. But, in your application, in the root of the dependency graph, you pin everything that you're using right now. You check that in to ensure that everyone on your team and every machine gets the same versions of all of the dependencies. You get good code reuse and deterministic builds. This is exactly why Bundler separates out the lockfile from the Gemspec. It's unintuitive at first, but it works better than any other system I've seen once you grok it.

^^ That's the same point I'm making. It's the same conclusion that Composer reached for PHP. It's the same conclusion that Ruby reached. It's the same conclusion that the Glide team reached for Go, after fighting that conclusion for a while.

So if the languages that have built successful packaging tools have all reached the same conclusion (version range manifest file on libraries, pinned lock file on applications), what about Go is so inherently different that it shouldn't adopt a known-successful model? I don't mean about Go's status quo today (the status quot is obviously insufficient in this regard or we wouldn't be having this conversation), but what is intrinsic to Go that makes it so different?

That's what I don't get. When we know there's a model that's proven to work, gives everyone the flexibility they want/need, and and solves the problem space successfully, why wouldn't we go with that and benefit from everyone else's experience? (Inquiring baby Gophers want to know!)

freeformz commented 8 years ago

To me, in the end, it's about API compatibility, which computers are much better at figuring out than humans. With Go I believe we could use code analysis to determine API compatible versions and then let the developer choose which one to vendor instead of guessing. I'm fine with package meta data version ranges if it's just used by developers to provide hints on what to choose to do during a conflict: Will I need to edit my code, my deps code, their deps code or some combo of all of them.

In the end whatever standard gets adopted I'll have to support it, so here's hoping I've made my case.

Here is my original response: https://gist.github.com/freeformz/bd0d167dece99e210747. I aborted it though since I felt we'll just keep talking past one another.

kostya-sh commented 8 years ago

Another possible (though probably not very common) use case to consider is supporting binary dependencies (when the source code is not available). E.g. see #2775, #12186. I can imagine that such libraries can be even distributed as versioned zip files (similarly to jar files in Java).

kostya-sh commented 8 years ago

Two more related use-cases:

As an application developer I want to use a single repository for the application code and its client library (example: a database app and a client library to talk to this database). To build the application I want to use pinned versions of dependencies. For the client library dependencies I want to specify supported version ranges.
As a consumer of a client library (e.g. from the use-case 1) I want to vendor only the client package without the application code.

kardianos commented 8 years ago

@Crell I agree that applications should pin/copy and "libraries" (packages) should use version ranges. I agree that it is good if packages are released.

The difference is static analysis and GOPATH.

If the application should pin dependencies, then a design file isn't required for application, just the revision and specific version it uses.

If the "library" should contain version ranges it should have a version range for each dependency it uses. Now let me constrain the problem of version ranges into two categories: (1) "I want my package to use a compatible API", and (2) "I want my package to use all the required features it needs". (Remember your engineering design class, user stories must not contain a technical implementation or technical requirements). In Go you can denote API compatibility with either a unique import path or a "major" release tag. In order to satisfy compatibility, you cannot remove a feature or API once added. If package authors choose to give a unique path to each "major" release, the feature set is a function of the statically knowable API or just the revision time. If a package author just uses a tag, then all we need to know is what the version tag is currently to know the major version we need. And if we can just use the current version as a range spec, then that is machine discoverable, again removing the need for a human editable design file.

govendor already pins revisions for end applications. It would be simple to inform govendor that this is a "library" and just write down what is in the environment, including revisions and any versions package authors have provided. The versions it uses should automatically give any end application using it more first-hand information.

If a package author really had an exceptional amount of knowledge of a needed package version range or wanted to blacklist a particular version, it would be trivial to add a field with a well defined interpretation of that field for human use that could be presented to any down-stream users of the package.

The main difference between glide and what I'm proposing here is I'm letting the machine do more of the work. If you want to write the design file yourself for everything, that seems silly to me, but again fine. I continue to see no technical reason why we could not write versions and version ranges to the same file.

sdboyer commented 8 years ago

@kostya-sh - re: binary deps, my gut is that that's mostly, though not completely, orthogonal, as we've mostly been focused on getting and arranging source code here. I'd have to research that more, though.

If I'm understanding your first use case, then yep, that makes a lot of sense.

If I'm understanding the second use case, then I have the same question as I've asked before: why do you care about getting rid of code that the compiler is going to ignore, anyway?

@freeformz -

I think our positions are actually quite close, though yes, we're talking past each other. That's at least partly my fault - I was assuming the disconnect was over a lack of understanding as to what performing a resolution with a range would actually look like, and so was trying to clarify that. But, looking at your gisted response, I think maybe we've reached the kernel of it:

I do not believe that we should rely on some arbitrary meta-data when code analysis and revision history can determine which versions (indicated by semver tags; or failing that which revisions) satisfy every package's usage (your main, A & B) of the dependency's (C) API.

Sadly, code analysis + revision history cannot do that. (If they could, I'd agree with you - no question, they'd be the way to go) At best, they can determine that code is not incompatible, not that it is compatible. Annoyingly, these are different things. Here's an example.

All of which should be taken to mean that static analysis is certainly helpful, but not sufficient, for answering this question. Trying to make it sufficient brings you into a full-contact brawl with type theory (on which I'm still quite a newbie) as you try to compute type equivalencies. That's not what Go's type system was designed to do - but it IS a goal of Hindley-Milner-like type systems (of which some variant is used in langs like Rust, Haskell, OCaml, SML). So yes, Go is different: its type system is simplistic, but sound, and that was very much the goal (as I understand it). Trying to do too much more will be swimming upstream against the design.

The reason I advocate for version ranges is because they are a sufficiently flexible system to accommodate both the helpful insights from the static analysis you want, and the insights about logical compatibility that an upstream author is more likely to have. Run your tool, and encode the results, along with whatever else you know, into a version range.

We're talking past each other because we're imagining...well, I guess different workflows, though I'm loathe to call it that. The article I'm writing tries to break it down into necessary states and necessary phases, largely without regard for worfklow. We'll see how that pans out.

sdboyer commented 8 years ago

I continue to see no technical reason why we could not write versions and version ranges to the same file.

Yep, probably could. But "could" isn't the question. "Should" is the question.

kostya-sh commented 8 years ago

If I'm understanding the second use case, then I have the same question as I've asked before: why do you care about getting rid of code that the compiler is going to ignore, anyway?

As @mattfarina mentioned many times it is important that the spec addresses as many real use cases as possible. This is a real use case describing how some developers vendor their dependencies (vendoring sync2 package from vitess repository has been described in this issue discussion). Besides many golang.org/x repositories contain multiple packages that can be used independently (e.g. golang.org/x/net/ipv6 and golang.org/x/net/context).

I guess the main reason for doing this is efficiency. If I decided to check-in vendored dependencies to my application repository I would rather check-in 100kb client library than whole 10Mb of the source code. Additionally some VSCes (e.g. Subversion) are quite efficient at checking out a single directory (unlike Git). This might speed up build times in cases when vendored dependencies are checked out at build time.

It is also not very difficult to come up with a scenario when checking out the whole repository simply won't work. E.g. if I want to use two different packages from the same repository pinned to different revisions.

To be honest I don't care too much how the final spec will look like but it would be unfortunate if some of the use cases I described wouldn't be covered.

freeformz commented 8 years ago

@sdboyer

Sadly, code analysis + revision history cannot do that. (If they could, I'd agree with you - no question, they'd be the way to go) At best, they can determine that code is not incompatible, not that it is compatible. Annoyingly, these are different things. Here's an example.

Semver may not catch any of those, but code analysis would at least catch the v2 issue (as you stated code analysis can only tell me what's incompatible). Tests, as you, me and/or others have pointed out above would be required to catch the v3 issue, semver or not.

This is the crux of our disagreement AFAICT: You have faith in semver being meaningful beyond stating intent. I don't. In my mind semver is just intent and I would prefer to consider actual API changes and leave the rest to integration testing. We both view the world very differently apparently. Your article will be an interesting read for me I'm sure. :-)

I would love to get some sort of higher throughput (video / in person / etc) discussion wrt this issue. It's obvious that we all care deeply about it. Barring that I'll probably start bringing it up with every go developer I cross paths with.

mattfarina commented 8 years ago

I love the great conversation over the past couple days.

@freeformz I agree that some form of video, in person, or other better method of discussion would be useful. Let's see if we can figure out how to get that going. I'm happy to start figuring out the logistics of that.

To add some thoughts to the ongoing commentary:

@kostya-sh I agree with @sdboyer on the client and server in the same repo being an orthogonal issue. It's worth noting I've heard a lot of complaints when this happens on projects. In particular from those who want to consume the client without dealing with the server.
@freeformz There are a bunch of people and organizations who do not want to, for various reasons, check in dependent packages to their projects repo. Where you store packages is a slightly different problem from managing the versions you use. To make a widely usable solution we should support multiple methods of storing dependencies (in the parent projects repo and rebuilding from a lock file or other configuration file).
@freeformz I love the idea of parsing a codebase to know API compatibility. But, I don't think it's enough. I wonder about combining that with SemVer. I say it's not enough because of several reasons but I'll share one glaring example. I can't tell you how many times I've had to specify a range of ^1.2.3, != 1.3.4 because there's a buggy implementation sitting behind the otherwise compliant API. Looking at the API programmatically won't tell you this. Putting this information in some form of file communicates to a management program and application author consumer something a library author knows. Or, that to application authors working on the same codebase can communicate explicitly with each other. I don't see how parsing the API can do it all today but it's a great direction to start heading in. Do you see something else?
@kardianos The GOPATH can be a problem point as well. There are two that I'll share. First, it's a point of confusion for many new to Go. Helping people get past build issues because they misunderstand the GOPATH is one the single largest topics I spend time on with Go. Anything we can do to lower this entry level burden will be useful in on-boarding people to Go. For those who know Go it's often considered a pain point. That's why GB exists and it's gaining in popularity. If a solution here can help pull people back from that it would be useful in unifying the community. Second, if two applications being built and are in the GOPATH but rely on two different versions of a shared dependency it can be a problem to manage in the GOPATH. I, and many others, have experienced a problem where the right version is checked out for project A then I go to project B and do a build without updating the version only to have a problem.
@kostya-sh Pulling two different version from the same repo is generally considered a bad idea. It breaks any notion of atomic commits. There are diamond dependency problems. For example if you pull package A at version 1 and package B at version 2 while both A and B rely on C how do you determine the version to check out. At no point will this combination of package versions have gone through a test system. We should try to make something difficult for end users to screw up. Make the complexity simple for the majority. Unless there is something I'm missing?

In this problem space there are, at least, a couple distinct roles. Those who produce a package and those who consume it. If I were going to prioritize them I would prioritize the consumer slightly over the producer. What do y'all think of that?

kostya-sh commented 8 years ago

@mattfarina

@sdboyer "orthogonal comment" was about binary dependencies. This is something that currently doesn't exist in Go but might appear in the future. See #2775, #12186.
Two packages from the same repo do not have to be related. E.g. golang.org/x/net/ipv6 @ 0d2c2e17 and golang.org/x/net/context @ 3b90a77d2 - both come from the same repo. If I tested my application with certain pinned revisions them updating these dependencies separately is safer.

sdboyer commented 8 years ago

@kostya-sh - ah right, yes, sorry. I'm always going to struggle with splitting up an upstream repository, because it undermines commit atomicity of the upstream repository - and given how hard a problem space this is to build something both sane and usable, I like taking advantage of every bit of upstream information we can get.

I don't think golang.org/x repos following such a structure should be an example to follow. The Go authors wrote with a monorepo background, and a monorepo in mind, which is why we're having these problems in the first place. (The preceding comments here discuss this issue extensively).

I still struggle with the performance argument, though. It seems to me that exploring caching more would be preferable over carving up what amounts to generated code. Particularly for Go, where it's not necessary to fetch those packages beyond the build server (unlike an interpreted lang). And if the build server is ephemeral (e.g., hosted CI), at least some of them provide support for caching across ephemeral instances.

So, I can entirely see being convinced about it. But some (not all) of what I've seen about that so far seems to amount to complaints that "the tool doesn't currently do as well as I can manually." Well, of course not. But...cmon. Disk is very cheap. Network is relatively cheap. There is a point where it becomes preferable to eat it on those in order to reduce complexity of a real implementation.

@freeformz

Semver may not catch any of those, but code analysis would at least catch the v2 issue (as you stated code analysis can only tell me what's incompatible). Tests, as you, me and/or others have pointed out above would be required to catch the v3 issue, semver or not.

This is the crux of our disagreement AFAICT: You have faith in semver being meaningful beyond stating intent. I don't. In my mind semver is just intent and I would prefer to consider actual API changes and leave the rest to integration testing.

And even tests aren't sufficient, of course (Dijkstra: "Testing can only prove the presence of bugs, never their absence!"). But yes, you're absolutely right - semver ranges carry no guarantees whatsoever. They could be completely right, or completely wrong. What's important is that they're not mutually exclusive with static analysis.

If you're working and pull in a new project (A), which in turn has a dependency on another project (C) specified in a range, but you already had another dep (B) which also had a dependency on C, then when attempting to resolve the diamond, your tooling should ABSOLUTELY run static analysis on the A->C relationship to ensure that all the versions the semver range indicates are acceptable, actually are. Because yes - you shouldn't just take A's maintainer at their word. You'd be no better off than we are now in the unreasonable "just ensure tip always works" world.

So, let's say that main in my previous example is A, and C is the package offering the Square() func. Static analysis has knocked out v2 - great. You're left with staying with v1, or going to v3, or to some v4 (which isn't in my example, but it's easy to imagine one), any of which is permitted by the semver range.

So you go in, do the work, and figure out that A is actually incompatible with Cv3, but is compatible with Cv4.

This work you just did is extremely valuable. It should be recorded, so that no one ever has to do it again. Which you can do by filing a patch against A that further restricts the semver range to exclude v3. And now, when the next user of A comes along, they'll never hit that v3 pothole. They'll never even need to know it exists. (And the FLOSS cherubim sing.)

I think we all understand that there's a ton of uncertainty in software development. Superficially, semver may appear to just blithely ride that uncertainty train, or even make things worse. But all it's actually doing is taking a whole lot of awful, complicated shit that can happen, and providing a lens for seeing it all within a single field of view. (If you’re a fan of Cynefin, semver is an excellent example of an organizing system that moves a problem out of the complex space, into the complicated space.) While our individual builds must be reproducible and deterministic, the broader ecosystem will always be (in practice, from any one person's perspective) uncertain. All real software must constantly move back and forth between these two spaces. Semver facilitates a process by which we can inject more certainty into the ecosystem incrementally, as we learn more about it.

We both view the world very differently apparently.

Most people do :) Though I still tend to think, in this regard, maybe not so far off.

Your article will be an interesting read for me I'm sure. :-)

With any luck! Discussing over here has gotten me enmeshed in too much detail over there now, I think...I'm a bit stuck. Trying to pull back from the trees for the forest. Hopefully I'll have it done soon.

I would love to get some sort of higher throughput (video / in person / etc) discussion wrt this issue.

+1 from me.

kardianos commented 8 years ago

My understanding of where we stand is as follows: I would like to try to determine a single file that might allow different workflows to work together using a single format. People on the Glide team don't want that because it would be a suboptimal design, it would be different than other languages, and copying the version range from a tools design file to the standard lock file would "hugely complicate the tool".

Here is my response to @mattfarina 's use cases:

application_information: Name, description, keywords are already present in source. Homepage and license are not. However, I don't see how this affects vendoring. (Looks like something for a central package manager). Maybe we could improve godoc.org, but not in this issue.
consistent_team_setup_with_private: Maybe you can't do that today with go get or similar tools that just work off an import path. You don't need a package spec for that. If there is a problem with private repo creds, then we could leverage existing stored credential files or make our own. Doesn't appear to relate to this issue.
contact_owners: Unless you need to automate the contacting and message writing, then why can't you just read the README or private message their github profile? Doesn't appear to relate to this issue anyway.
license_scan: If you vendor something or use a dependency, you had better review the license first. Sarbanes-Oxley reviews will require more then just oh, it is MIT. You need to know copyright assignment and actual license full text. Doesn't really relate to this issue anyway.
lock_version; Ah, here we go. You want to allow non-main packages to declare their allowed version ranges on each dependency and pin an exact version for main packages. We have established this. I can at least roughly agree with this. I think the vendor-spec + additions talked about could write this information down.
managed_vendored_dependencies: When building specific versions, I agree use the vendor folder to put in specific versions of your dependencies. I agree, organizations can choose to either check them in or fetch them after cloning or updating a repo. That's great. The above use case needs a meta-data file. This I roughly the same, but needs support for it from the tools.
single_import: I agree. The tool govendor has always done this and govendor uses vendor-spec so no problem here.
use_specific_version: looks like a more specific version of "lock_version". Same response. I'm fine with a meta-data file having the ability to note version ranges for packages of some sort and exact revisions for main packages. Defining a single field on vendor-spec should allow this.
work_with_private_packages: looks like a partial duplicate of "consistent_team_setup_with_private". Similar response. The answer isn't more meta-data files I would argue. However, I don't think it is relevant to this issue.
working_with_forks: Can't do that in Go and unless you want to build your own build system distinct from "go" and "gb", it will never do that. But it doesn't relate to this issue anyway.

So of the user stories you wrote down that relate to this issue, I really don't have a problem with them. I continue to not understand why vendor ranges can't live in a vendor-spec (lock typeish) file for those who wish to use them.

freeformz commented 8 years ago

I've been talking to a lot of people about this, both Gophers and not and of course opinions are all over the place.

I think I've come to the conclusion that semver+ranges are important socially more so than anything else. ATM a lot of packages don't release versions and/or change things up drastically on master at times. So basically anything that forces people to think more about releases is ++. With that said, my opinion atm, is that ranges should be limited to non main packages/libraries.

mhoglan commented 8 years ago

Not sure if this conversation has moved on elsewhere, but I enjoyed reading through it as it is at the heart of the exact problems I have been struggling to deal with. Feel free to point me elsewhere if it has moved on in the last month.

@kardianos I would disagree with that working_with_forks is not related to this issue.

This is precisely the problem I keep having. Our product is using a 3rd party dependency (doesn't even matter really, happens with internal ones too), there is a bug or hotfix in that dependency affecting our product, that has to be fixed immediately and release a new version of our product. One of the typical ways you do this is to fork the dependency, fix the bug, build the product using the forked dependency and release. You then push the change upstream and close the loop later of having your application switch back to the mainline after it is merged.

I know there are multiple ways to solve this, but the easiest way would be to update a spec file that says use the following URL (fork) for import X;

I do not want to have to go rewrite (mutate...) import paths,
I do not want to have to copy code into some subdirectory
I do not want to have to play games with routing hostnames or other similar tricks at the environment level (think what people do with SSH configs to work with multiple GitHub accounts and different SSH keys...)
I do not want to have to script around it with the build process by having it clone the dependency into the path representing the upstream repository and then manage the remotes of the workspace and checkout the fork changes

I want to be able to make the changes on the forked dependency, make the fork available. Then update the application using the dependency. Ideally, all I should need to do is update a spec file that says, use version blah of this dependency. Since golang ties source, import paths, and other things related to projects so tightly together, it hinders these pivot points that almost every other language provides.

This is most evident when it comes into 'what is a version' of a dependency. Because golang ties the import path to the repo home (URL) of the dependency it implies that the version the of a dependency is only within the scope of that repo URL. I believe that to be not ideal.

A version of a dependency should be an 'instance' of that dependency, and an 'instance' of that dependency should be able to originate from multiple places, and thus that origin should be part of the scoping of the version. In golang, we are saying that origin should be URL addressable so it can be retrieved as an import. That would allow using forks.

mhoglan commented 8 years ago

btw I do realize that the spec formats proposed in govendor and glide both address this origin aliasing capability. Was bringing the point above up more out of that I believe it to be a primary use case for using a manifest file for specifying dependencies.

sdboyer commented 8 years ago

Finally finished the article I kept mentioning.

kardianos commented 8 years ago

@sdboyer I finished reading the article you wrote. I'm having a hard time getting past the "LOLZ CATZ" tone in it. There are many assertions of fact. For instance, I believe Dave's proposal was not accepted not because people don't want to encourage semver, but because it wasn't actionable by the any mainline go tool. I commend Dave for the proposal, but presenting Dave as the valiant hero who was shot down without good cause doesn't do anyone any good.

I think most of the technical points present in the article have already been presented here. Though from the writing style it is difficult for me to unravel when you are presenting a point of view, an assertion of fact, or a proposal for action; I may have not accurately understood everything you intended to convey.

A few responses:

A tool should and can work with any size of repo, monorepo or microrepos.
Using a dvcs to download source code doesn't limit the ability to work with individual packages.
Who uses a package manager is greatly determined by the language itself. For instance, in users of programs written in go shouldn't ever touch a package manager, they should touch end binaries. Developers of a given project should think about package managers, but only when updating dependencies. This is much different than PHP, Python , or Ruby.
In go, the build system will never know anything about the package manager, as it is the package manager's responsibility to put packages in the correct location for the build system, just as the compiler knows nothing about the build system.
I'm not a fan of JSON, but it is in the std lib where TOML is not (nor has it reached 1.0 yet). And YAML is sooo much more than a static configuration file, the spec is huge and extremely hard to implement. If you want to have a chance at someday integrating with the go tool, I would recommend against using YAML.

Some of your points don't seem to be founded in actual issues: you have paragraph emotionally targeting people who don't think we need reproducible builds. In the Go ecosystem I don't see that attitude to begin with, so even aside from your tone, there isn't anything to be argued there: we all want reproducible builds at some level depending on our exact needs.

You do offer a good summary of different issues present in specifying version ranges and a good point in that the developer can treat them as a suggestion and override them.

Thank you for your work on glide. I would encourage you to continue exploring what benefits you can get from doing static analysis on a project's dependencies that can augment or assist a manually created list of declared dependencies.

I don't see this issue going forward and will probably close it soon.

In govendor this conversation has pushed me to plan to support version ranges despite the pain I've seen them bring. I already plan to support directly fetching remotes and that is closer than it was before.

sdboyer commented 8 years ago

I finished reading the article you wrote. I'm having a hard time getting past the "LOLZ CATZ" tone in it.

There are a variety of strategies out there for getting people to read almost 13000 words. You get to make your stylistic choices, I get to make mine. The substantive points remain.

For instance, I believe Dave's proposal was not accepted not because people don't want to encourage semver, but because it wasn't actionable by the any mainline go tool.

I think that's an inference you made, not something I said. I simply said that it failed; I didn't say why.

I commend Dave for the proposal, but presenting Dave as the valiant hero who was shot down without good cause doesn't do anyone any good.

I've amended the wording there to be explicit that it failed because it lacked concrete outcomes, but again, I don't think I actually said that. What I DID say was that it probably wasn't incorrect that it failed.

The valiant-ness refers to the willingness to jump into what was sure to be a fractious discussion. I'd ascribe the same to you for this thread, even though I don't agree with your approach.

A tool should and can work with any size of repo, monorepo or microrepos.

And I said as much. In fact, I was quite careful about saying it. What I said was that monorepos were harmful for sharing - not that they should be neglected by a tool.

Using a dvcs to download source code doesn't limit the ability to work with individual packages.

Not much to say here except that I don't think you really understood the constraints presented in the article.

Who uses a package manager is greatly determined by the language itself. For instance, in users of programs written in go shouldn't ever touch a package manager, they should touch end binaries. Developers of a given project should think about package managers, but only when updating dependencies. This is much different than PHP, Python , or Ruby.

The differences are not so big, as...well, the entire article more or less lays out. But directly to your point: Cargo/Rust.

But again, now for the third time, this isn't inconsistent with what I wrote. Right from the outset, I indicated that go get, being an LPM, is a tool at least in part for end users. The issue is having an LPM that's not underpinned by a PDM - the developer tool.

In go, the build system will never know anything about the package manager, as it is the package manager's responsibility to put packages in the correct location for the build system, just as the compiler knows nothing about the build system.

Again, now for the fourth time...this is basically the text on one of the captions.

I'm not a fan of JSON, but it is in the std lib where TOML is not (nor has it reached 1.0 yet). And YAML is sooo much more than a static configuration file, the spec is huge and extremely hard to implement. If you want to have a chance at someday integrating with the go tool, I would recommend against using YAML.

Yep. That's why I didn't touch this in the Go section, but only in the general section. @bradfitz outlined this preference a year ago. It doesn't change my stance on what the right general decision is, of course, but it's a relatively minor issue that would have distracted from main the point.

Ironically, using a non stdlib library for tooling is the kind of thing having a proper PDM would make easier.

Some of your points don't seem to be founded in actual issues: you have paragraph emotionally targeting people who don't think we need reproducible builds.

I do indeed. In part for levity, and in part because, as I was explicit about in paragraph three, the article is targeted at more than just Go. So yes, that is an actual issue - just not for Go.

In the Go ecosystem I don't see that attitude to begin with, so even aside from your tone, there isn't anything to be argued there: we all want reproducible builds at some level depending on our exact needs.

Nor do I see that attitude. ...and, also, I said as much in the article:

While there’s some appreciation of the need for harm reduction, too much focus has been on reducing harm through reproducible builds, and not enough on mitigating the risks and uncertainties developers grapple with in day-to-day work.

The value of including it all, even the stuff that doesn't immediately narrowly apply to your particular language of concern, is that it can help expand your perspective on what the overall problem looks like. Which was the high-level goal of the article.

You do offer a good summary of different issues present in specifying version ranges and a good point in that the developer can treat them as a suggestion and override them.

Thanks. I'm glad you found that useful.

I don't see this issue going forward and will probably close it soon.

That's a shame; per the article, my sense is that we could indeed make incremental progress by defining a proper lock file. Perhaps it would be best to start a clean issue for that, though.