golang / go

The Go programming language

https://go.dev

BSD 3-Clause "New" or "Revised" License

124.38k stars 17.7k forks source link

proposal: Vendor specification and experimental repository fetch code #13517

Closed kardianos closed 8 years ago

kardianos commented 8 years ago

Proposal: Vendor specification and experimental repository fetch code

Author(s): Daniel Theophanes

Last updated: 2015-12-06

Abstract

Establish a specification file format that lists dependency revisions and a package in the golang.org/x/exp repository that discovers, reads, and downloads packages at a given revision. Tools may continue to use other formats to generate this file.

Background

Many developers wish to specify revisions of vendor dependencies without copying them into the repository. For a case study I will bring up two:

A) https://github.com/cockroachdb/cockroach

B) https://github.com/gluster/glusterd2

(A) uses github.com/robfig/glock which specifies revisions for each remote repository in file in the project root called "GLOCKFILE". A partial list of the file is:

cmd golang.org/x/tools/cmd/stress
cmd golang.org/x/tools/cmd/stringer
github.com/agtorre/gocolorize f42b554bf7f006936130c9bb4f971afd2d87f671
github.com/biogo/store 3b4c041f52c224ee4a44f5c8b150d003a40643a0
github.com/cockroachdb/c-rocksdb bf15ead80bdc205a19b3d33415b23c156a3cf371
github.com/cockroachdb/c-snappy 5c6d0932e0adaffce4bfca7bdf2ac37f79952ccf
github.com/cockroachdb/yacc 443154b1852a8702b07d675da6cd97cd9177a316
github.com/coreos/etcd a423a55b142c2b9a82811604204cddbccd0a9cf9

(B) uses github.com/Masterminds/glide which specifies revisions for each remote repository in a file in the project root called "glide.yaml". This file contains:

parent: null
package: github.com/gluster/glusterd2
import:
- package: github.com/gorilla/context
  version: 1c83b3eabd45b6d76072b66b746c20815fb2872d
- package: gopkg.in/tylerb/graceful.v1
  version: 48afeb21e2fcbcff0f30bd5ad6b97747b0fae38e
- package: github.com/pborman/uuid
  version: cccd189d45f7ac3368a0d127efb7f4d08ae0b655
- package: github.com/gorilla/mux
  version: ad4d7a5882b961e07e2626045eb995c022ac6664
- package: golang.org/x/net
  version: b4e17d61b15679caf2335da776c614169a1b4643
- package: github.com/docker/libkv
  version: 93099f38de7421e6979983652730a81e2bafd578
- package: github.com/codegangsta/negroni
  version: c7477ad8e330bef55bf1ebe300cf8aa67c492d1b
- package: golang.org/x/sys
  subpackages:
  - /unix
- package: github.com/meatballhat/negroni-logrus
  version: dd89490b0057cca7fe3fa3885f82935dfd430c2e
- package: github.com/Sirupsen/logrus
  version: v0.8.7
- package: github.com/hashicorp/consul
  version: v0.5.2

I would like to point out a few features these tools provide:

Specify commands to fetch.
Specify repositories at a given revision.
Specify repositories at a given version.
Specify a sub-tree of packages in a given repository.

Right now each vendor tool specifies these same properties in different formats. A common tool cannot be built that reads a single file and downloads the needed dependencies. This isn't a huge burden on a dedicated developer, but for a user passing by who just wants to build the source quickly, it is an impediment.

Proposal

I propose specifying a single file format that will describe packages sourced outside the project repository. I also propose adding a packge to the golang.org/x/exp repository that discovers, reads, and optionally downloads third party packages.

Furthermore I propose using the specification found at https://github.com/kardianos/vendor-spec with one addition as the basis for this specification. The addition is:

Package []struct {
    ...

    // Tree indicates that the specified folder, along with all sub-folders
    // are required.
    Tree bool `json:"tree"`

    ...
}

Both the specification and the proposed package will be considered experimental and subject to change or retraction until at least go1.7. This process will be done with an eye to possibly adding this feature to go get.

Rationale

The vendor file format needs to be able to be read and written with standard go packages. This adds to the possibly that go get could fetch packages automatically.

Vendor tools exist today that download packages from a specification. They are just incompatible with each other despite using the same information to download the dependencies. If we can agree on a single format for tools to write to, even if it isn't the primary format for that tool, all tools and possibly go get can download dependencies.

Existing vendor tools and their formats don't always handle corner cases or different approaches. For example current tool file formats can't handle the case of vendoring a patched version of a standard library package (this would have been useful for crypto/tls forks for detecting the heartbleed attack and for accessing MS Azure).

I am proposing a file format that "govendor" uses. I'm not trying to put my own tool as central. Infact, "govendor" was built to validate the "vendor-spec" proposal. The "vendor-spec" has received significant external contributions and as such "govendor" has changed to match the spec (and will continue to do so).

Compatibility

This will be standardization of existing practices. There is no go1 compatibility issues. Existing tools can treat the specification as a write only file.

Implementation

A file format to describe vendor packages should be accepted when this proposal is accepted. Should this proposal be accepted a new package should be added to the "golang.org/x/exp" repository to support reading the vendor file and downloading packages. The author of this proposal offers to create or assist in creating this package. This would be created within 2 months of the proposal being accepted.

Risks

It would be ideal if other vendor tool package authors could agree to at least write to a standard file format informally and collaboratively. Indeed the largest risk is if vendor tools fail to write the common file format. However I think unless there is a tangible benefit (such as go get support) there will continue to not be a reason to collaborate on a standard.

Open issues

The proposed standard file format uses JSON, which might be better then XML, but harder to write by then something like TOML. Tools that want the vendor file to be hand created will be forced to generate this file from a different file.

The file format specifies packages, not repositories. Repositories can be specified by using the root path to the repository and specifying "tree": true, but it isn't the default for the format. Some people may take issue with that as they are used to or desire tools that only work at the repository level. This could be a point of division. From experience I absolutely love vendoring at the package level (this is what github.com/kardianos/govendor does by default).

kardianos commented 8 years ago

Responses welcome. @robfig, @freeformz, @mattfarina

mattfarina commented 8 years ago

@kardianos thanks for including me here.

/cc @davecheney

mattfarina commented 8 years ago

For anyone reading this I want to provide some background material. While I develop on Glide this is less about my opinions (in this comment) and more I'd like to make sure to add contextually relevant information.

We currently have more than 5 specification files floating around the Go ecosystem. They exist today.
If we're going to use specification files to solve problems it's useful to know the kinds of use cases they they help solve. Some of us crafted a number of use cases to try and document these.
Most of the widely used and popular languages have package specifications. This isn't a new idea. Even Swift, which was just open sourced by Apple, as a spec of sorts. I know some Go developers aren't familiar with these so I wrote up an overview of a number of these. Of those looked at are both compiled and interpreted languages along with those that use dynamic and static typing.

I do ask that anyone who jumps into the discussion on this with opinions take a little time to come up to speed on this space. Outside of Go the specs and tooling are a fairly mature topic.

This is also one of those topics with an impact on developer experience so it's worth looking at that as well.

While I have my own opinions, which I will detail soon, if anyone has questions or pointers aside from my opinions I'm happy to inform. I'd like anyone who wants to discuss the topic to be well informed on the space.

sparrc commented 8 years ago

Yes, this would be fantastic, currently there are many different file formats, to list a few:

All the examples are remarkably similar. To me it seems like an import path and revision hash/tag are all that are necessary, although others probably would like something more complicated. This is why I opened https://github.com/golang/go/issues/13483, because for me getting a dependency at a specified rev using standard Go tools is all I want.

The capability to easily create the simple Godeps (gpm) file is almost in the go/build and vcs packages already. What we still need are:

Ability to determine currently checked-out rev of each import (similar to https://godoc.org/golang.org/x/tools/go/vcs#Cmd.Tags)
Ability to determine built-in vs 3rd-party imports (already can get all imports: https://golang.org/pkg/go/build/#Package)

freeformz commented 8 years ago

I am +1 wrt this. I've spent some time basically re-implementing parts of go list and go get (not to mention fighting go/build) for godep.

rsc commented 8 years ago

It would be great for tools to use the same vendor spec. I thought that was the goal of the vendor-spec work.

I am concerned that tools are not already using it. We've said that's what we want tools to use, it's there for using, and yet they are inventing their own. Why? Perhaps vendor-spec is not good enough?

rsc commented 8 years ago

@kardianos, there's not enough detail here. You wrote "I propose specifying a single file format that will describe packages sourced outside the project repository." That's the vendor-spec, right? Yes, we think there should be just one, but we really want the tool authors to converge semi-organically rather than mandate something. We've done a bad job here at mandating in the past (basically what I wrote in my last comment).

But then you wrote "I also propose adding a packge to the golang.org/x/exp repository that discovers, reads, and optionally downloads third party packages." I don't know what this means. More detail needed.

mattfarina commented 8 years ago

First, I'm glad we're entertaining this conversations and thanks to @kardianos for putting in a bunch of work on this.

I have a number of concerns over the data structure outlined here. I believe it is insufficient for our needs. Let me explain.

In some venues this has been called a lockfile. But, the Revision property can be multiple things including a tag (e.g., v1.3.5) and the description says it can be used to fetch the same or similar version. A lock file needs to be the exact same version down to the commit. This is needed to reproducible builds.
There are cases where you have trees of dependencies. Those trees could list the same dependency more than once and have slightly different compatibility requirements. Any automation tooling needs to resolve the latest version that meets all the requirements. Handling this is usually done by specifying acceptable version ranges (e.g., >= 1.2.3, < 2.0.0). There needs to be a field to specify these ranges for resolution in addition to a locked revision field. In most modern systems these two types of information are captured in two different files (a config and a lock file).
There are times where you don't know the VCS type. For example, the url https://example.com/foo/bar could be the path to a package but there isn't enough detail to capture which VCS is behind it. Is it Git, Svn, or something else? There really should be an opt-in property to specify the VCS since Go supports 4 out of the box. This is needed as part of the setup to reproducibly setup the environment in different systems.
To produce a reproducible build you really need to capture the complete dependency tree and the pinned versions (commit ids) for everything. At the top level of an application you only want the packages for your application. I'm not sure how to deal with both using this spec.

These are just a few of my concerns. I really want to see something that allows for:

Provides a user friendly way to capture dependency information.
A nested dependency tree to be handled well with automation and variations in needed versions in that tree.
No requirement on packages being in the GOPATH at any point (other than the parent application being worked on). This is often requested.
Deals with renaming, private repos, multiple VCS, and lots of variation.

To illustrate the needs I've collected a number of use cases that need to be satisfied by any spec. I understand that a number of people come from C/C++ here. Other languages, where many Go developers are coming from, have already solved many of these problems. I wrote up how they handle a number of common cases. Building something with a similar experience or one they can understand with that background would be useful.

Note, in full disclosure I worked on a competing spec attempting to solve these use cases. This data structure is what Glide is roughly moving to and is influenced by our work there.

kardianos commented 8 years ago

@rsc, yes, this is effectively the goal of vendor-spec. As you noted, I haven't seen convergence on a single spec for vendor packages. Perhaps another way to phrase this proposal is "give tentative blessing to a format from the go team and and ask for feedback from tool authors". I'm completely aware this is putting the cart before the horse.

I've asked for feedback in the past on why tool authors couldn't adopt it. I've heard:

silence
the spec is per package rather than per repo
it wouldn't work

To address the second point, I propose adding the (probably poorly named) "tree" parameter that says, everything under this point is also included. It could be the vendor-spec isn't good enough; I just don't know in which way's it is deficient.

At this point I'm not sure if the existing variety is due to a lack of consensus or just lack of caring to change existing and working tools. Thus if it was proposed that command like "go get" read and used the vendor-spec file (not 100% a good idea), then I think many more people would care about having and using a common format. As it is, it is a nuisance when exploring or auditing many different go packages, but not a complete show stopper; they are all machine readable and they all contain the same information and many large project have Makefiles that hide which vendor tool they use to some degree.

RE /x/exp/ package: You're correct, more detail would be needed. Mainly here to say, this proposal would be of two parts, a spec and a package that handles the spec. What that API looks like would need to be defined. I would love to add this if the fate of this proposal gets to that point.

I suppose what I could try to find out next is why vendor tool authors not using this:

Is it not workable for that tool?
Do they already have a format and don't see a reason to change it?
Do they just want to do their own thing?

@freeformz I think is open to using something like this @mattfarina Has said it won't work, and has promised more detailed info.

I'll try to ask around.

kardianos commented 8 years ago

But, the Revision property can be multiple things including a tag (e.g., v1.3.5) and the description says it can be used to fetch the same or similar version. A lock file needs to be the exact same version down to the commit. This is needed to reproducible builds.

Agree. For distributed vcs the revision field is the hash. I could specify that perhaps more clearly. I think we agree here.

There are cases where you have trees of dependencies. Those trees could list the same dependency more than once and have slightly different compatibility requirements. Any automation tooling needs to resolve the latest version that meets all the requirements.

The vendor-spec defines the content as everything that is or should be in a single level "vendor" folder. I think that should be sufficient for a lock file, correct?

Handling this is usually done by specifying acceptable version ranges (e.g., >= 1.2.3, < 2.0.0). There needs to be a field to specify these ranges for resolution in addition to a locked revision field. In most modern systems these two types of information are captured in two different files (a config and a lock file).

I'm only interested in specifying what we know as the lock file. I think the "version >= 1.2.3" would be fine in a different config file.

There are times where you don't know the VCS type. For example, the url https://example.com/foo/bar could be the path to a package but there isn't enough detail to capture which VCS is behind it. Is it Git, Svn, or something else? There really should be an opt-in property to specify the VCS since Go supports 4 out of the box. This is needed as part of the setup to reproducibly setup the environment in different systems.

Go get handles this with probing. I'm also fine adding a well known optional field that specifies the vcs type ("git", "ssh+git", "hg"). I don't see this as a show stopper.

To produce a reproducible build you really need to capture the complete dependency tree and the pinned versions (commit ids) for everything. At the top level of an application you only want the packages for your application. I'm not sure how to deal with both using this spec.

I'm not sure I understand your concern. If you have or want a package in the vendor folder, have the tool write down the package path and revision in the vendor-spec file and it will be captured. Could you help me see what I might be missing? To be concrete, in govendor, does it not have enough information for reproducible builds?

Provides a user friendly way to capture dependency information.

Sure, I would choose to use a CLI command in govendor, glide and glock could use a config file. We all write down what we fetch in a single lock type file.

No requirement on packages being in the GOPATH at any point (other than the parent application being worked on). This is often requested.

This is tool specific, not spec specific. I'm working on adding this to govendor and there are no issues with adding it.

Deals with renaming, private repos, multiple VCS, and lots of variation.

I'm not sure what you mean by renaming. Origin? Multiple VCS can be handled just fine, that's a tool issue. Private repos is worth talking about, but it might be handled with a stored ssh key and saying, "use ssh"? But again, I don't see a conflict with the given spec.

To make sure we are talking about the same thing, I will copy and paste in the glide.lock file for glide and the vendor.json file for govendor:

glide glide.lock:

hash: 1fdfb16656a1b4a1664afdc9e2a5fc8040165bc5b6e85812df2affceacb7fbc8
updated: 2015-12-21T09:29:33.170992254-05:00
imports:
- name: github.com/codegangsta/cli
  version: b5232bb2934f606f9f27a1305f1eea224e8e8b88
- name: github.com/Masterminds/cookoo
  version: 78aa11ce75e257c51be7ea945edb84cf19c4a6de
  subpackages:
  - .
- name: github.com/Masterminds/semver
  version: 6333b7bd29aad1d79898ff568fd90a8aa533ae82
- name: github.com/Masterminds/vcs
  version: eaee272c8fa4514e1572e182faecff5be20e792a
- name: gopkg.in/yaml.v2
  version: f7716cbe52baa25d2e9b0d0da546fcf909fc16b4
devImports: []

govendor vendor.json

{
    "comment": "",
    "ignore": "",
    "package": [
        {
            "path": "github.com/dchest/safefile",
            "revision": "33aeb10e4bb6edb4016c53b6140fc9a603346e04",
            "revisionTime": "2015-07-03T18:05:53+02:00"
        }
    ]
}

We are really talking about lock files, not a package specification. In other words, I don't think your pkg spec and the vendor-spec are competing, they are doing completely different things. Your glide lock file is pretty much exactly what the vendor-spec is trying to do as far as I can tell.

There are corner cases to discuss, but every tool that I've seen has something like a lock file that contains an import path and a revision (a hash if using a dvcs). Perhaps we can't agree on all the other meta data, but maybe we can at least write those two bit of info, and maybe a few others into the same machine format.

mattfarina commented 8 years ago

@kardianos thank you for clearing some things up.

I think it would be useful to clarify that you're attempting to create a lock file rather than create a package specification. The current title says, "Vendor specification".

With that in mind...

What use cases does the lock file file in this current state?
Why is there a RevisionTime on each package? What use case does that help to solve?
What use case(s) do the comment at the top and package levels support?
What are your thoughts on extensions to the spec? For example, in Glide I have more data than this in those files that get into filtering, etc.

Note, I'm asking on 2 and 3 because they do not fit into the use cases I've previously worked out. Trying to understand the details.

My issues at a high level, and I'm sorry I have to be so brief as I have to go for now, are...

This doesn't provide a solution to my use cases. It's insufficient to automate package management.
There is still a need for a package specification to capture relevant information.

In the goal to solve what's needed for package management for the majority of developers or is it to do one small slice of the puzzle that others still need to build on?

kardianos commented 8 years ago

@mattfarina

End Goal: A tool provided with the vendor spec file should be able to fetch all packages at a given revision from their original repository (if available).

This would enable standard user tooling for fetching remote packages at a given revision. This also enables machine analysis of dependencies across the board, such as looking for vulnerable revisions (dvcs hashes) and mapping dependency usages.

...

Revision Time: I've worked on projects that are 15+ years old. Code bases sometimes lose touch with original source and sometimes I just want to know what year or decade it is from.

Comment: JSON sucks, but is simple and well supported. If you want to write down a comment, with a tool or by hand, put that human note there. By itself JSON doesn't support // comments, so turn in to fields.

As per the spec, all unrecognized fields are required to be persisted by other tools modifying a file. In other words, extensions are expected.

...

While I don't want to quibble with words, it is called the "vendor specification" because it is specifying vendor package revisions. It isn't called a package specification because it knows nothing about the package it is used in.

You are correct. The vendor-spec file lives in the vendor folder and talks about the vendor packages. If you want a package meta-data file it should live in a package directory and tell you about the package folder it is in.

The goal is to write down what revisions is in the vendor folder. What you call a lock file. If the source of the package isn't from the the "go get" location, then it provides the origin to write down which location it does come from. (Useful if a package has a modified package from another location, like github.com/microsoft/azure/go/vendor/crypto/tls). By default it (currently) works at the package level unless you specify the proposed "tree": true parameter.

freeformz commented 8 years ago

FWIW: I have not added support for the vendor-spec to godep because I've had to work on other things instead and it hasn't been a priority (i.e. users aren't asking for it). I do want to support the vendor spec but instead have been working on replacing our use of go get and go list because they don't really work for all of our use cases as is.

mattfarina commented 8 years ago

@kardianos another problem that would be useful to address would be to move the vendor-spec outside the vendor/ directory. That way the vendor/ directory can be put in an ignore file (e.g., .gitignore) for those not going to store dependencies in their VCS.

@freeformz it sounds like you'd like to take the concept of Godep and move it into the Go toolchain. While Godep has been around for awhile and been able to fill in a number of use cases, there are numerous use cases people have been asking for that cannot be easily implemented in its flow. I would prefer to see something that enables those as well.

To map dependencies well is a problem in this setup. For example, a lock file should really only be at the top level and not throughout the tree. Dependencies shouldn't be in multiple vendor/ directories in the tree unless you really know what you're doing. Otherwise you can end up with binary bloat and errors. So, knowing the version compatibility and mapping that in a tree is missing.

If a vendor-spec this is present throughout a tree there can be cases with many instances of a common dependency at different versions all mapped by commit id. This doesn't allow automated tooling to work out the best version to use or map a tree. This can be a problem in practice. For example, if you look at kubernetes the same dependency can be referenced many times in packages and sub-packages all to different commit ids. Resolving versions becomes difficult.

In tool chains for other languages a lock file isn't used to figure out or map the tree. Instead this is a config file knowing more (e.g., a semantic version range).

mattfarina commented 8 years ago

How would tools used by go:generate be specified in this setup?

akavel commented 8 years ago

Responding to some of the concerns/issues raised above:

@kardianos:

As to feedback on the vendor-spec: I've actually used it in a tool (https://github.com/zpas-lab/vendo, not yet advertised/announced because OSSed just recently and not polished/readmed yet, eh), and it worked great for me! Thus, I didn't have a need to complain, so I just used it without seeing any need to provide feedback... So, sorry to be late, but a _BIG THANKS_ for the spec and all the work that's gone to build it, I really found it very well thought out...
- To elaborate a bit: I did add a few custom fields, but from what I understand that's totally in line with the spec, it describes only the mandatory fields. Specifically, I added:
```
{
    ...
    "platforms": ["linux_amd64", "windows_amd64", ... ],
    "package": [
        {
            "repositoryRoot": "_vendor/src/github.com/spf13/cobra",
            ...
        }
    ]
}
```
where the "repositoryRoot" is described in more detail in the sources, but in short words it generally helps to find the .git/.hg/.bzr/... dir (i.e. the "tree root"?). See also the example resulting vendor.json in my repo.
- With that said, I now notice I've apparently used an older version of the spec (built the tool some time ago), e.g. where the vendor.json file was located in the repo's root dir, not in vendor/ (or did I misunderstand something?). Seems I have to review what's changed. To tell the truth, personally I'd prefer vendor.json to stay in the root dir (one of the reasons being that for now I use "_vendor/", not "vendor/"; but not only).

_edit:_ uh, oh; I've glanced over the changes since Jun 12, and my initial impression is that I think I wouldn't be able to build my tool with the spec as it looks today, unfortunately :(

@mattfarina:

There are times where you don't know the VCS type. For example, the url https://example.com/foo/bar could be the path to a package but there isn't enough detail to capture which VCS is behind it. [...]

Personally, as of now I'm not really convinced this is actually needed/useful. What if the repo owner changes the VCS used? And even if not, I'm not quite sure why one can't autodetect it the same way as the go tool does this. But even if I'm wrong in this regard, the vendor-spec specifially allows to add any custom fields to the JSON file, so I don't see why a tool couldn't just go on and do that?

To produce a reproducible build you really need to capture the complete dependency tree and the pinned versions (commit ids) for everything. [...]

Uh, that's exactly what I'm doing with https://github.com/zpas-lab/vendo using vendor-spec; thus I believe it is totally easy to do with vendor-spec; did you have some specific trouble with that, could you elaborate?

Why is there a RevisionTime on each package? What use case does that help to solve? [...]

One recent event that I believe is a perfect illustration of how RevisionTime is awesome is the migration of code.google.com projects to github. Given that it often involved migration from hg to git as a side effect, you're effectively losing the information the hash ID gave you (that is, the Revision field becomes useless), but the RevisionTime should stay perfectly relevant. Thus giving a trivial way to find a corresponding commit in the new (github) repo, and also to check what new commits were introduced since last time you checked/pinned.

robfig commented 8 years ago

@kardianos If I could snap my fingers and vendor-spec would be supported by glock I would do it, but glock has been stable / unchanged for a while, I haven't had a need to do it, and it seems like a lot of work.

But also, I think that the manifest format is not all that differs between tools - for example, glock supports commands, it only supports lockfiles for the end user's application (not for intermediate libraries), and it doesn't vendor dependencies. Seems to me that the vendoring zeitgeist ended up at a tool that is nothing like glock, so I didn't see much of a point in trying to keep up.

I'm looking forward to a tool that finally gains widespread adoption though! Seems like "gb" is in the best spot for that?

technosophos commented 8 years ago

I have several issues with the original proposal, @kardianos. Two are architectural, and one is polemical.

1) Why operate at package level instead of repo level? The only reason given is preference:

From experience I absolutely love vendoring at the package level

But not even the experiences are relayed. This seems odd to me for one very clear reason: Versions are not an attribute of packages, they are an attribute of repositories. Therefore, at even the most rudimentary, vendor spec suffers from what is called "level confusion" -- the assigning of attributes to the wrong "level" of abstraction.

This is clearly evidenced by the fact that the proposed file format would allow setting different versions to two co-located packages. Doing so would clearly allow unintended side effects and difficult state resolution.

2) I also object to vague and misleading statements in your original proposal like this:

For example current tool file formats can't handle the
case of vendoring a patched version of a standard library package (this
would have been useful for crypto/tls forks for detecting the heartbleed
attack and for accessing MS Azure).

Given that your proposal does nothing to address this case, and that you are conflating "tool" and "file format", this seems to me to be more FUD than useful commentary. Not to mention that at least one of the tools that you point out in your proposal has handled that situation elegantly since its inception.

3) As I read your proposal, I see no advantages in using your new solution over existing formats like Glide's lock file. You don't seem to give any reasons. You just seem to assert that we need your spec instead of just standardizing on one of the existing ones. In fact, your comparison to the Glide.lock file points out that the Glide.lock file has some important features that your spec is missing, like dev imports and a hash. (Worthwhile note: Glide.lock also has an ignore list. It's just omitted when empty.)

dmitshur commented 8 years ago

There was one comment I wanted to make in this thread, and luckily @technosophos has just made an identical comment just now. So I will quote it and say I agree with it:

Versions are not an attribute of packages, they are an attribute of repositories. Therefore, at even the most rudimentary, vendor spec suffers from what is called "level confusion" -- the assigning of attributes to the wrong "level" of abstraction.

At the very least, I would want to hear good arguments for doing it another way. But assigning versions to repo roots seems like the most natural and effective solution.

kardianos commented 8 years ago

@mattfarina RE Vendor file location: I'm fine either way. I think @rsc wanted to keep all the vendor stuff in the vendor folder, including the vendor file. If you want a gitignore line that ignores the content of the vendor folder but not vendor/vendor.json file, use vendor/*/ in your ".gitignore file.

@mattfarina and @robfig RE commands: I'm open to suggestions here but how govendor uses the vendor-spec to support commands is to just include them as a normal package. The tool itself then discovers it is a program (it finds a package main) and can decide what to do with it. I could however be missing something here. I'm assuming the developer, tool, or script could then run go install vendor/... and it would install the packages and commands so that go:generate would work. Again, let me know what I missed.

@robfig RE existing tools: Yeah, I understand. It might be easier to let projects move off of it if they choose to. I hear gb is a great tool for building go using workspaces. It also has a gb-vendor sub-command that comes with it. In this case I think we are looking vendor tools that complement the go command.

@technosophos and @shurcooL RE recording at the package level: I agree I'm the odd man out on this and as such might lose :). But I will try to explain my rationale. Let me break this down into two parts:

The ability to specify individual packages from a repository.
The ability to specify a revision per package.

I have a package that vendors files from vitess. Now vitess is a large repository and I only want two packages out of the entire thing. For this I would like to specify which two packages I want and leave the rest behind. For this I need (1).

Point (2) is mainly due to this: you have a stable package github.com/u/p/{a,b,c}, perhaps it is a utility repository with several repos, perhaps like golang.org/x/crypto and we wanted to update the bcrypt package leave the ssh package where it is at. With the current design tools can allow for this. This is what govendor allows.

There are times where you want to note an entire repository or sub-tree for either C files, resources, or maybe that's just how your tool works. That is why I am proposing adding the "tree": true field.

I use property (1) all the time and like selecting out package from a repo. I would like to retain (2), but I do understand objections to it. I would be interested in other's opinions on this too (try out govendor, not to use it, but to see how it works in this context).

@technosophos RE origin / std library patches: This is not FUD. This is an example from the vendor-spec itself:

        {
            "origin": "github.com/MSOpenTech/azure-sdk-for-go/vendor/crypto/tls",
            "path": "crypto/tls",
            "revision": "80a4e93853ca8af3e273ac9aa92b1708a0d75f3a",
            "revisionTime": "2015-04-07T09:07:15-07:00",
            "comment": "located on disk at $GOPATH/src/github.com/kardianos/mypkg/vendor/crypto/tls"
        },

This allows representing the package import path is crypto/tls, but get it from the azure repository (in this case it would be a patched version to allow go to connect to azure). govendor can handle this situation today because the vendor-spec allows it. It shouldn't be common, but it should be supported. This is part of the file format as the specification needs both a path and origin field and constant semantics assigned to each.

@technosophos RE existing formats: Early this year the core developers stipulated that the manifest file should be able to be reasonably read with the go std library. Either we create an ad-hoc format, or we use something kinda gross, but well supported like JSON or XML. The fields that the glide.lock file has by and large seem fine. I'm not sure if relevant, but the vendor-spec didn't come from govendor, govendor came from the vendor-spec. So the glide.lock file looks fine, but not yaml. That is a huge format to support and isn't in the std library.

@akavel RE changes to spec: I'm glad it is useful. Yes, before it noted down the relative path from the vendor file, so you could place it many places and have it resolve. It has been locked down some to just the vendor folder. The current method is slightly simpler but more restrictive. I'd love to hear other's thoughts on the matter. Relevant issue: https://github.com/kardianos/vendor-spec/issues/39

@akavel RE additions: That was added early on as a suggestion. So yes, that is encouraged.

mattfarina commented 8 years ago

@kardianos a few things and i'll break them into bullets for easier reference:

Part of the reason we're having some of these discussions is a lack of clarity on what is being solved. What are the requirements or use cases? An specs need to be crafted to solve those requirements or use cases. I've listed the use cases I believe need to be solved which this format does not do. Can you please list your requirements or use cases so we can evaluate them and look at how a spec works against them?
Using vendor/*/ in an ignore file would cause user experience issues for numerous users. In the package management setups in languages Go users are coming from the lock files are not in the same directory packages are stored in. They are used to ignoring vendor/ or something similar. This minor difference will be a cause of headaches.
Using go install ./... for tools with go:generate doesn't work well. If two different applications within the GOPATH require two different versions of the same tool the singular GOPATH/bin isn't capable of handling that. This is a problem beyond the scope of the vendor-spec but relates here as well.
Can you explain the technical environment and reason for package level versions instead of repo level versions? What requirements are they needed for, how do they fulfill them, and how to handle the dangers of multiple packages from the same repo with differing versions (since that breaks known testable stability within the imported repo). Looking for technical details to backup your opinion and understand where they are coming from.

@kardianos @rsc Something occured to me while reviewing this material. As I work on Glide, watch requirements come in there, and discuss package management with those inside and outside the Go community I realized that this "spec" isn't born out of experience in the community. We're still learning what people need and are adapting. The vendor-spec has not taken off organically. In my opinion it's premature to put this or any other spec in the go toolchain.

I would suggest waiting for the GO15VENDOREXPERIMENT to be on by default, collect the requirements needed by its users, and make sure any spec meets those.

kardianos commented 8 years ago

@mattfarina RE use cases:

Know what revision was copied into the vendor folder.
Enable standard user tooling for fetching remote packages at a given revision.
Enables machine analysis of dependencies across the board, such as looking for vulnerable revisions (dvcs hashes) and mapping dependency usages.

@mattfarina RE go:generate tools: I'm all ears. What would you propose instead? Are you wanting a bin folder in the vendor folder or something similar?

@mattfarina Re experience: Most of us have many years of experience vendoring of some type in go. We are transitioning to the vendor folder now. If you want a varied experience and use cases, let's talk to people now, not later. @goinggo Bill, thoughts on this? You interact with many more people than I do.

mattfarina commented 8 years ago

@kardianos thanks for sharing your use cases. That helps me better understand where you're coming from. I have some comments on there.

For (2) there are problems when you fetch versions at a package level rather than a repo level. For example, if you fetch 2 packages in the repo at two different versions and they rely on a 3rd common package from the repo. Which version do you use there? And, this concept breaks atomic tested commits for the imported repo.
What tooling does machine analysis based on commit ids for vulnerabilities? Tools I'm aware of, such as David for node.js do analysis based on versions and version ranges (e.g., ^1.2.3). It's easier to know the latest non-vulnerable version number and evaluate if a version is after that than to know the last non-vulnerable commit id (hash) and if the current one is after that. With version numbers you don't need the complete commit history.
What kind of dependency usage (3) are you looking to map? You can already map the dependency graph today on godoc.org. For example, see the kubernetes apiserver package. With a lock file you can't really map versions required within the tree because you've stepped past that to the single commit in use for the tree. So, what kind of mapping are you looking to do that requires this?

I don't have a proposal for go:generate. That's still something to be worked out. It's one of the things we've not worked out making it premature to have a solution.

Package management in most programming language ecosystems have settled on one tool. Rust has Cargo. PHP has Composer (formerly Pear was used). Node.js has npm. You get the idea. The Go ecosystem has become quite divided. The tooling supplied by Godep (a longtime solution) was insufficient for many. Now the ecosystem is fractured and a wide array of solutions are being worked on in order to meet all the use cases people have. If the vendoring setup you talked about worked for the masses GB and Glide would not be gaining the following they are and there would have been no call for them in the first place. Those are just two of the many tools being created.

Trying to push this solution in without adapting to meet the needed use cases for many could cause further community issues. Package management has become a hot topic. Many of the elements of the go toolchain nailed it so there wouldn't be a need to debate things. Package management is something we need to get right for the majority of developers or not bring into the go tool.

@kardianos Could you possibly expand on you this spec would be needed for my use cases? Then I could see how it would fit into the broader package management situation.

freeformz commented 8 years ago

@mattfarina FWIW: I have no intention of wanting to take the concepts of godep and move them into the Go toolchain. I was stating that I can no longer rely on the go tool chain and am +1 a library so that I don't have to maintain my own internal versions of the tools as libraries.

mattfarina commented 8 years ago

@freeformz I would be curious to hear your take on the use cases I'd previously worked with others on.

I also want to make it clear that I don't hold anything against Godep. I just know that it does not cover all the use cases developers have and want to see the go tool handler more of them than Godep covers today.

mattfarina commented 8 years ago

I'd like to note that there is a fundamental difference between the glide.lock file and the vendor-spec. That's versioning being at the repo or package level.

ardan-bkennedy commented 8 years ago

@mattfarina I am not 100% sure what you mean by package level versioning but vendoring at a package level is a must because more and more of us are creating kit type repos where not all the packages will need/want to be vendored.

I have been giving @kardianos feedback on govendor and I think the tool is very close to having most of the bases covered for project management. I am at the point thanks to the vendor flag and govendor where I don't even think about dependency management anymore.

This is what I am doing today and it has kept things simple and manageable.

I believe projects need to own the dependencies they use and vendor them into the projects source tree.
A project uses a single repo and MUST contain only a single vendor folder at the root of the source tree.
Each project can only contain a single version of any given dependency in the vendor folder.
When adding a dependency (package) inside the vendor folder several things need to be taken into account. a. Copy the package code from the GOPATH. If the code is not there, complain. b. If that package has dependencies, look inside the packages project for a root vendor folder. Pull dependencies from there first. Else look in GOPATH.

4.b allows repos to have a root vendor folder to secure dependencies. These dependencies can be respected.

Conflicts This is where tooling becomes very important. The user has to make choices.

When a version of a package is already vendored an update command will replace it. When this update is a direct and conscious decision let it happen. If the tool is pulling a dependency from a project vendor folder, then versions need to be checked. This is a place where the vendor file is critical.

Fetching I would like to see govendor be able to fetch a dependency and place it directly into the vendor folder without leaving any artifacts in the GOPATH. The same conflict rules apply.

Finally I don't think everything needs to be automated. The tooling just needs to provide information and guidance. Each project and situation is different. Project structure and information is the key. I believe the project model with a single repo and a single vendor folder at the root of the project simplifies things and works.

If you want to use tagged versions to identify conflicts I think that could work. I do like the idea of tagging with semver because I think commit ids are too granular. I may have a dozens changes that don't break the API. That being said, bugs can still be introduced so the commit id will flag potential issues that the user can look into.

Just don't become paralyzed with edge cases, let the tooling provide the information so the user can make an informed decision.

mattfarina commented 8 years ago

@ardan-bkennedy I'm not sure what you mean. Can you point me to an example?

I quite often use the vendor/ directory the same it's typically handled in other languages. It's a place to hold the projects from vendors rather than the place to put them within my VCS. I use Glide to fetch and manage what's in there.

The packages it checks out are in repos and everything checked out from a repo needs to be on the same version. The version being the same is important because different packages in the same repo being on different versions means there are untested combinations of code from the same repo, atomic commits have been broken, and there is no way to cleanly handle a shared dependency of multiple packages in the repo on different versions. This is a bad idea.

sdboyer commented 8 years ago

vendoring at a package level is a must because more and more of us are creating kit type repos where not all the packages will need/want to be vendored. where not all the packages will need/want to be vendored.

@ardan-bkennedy could you please provide an example of a "kit type repo"?

If I'm guessing correctly, then it's a repository containing an assortment of unrelated code, such that it is meaningless to apply any kind of real version number to the code?

I'm not sure that it's fair to say "more and more of us" are doing that - my personal experience (FWIW) is that the only repositories I see like that are older, not recent. Either way, treating a repository as a kitchen sink to which a meaningful versioning scheme cannot be applied is a poor development practice. As @technosophos noted earlier, it constitutes "level confusion" - versions are a property of the repository, not the package. It should not, I think, be the target use case - merely one that is not disallowed.

jbuberel commented 8 years ago

@mattfarina @sdboyer - An example of a kit-type repo is go-kit.

You may opt to only use the ratelimit and circuitbalancer packages, and not the others. If you follow the practice of minimal vendoring (only vendor what you actually use), you may want to vendor this repo at the individual package level, not the project level.

Note: I'm not advocating that tools be capable of managing dependencies at the package level, only providing an example of what it might look like.

ardan-bkennedy commented 8 years ago

@mattfarina @sdboyer. The language is package oriented. A package is the basic unit of compilation, and we are taught to think in terms of components (packages). Our API's are based on a package. We import packages. So why would package and dependency management not be package based?

From my point of view of history

Back in 2013 when I learned how go get worked, it seemed that a repo should represent a single package. This would allow the package to be go gettable and everything that comes with that. But I think we have learned that repo management is a lot of work, so minimizing the number of repos you need to manage is important. This leads to the idea of a repo representing a project.

A project can take on two forms:

1) A set of packages that produce a set of binaries for a product or service. 2) A set of packages for use by others. This is where the kit comes in.

The Gorilla web toolkit uses a set of repos where each package is contained in a repo. But I think if this was being developed today it would follow what Peter Bourgon is doing with go-kit and JP Robinson is doing with gizmo and now what I am doing with my kit.

If you look at my kit, I have added a vendor folder at the root of the project. This was not a good idea prior to the vendor experiment and the govendor tool. But I think these two tools in combination have solved a big problem.

I can go get the entire kit I want something from and bring it into my GOPATH.

I can vendor just the log package from the ardanlabs kit into my project: govendor add github.com/ardanlabs/kit/log

Update it as need be: govendor update github.com/ardanlabs/kit/log

govendor will see a vendor folder exists inside of github.com/ardanlabs/kit during the add and update. So before it copies log into my projects vendor folder, it can look at any dependences that log is using that needs to also be added/updated from that vendor folder first. Again, if the dependencies already exist, the tooling just has to let me know so I can make a decision about what to do. This is not a big deal. I have knowledge and control.

This is where the vendor file is critical. I want to know if what I already have vendored in my project for this package matches what is vendored in the kit project. If they are the same, nothing needs to be reported, just copy the code. If they are not the same, I need to know. This is where using commit ids can be too granular, but then again, using semver could be a problem if the code bases are really not the same. So maybe a combination of knowing both.

"Hey the commit ids are not the same but both commits report the same version 1.0.2. How do you want to proceed?"

I'm not saying this logic is trivial, it is not. But if we start with tooling that can report issues and give us options to choose from, it would be a huge head start. Once we learn more and fix bugs, some of that can be automated.

In the end for me, this just needs to be manageable to the extent that I am not hesitating to use a dependency. I am at that point now with the vendor flag and govendor. I also take a minimal approach to using dependencies and don't run into these edge problems. Using the projects vendor folder to pull dependencies when it exists will take care of some of the edge cases.

I just wish the vendor experiment would isolate a vendor folder to the root of a project and dictate a single vendor folder per project. This would simplify things tremendously.

sdboyer commented 8 years ago

@jbuberel - ahh, yes, I see what you mean now. Sure, go-kit makes sense as an example. Thanks.

@ardan-bkennedy - great, so, the crucial question is very clearly out there, now.

The language is package oriented.

Yep.

A package is the basic unit of compilation, and we are taught to think in terms of components (packages).

Yep.

Our API's are based on a package. We import packages.

Yep.

So why would package and dependency management not be package based?

First, because Go packages are not (necessarily) individually retrievable units. go get creates the illusion that they are - convenient, but illusory. I get that some people may not want to care about this, in the same way that I don't really care about the cardboard boxes Amazon ships me stuff in. But I'm pretty sure that Amazon wouldn't be so successful if they treated boxes as some transparent, irrelevant detail around my stuff.

Second, as previously noted in this thread, versions matter, and versions are a property of the SCM, not the Go package.

I realize these points might seem like small details. But I'd encourage folks to step back and think about other situations where information from one discrete level of a software architecture has been conflated into another. In bad cases, these sorts of issues are difficult to diagnose (or even comprehend), and can induce years of hair-tearing. This, IMO, is one such case.

Back in 2013 when I learned how go get worked, it seemed that a repo should represent a single package. This would allow the package to be go gettable and everything that comes with that. But I think we have learned that repo management is a lot of work, so minimizing the number of repos you need to manage is important. This leads to the idea of a repo representing a project.

I think it's important to distinguish between the general problem of package management, and the lessons that go get has taught us over the years. IMO, the vagaries of go get have gaslit us, and stuck us optimizing around local optima. The next step forward in this area should be rooted in deeper changes, not course corrections begotten from go get experience.

In that vein, I see those kit repositories (go-kit, gizmo, yours), and this proposed spec, as a totally reasonable response to the situation go get creates. Certainly, I see how your examples address your immediate problem. But this proposal is a very permanent band-aid on just a symptom of the real problem.

Really, you said as much yourself:

In the end for me, this just needs to be manageable to the extent that I am not hesitating to use a dependency.

I do empathize with this feeling. Very much. Under different circumstances, I would probably agree that it's a good enough basis to move forward with a change that makes an incremental improvement in the status quo. But it's not appropriate here, I think, because there is a mountain of prior art on package management - ideas we can follow to escape our local optimum. And this approach would, I suspect, conflict with that.

I am working on an article that lays out a bigger-picture way of thinking about the information a package manager needs, and how any sort of tool should operate. I hope to have it up by the end of the week.

Aside - the irony here is that we're having these challenges when the structure of Go programs and the constraints of the compiler make solving this problem so much easier than it is in other languages - including ones that currently have much better solutions. We have such potential!

goinggo commented 8 years ago

This is a question of code ownership to me. I must own all the code I use and only use the code I need. If the repo is nothing more than a box. What is inside the box is important. What things in the box I want are important. The box is not what is important. The box for the package I want is not more important than the package itself. Don't make this about the box. The version being applied to the box does not negate that the version can be applied to an individual package in the box. It follows the shipping model you are describing very well.

Own the code your project uses, don't lease it. Too much risk in leasing and taking more than you need. Minimize, reduce and simplify.

On Dec 30, 2015, at 1:07 PM, Sam Boyer notifications@github.com wrote:

@jbuberel - ahh, yes, I see what you mean now. Sure, go-kit makes sense as an example. Thanks.

@ardan-bkennedy - great, so, the crucial question is very clearly out there, now.

The language is package oriented.

Yep.

A package is the basic unit of compilation, and we are taught to think in terms of components (packages).

Yep.

Our API's are based on a package. We import packages.

Yep.

So why would package and dependency management not be package based?

First, because Go packages are not (necessarily) individually retrievable units. go get creates the illusion that they are - convenient, but illusory. I get that some people may not want to care about this, in the same way that I don't really want to care about the cardboard boxes Amazon ships my stuff in. But I'm pretty sure that if Amazon wouldn't be so successful if they treated boxes as an irrelevant detail.

Second, as previously noted in this thread, versions matter, and versions are a property of the SCM, not the Go package.

I realize these points might seem like small details. But I'd encourage folks to step back and think about other situations where information from one discrete level of a software architecture has been conflated into another. In bad cases, these sorts of issues can cause difficult to diagnose (or even comprehend), and induce years of hair-tearing. This, IMO, is one such case.

Back in 2013 when I learned how go get worked, it seemed that a repo should represent a single package. This would allow the package to be go gettable and everything that comes with that. But I think we have learned that repo management is a lot of work, so minimizing the number of repos you need to manage is important. This leads to the idea of a repo representing a project.

I think it's important to distinguish between the general problem of package management, and the lessons that go get has taught us over the years. IMO, the vagaries of go get have gaslit us, and stuck us optimizing around local optima. The next step forward in this area should not be rooted in go get-based course corrections, but a more foundational improvement.

In that vein, I see those kit repositories (go-kit, gizmo, yours), and this proposed spec, as a totally reasonable response to the situation go get creates. Certainly, I see how your examples address your immediate problem, but this spec treats a symptom of the problem with a very permanent band-aid.

Really, you said as much yourself:

In the end for me, this just needs to be manageable to the extent that I am not hesitating to use a dependency.

I do empathize with this feeling. Very much. Under different circumstances, I would probably agree that it's a good enough basis to move forward with a change that makes an incremental improvement in the status quo. But it's not appropriate here, I think, because there is a mountain of prior art on package management - ideas we can follow to escape our local optimum. And this approach would, I suspect, conflict with that.

I am working on an article that lays out a bigger-picture way of thinking about the information a package manager needs, and how any sort of tool should operate. I hope to have it up by the end of the week.

Aside - the irony here is that we're having these challenges when the structure of Go programs and the constraints of the compiler make solving this problem so much easier than it is in other languages (that have solved it better). We have such potential!

— Reply to this email directly or view it on GitHub.

robfig commented 8 years ago

@goinggo Does your position then say that packages within a repo should specify the acceptable versions of other packages within that same repo, rather than being able to assume they are all used at a consistent hash? If no, then it seems like an inconsistent position. If yes, then it seems implausible (since I can not imagine anyone ever doing that)

In my opinion, repos that are a grab bag of unrelated packages are the exception, not the rule. Most multi-package repos are simply made to support a single functionality, and the extra packages are simply because some functionality requires a lot of code and packages are the way to organize a lot of code. For example, robfig/soy has a high level interface in the top-level package, but provides a lot of related tools and lower level functionality in the sub-packages. It would not make sense to have some of those packages at one revision and others at a different one.

ardan-bkennedy commented 8 years ago

All the packages within a repo for a given hash are all in the same version. The decision to vendor one means you vendor the others that are dependencies. I don't understand why that is implausible or an issue. If you are updating a vendored package why would you not also update its dependencies? It is the responsibility of the kit owners to make sure each commit does not break the entire kit of packages.

On Dec 30, 2015, at 2:41 PM, Rob Figueiredo notifications@github.com wrote:

@goinggo Does your position then say that packages within a repo should specify the acceptable versions of other packages within that same repo, rather than being able to assume they are all used at a consistent hash? If no, then it seems like an inconsistent position. If yes, then it seems implausible (since I can not imagine anyone ever doing that)

In my opinion, repos that are a grab bag of unrelated packages are the exception, not the rule. Most multi-package repos are simply made to support a single functionality, and the extra packages are simply because some functionality requires a lot of code and packages are the way to organize a lot of code. For example, robfig/soy has a high level interface in the top-level package, but provides a lot of related tools and lower level functionality in the sub-packages. It would not make sense to have some of those packages at one revision and others at a different one.

— Reply to this email directly or view it on GitHub.

sdboyer commented 8 years ago

@goinggo

My Amazon analogy clearly missed its mark, but that's OK; it's a flimsy analogy, and chasing it down would be counterproductive. So, dropping it.

This is a question of code ownership to me.

Absolutely, entirely agreed.

I must own all the code I use

For sure. This is a necessary precondition for reproducible builds.

and only use the code I need.

I assume "use" here means "have available in the source tree," because...what else could it mean?

Why?

Before compiling a project, do you delete the entire contents of your GOPATH, except for the transitive closure of imported packages from the main package you're compiling? If not, then you are 'using' more code than you need, which means this is not actually a "must." The compiler doesn't care - why do you?

I think you're conflating a correctness issue with a performance issue. Sure, I get that it feels nice and parsimonious to build a source tree with strictly only the packages that the compiler will use...but the compiler already does exactly this. Compilation will be negligibly faster (if at all), and the produced binary will not be any more efficient.

All you're saving is some disk space, and maybe some network time. It is not worth making a base language/tooling architecture-level change in to optimize those things when there are less drastic means available, especially when that change would preclude or complicate other approaches.

The version being applied to the box does not negate that the version can be applied to an individual package in the box.

This explicitly blows past the "level confusion" issue that's been mentioned several times. So, I'll put it a different way: when all you have is an immutable commit-hash backed, (Go) package-level perspective on dependencies, how do you address the diamond dependency problem?

Crell commented 8 years ago

I blame @mattfarina for my participation in this thread...

I'm still very much a baby Gopher. Maybe Gopher fetus, frankly, as I'm still just playing with Go. However, I have a long history of PHP experience, including watching the PHP world transition in the last few from a crappy package manager that no one liked (PEAR) to a really nice one that most people are now using (Composer). I've also been involved in the Drupal community for the past decade, which has had its own package handling for a long time (because PEAR was so poor) and is trying to transition to Composer, a process that is still ongoing. Hopefully some of that experience here is useful, especially since it seems an awful lot of people migrate to Go from PHP.

Also a disclaimer: I used to work with @mattfarina @technosophos and @sdboyer. (Hi guys!) However, I've not used Glide, just vanilla go get so I have no particular horse in this race myself.

It feels like there's an unstated assumption that some are making that is not clear to others. That is, the needs of a library and the needs of an application instance are quite different.

A library needs to declare (at minimum):

Its identifier by which others can reference it.
What its dependencies are, if any.
The version ranges of those dependencies with which it is compatible (a minimum, maximum, or both)

An application instance needs to declare (at minimum):

What its dependencies are, if any.
The precise version of those dependencies with which it is compatible. In this case, "precise version" means commit ID, or something equally precise to indicate a specific snapshot of source code.
Optionally, post-download instructions or other details relevant only to the top of a dependency tree. (These could include test-only dependencies.)

Note specifically that an application instance doesn't need an identifier (although it doesn't hurt it to have one), but more importantly that an application instance needs far greater precision in terms of its dependencies' versions. That's because a library is specifying "these are the versions of a dependency you could use", whereas an application instance is specifying "these are the versions of a dependency you should use". That's a very important distinction.

Say I'm releasing a YAML parsing library, and it depends on a file system utility library. (This example may not make any sense for Go, but swap your own example nouns in if so.) My YAML library should, assuming no bugs, work with version 2 of the FS library, but not the 1.x version, and I don't know if it will work with the 3.x version that doesn't exist yet. Or it could depend specifically on the 2.4 version of the library as that version introduced some new feature I rely on. When 2.4.3 comes out, fixing bugs that I don't care about, I don't want to have to go in and tell my library that it requires 2.4.3 rather than 2.4.2. Odds are it doesn't matter to me, and it's just more work for me to change the specific commit hash I depend on. Rather, I depend on a certain public feature set. Doing so would also preclude another library that want at least the 2.5 version of the FS lib, which would also work fine with my YAML parser but so does 2.4.

An application instance, however, does need that specificity. In order to ensure multiple developers are working with the same dependency, and the CI server is testing the exact same code (and set of bugs) that I have on my laptop, it needs to start from the precise exact same set of lines of code. If I decide to upgrade one of the libraries my application is using, that should be a conscious, deliberate decision on my part so that everyone on the team gets that change at the same time. The "assuming no bugs" statement that a library can make for its dependencies simply cannot be made by an application instance.

Packaging solutions that ignore that duality tend to run into problems. It looks like the Glide Matts tried to avoid it for a long time before finally giving in and accepting the two-case solution: http://technosophos.com/2015/12/11/why-glide-0-8-is-our-biggest-release.html (That link explains this duality better than I am, probably.)

(Note that I'm also not covering applications, which are slightly different than application instances. A ready-to-install application could go either way, depending on specific commits or ranges, depending on a wide number of non-technical factors. But it will fall into one of those two use cases, so I am not covering it separately.)

That is, there's a need for a "build file" (which specifies ranges) and a separate "lock file" (which specifies precise snapshots by commit ID).

It sounds like the OP wants to standardize the lock file, and then punt on the build file. I firmly believe that is a very bad idea, as those two need to work in concert. The build file is a human-editable file; the lock file is a more precise result of aggregating all available build files together at a specific point in time. That means every package being aggregated needs to be readable by whatever tool is doing the aggregated. If some of the libraries in question are using Glide's format, some are using glock, some are using none of the above, then the aggregating tool needs to understand all possible build file formats. Alternatively, every library could maintain redundant copies of a build file, one for each build tool available. Both of those are crappy situations that should be avoided, hence the need for a standard build AND lock file format.

Another important factor here is versioning. Quite simply, I do not believe any package dependency management system can function without a (near-)universal versioning scheme, which is what Semantic Versioning provides. Without semantic versioning of packages none of this will work. Period, kthxbye. The only way it could work is if no package ever has a BC break ever in its entire commit history, which is about as realistic as the Chicago Cubs winning the Super Bowl. (Think about it...)

Thus, I would argue that a successful Go package manager file format MUST:

Expect Semantic Versioning of packages.
Include a "build file" format, that is human-editable that every library uses to declare the version ranges of its dependencies that it supports.
Include a "lock file" format, that is machine-readable that application instances, and only application instances, will include in its repository. (Libraries would NOT do so.)

This wheel has been built before and always ends up round, so let's just skip straight to round wheels.

Crell commented 8 years ago

An additional point, on libraries vs. repositories. To add to what @sdboyer said, I'll cut to the chase and say this is a bad idea waiting to explode in your face.

Drupal (my primary OSS project) has supported multiple modules (extension libraries) in one repository for 15 years now. When a couple of libraries are closely related -- such as when you've one main module and a few optional modules that provide integration with some other module -- it can absolutely be convenient to cluster them together in a single repository. It's very tempting. It's also a huge mistake.

Doing so results in the following issues:

You cannot download just the pieces you need. You have to get it all, which may or may not be prohibitively large.
Detecting what entry points are present in a downloaded bundle becomes more difficult. This is particularly true for the build file. Where do you find multiple build files? How do you tell them apart?
When you specify a dependency, do you depend on the library or on the meta-package that includes that library? What if those are sometimes the same name, sometimes not? (This is not a hypothetical example, but a common situation in Drupal.)
How do you version just one of the libraries in that repository? If you have just one lib in the repository then tags become a super-easy way to denote versions. If you can't rely on a tag (since each library versions separately), how do you figure it out? Do you have to include that in the build file? What happens when (not if) you forget to change the build file?
If I have to deep-link to get to a specific version of a sub-library, how do I know I'm getting the right version? If the deep-link changes for the version (the way it would in SVN, for instance), do I have to update my identifier? The identifier then sometimes becomes a version specification, sometimes not. But that precludes ranged dependencies (at least version 2.4.3 as above), which are a necessity for libraries.

Some of those issues may not be entirely relevant for Go (since the compiler, I think, can skip unused source code entirely), but the principle is the same and there's probably other Go-specific issues. Linking one repository to one package solves all of those issues, and allows the use of tags as the version specification mechanism. Simple, elegant.

Many of us are pushing for Drupal to abandon multiple modules per package to avoid the above issues. Generally speaking, I'd agree with @sdboyer that if multi-library repositories seems like a good idea then you're solving the wrong problem. It's a solution to a problem that shouldn't exist in the first place. Honestly I've never had much issue maintaining separate libraries in separate repositories.

Some other examples in the PHP space:

The Symfony project includes about 30-ish components, all developed in a single repository. However, they also go through an extensive song and dance to split them off to their own read-only repositories so that they can be reused individually. This process is automated, but was quite a chore to setup and is not fast.
The Zend Framework project used to do the same, but this year broke all of their libraries up into separate repositories to avoid all of the issues Symfony has. They also can now make a new release of just one library without having to version-bump the entire suite. Some libraries are just inherently more stable and less in need of change, so bumping their version just because its siblings do is rude to downstream users of that library.
The Aura project (another component framework) has a separate repository for everything and has since the beginning.

I believe the KDE project now separately versions all of its components, too, and has a separate version for the integrated package that is independent of the versions of the underlying libs. In all, I see that as being the trend, not moving more toward one-repo-to-rule-them-all.

It actually took me a while to realize that the discussed "repository level" vs "package level" was backwards from what I'm used to seeing. At first, I thought package-level meant in a central repository like Packagist.org (the default index of PHP packages that can be used by Composer), rather than a subset of a repository. Such a central index is likely a separate discussion, but that's how far from anything I've seen elsewhere such an idea is.

In conclusion, I do believe that a Go package manager should assume 1 library==1 repository, as it simplifies a huge number of questions. If there's some reason that maintaining multiple repositories is problematic, that is an issue that should be fixed on its own rather than trying to paper over it.

kardianos commented 8 years ago

@Crell Thanks for your input.

Your (3) points (semver, build file, lock file) is a very succinct description of what you see. Thanks.

Java, PHP, .NET have libraries and programs. Go has packages and the distinction is important. It is also important to note that every single go file (not package, not project, not repo) is entirely self describing in terms of dependencies. But I do get what you are saying.

If you are looking for a central package index, I find godoc.org very useful.

1 library == 1 repository doesn't make sense in Go. As before, go doesn't have libraries. Here is an example of a single repo: https://godoc.org/golang.org/x/crypto . I really don't see a need to break out each of these packages into its own repository. Or maybe it is a library? If so, they are almost entirely independent from each other (ssh, bcrypt).

Or you can take https://godoc.org/github.com/youtube/vitess/go/sync2 , a useful package I've used in the past. Granted, it is part of the vitess youtube repo, but it isn't in an internal directory and it works well. I copy it to a vendor folder for non-main package using a tool, I don't want the rest of vitess, just that. So have I sinned? go get works with my package, it is pinned, upstream can change all it wants and I don't care. I've vetted the code I bring in and re-test it. At this point my package "owns" it. What if they release a security update? Well, if we have a standard lock file, finding packages that use the tainted revision should be simple for machines to do (I sure don't want to check by hand, I've got tools for that).

I think there could be value found in creating a build file, though I'd prefer to just work with a CLI tool. I think there could be value found in using semver (I'm certainly not against using semver). But practically, I don't run into the issues you are describing with go when using simple existing tools, (godep, glock, govendor). In other words, I find the tools are largely adequate. But I find that each tool writes down the same thing: package, revision. Just in a different machine format.

You mention the compiler is smart. It actually is better than that. Using https://godoc.org/go/parser and friends, you can quickly read the top of the go files and build a dependency tree with just the source! This is actually what govendor does. If you run "govendor add +external" it writes down all your packages and copies them to your vendor folder. You don't have to check in files in the vendor folder, but you can. I find it much faster than dealing with a handwritten build file. It does so package by package, so you only get the packages you need, locked in exactly as you want them.

Honest question, were you ever able to do anything like that in PHP?

I appreciate your viewpoint. But I'm struggling to apply them to go.

sdboyer commented 8 years ago

1 library == 1 repository doesn't make sense in Go.

I've been reflecting on this basic idea (which is at the core of the disagreement here) more over the course of the day, and feel that I can say this much: the fact that Go does have such clear boundaries around its packages, and parsing dependency information can be done fast and unambiguously, is an interesting and important property to consider for Go package management.

There's validity to the notion that other langs have landed on "repo as lib" as a way of establishing a boundary when the language itself doesn't have a strong opinion. And, since Go does boundaries well on its own, we needn't rely on the repository for that. This is clearly true, as evidenced by the various examples @kardianos and @ardan-bkennedy et. all have offered in the thread.

My issue, though, is that I think folks are interpreting that "needn't" as a "shouldn't" - as evidenced by the quote I pulled from @kardianos. It's a logical fallacy, even - just because Go doesn't need repos to help define API boundaries doesn't mean that repos should just be treated as dumb code shipping containers. 1 library == 1 repository absolutely can make sense for Go. Literally thousands of repositories adhere to that pattern. It's just not the only feasible way of grouping code.

The 'kit' repo case, and the package-level vendoring that their use could entail, is one case that a 'complete' Go package management solution must either directly support, or at least not preclude. But it's putting the cart before the horse to focus on it now. The general, base case is easily grabbing lots of different repositories, at tight or flexible version ranges, and locking those resolutions into place for reproducible builds.

That entails versioning schemes - again, a repository property, not a code property - not just locking to a commit hash. Dependency lock-in is real, and this (emphasis mine):

I find it much faster than dealing with a handwritten build file. It does so package by package, so you only get the packages you need, locked in exactly as you want them.

Isn't really true. I almost never have enough information to be certain that the revision locked in is the one and only one that will work. What I want is to specify a range of acceptable versions, have the machine pick a version for me that meets those constraints, and THEN only change the locked-in version when a) I tell it to or b) I add a new dependency that creates a broken diamond, and some wiggle room is needed to resolve it. Locks are necessary, but not sufficient; without version range constraints to loosen things up when needed, locks are like overtightened screws.

kardianos commented 8 years ago

@sdboyer RE releases: I completely agree that it would be good if more authors "released" packages. I would also assume that if you are compiling a program, you have already picked out the version if you want a specific version, or updated to it, possibly into GOPATH first.

I know version ranges are something myself and others wouldn't want to use.

However, what I'm asking for is consensus on what might amount to a lock file. Would this proposed lock file be incompatible with a theoretical build file? The glide lock file looks compatible to me, but I could be missing something.

sdboyer commented 8 years ago

I know version ranges are something myself and others wouldn't want to use.

And that's fine. The typical approach to a manifest admits the specification of a tight revision - i.e., immutable commit hash - as well as a floating value (a branch) or a should-be-immutable-but-could-float value (a tag), or ranges composed thereof.

However, this proposal can't admit ranges. And, if you name a branch/tag, then it can't guarantee reproducible builds, because there's nowhere to store the actual, immutable commit ID that those were resolved to.

However, what I'm asking for is consensus on what might amount to a lock file.

I get that - and I do apologize to the extent that I've derailed the discussion from specifically that.

So, to the point: I can't support the proposal, because

I don't believe lock files should exist without manifests, or manifests without lockfiles. Both are necessary but individually insufficient. Consequently, lockfiles without manifests should be considered harmful. The piece I'm working on will explain why in detail, but briefly: manifests describe user intent, and lockfiles the result of processing that intent. Conflating these two has nasty UX and correctness implications.
Being that I view lockfiles as a machine-generated thing, having a 'comment' field doesn't make sense.
Package management should be a completely isolated "phase 0" of compilation. To have the 'tree' property in the spec file would necessarily muddy the waters between compilation and package management (as the compiler, via go/build, would necessarily have to change its behavior based on that flag)
A sorta-shared, sorta-not lockfile where different package managers can add their own fields would create unmanageable implementation complexity. See, for example, Ruby, where the most that rbenv, rvm, rbfu, and chruby have been able to share is a .ruby-version file

Crell commented 8 years ago

Observation: The people in this thread pushing hardest for a separate build and lock file all have extensive PHP experience prior to getting into Go, and who were in PHP for the birth of Composer. I don't know if this is a mark in favor or against that position, just an observation. :smile:

@kardianos Composer's build file, composer.json, is just a plain JSON file so it's human editable but also machine editable. The composer CLI tool includes a bunch of commands to manipulate it. Here's a few examples that are part of a typical workflow:

$ composer init

Creates a new composer.json file, asks a few questions wizard-style to prepopulate it. Generally this is done at the very start of a project.

$ composer require foo/bar

Adds an entry to the "requires" section of composer.json for the foo/bar package, then downloads it into your /vendor directory. By default it will select the current stable version, and provide a version range of "^2" (that is, accept version 2.anything, but not 3.anything.) You can also specify a specific version or version range on the command line, or edit the file afterward.

If foo/bar has other dependencies, those get downloaded automatically as well with versions all being figured out to "the most recent version that satisfies all requirements."

At this point, a composer.lock file is also created that contains the aggregate of my composer.json, foo/bar's composer.json, and the composer.json of any of foo/bar's dependencies, as well as the specific commits that are installed right now.

$composer update foo/bar

Checks foo/bar again for the most up to date version that still satisfies all specified dependency ranges, downloads that, and updates the composer.lock file accordingly.

$ composer update

Updates all packages in your top-level composer.json file to their latest versions that satisfy all dependency ranges, and updates the composer.lock file accordingly.

With Composer, you're pretty much never supposed to check /vendor into your repository. That means when joining an existing project, your initial checkout will contain only the first-party code, composer.json, and composer.lock. There will be no vendor directory at all. The first step is therefore to run

$ composer install

If you have a composer.lock file, composer install will download the exact commit snapshots referenced in that file. That guarantees you that you have the exact same code as everyone else working on the project, down to the last comment line.

If there is no composer.lock file, then composer will download the latest version of all packages that satisfy all dependencies and generate a composer.lock file for you. You want that on an application instance, but not for a library. Hence why when I'm building a library I do NOT check my composer.lock file into the repository. When building an application instance, I do.

Composer also has a really nifty feature where you can checkout a repository as a non-vendor, with no VCS control files, but all of its dependencies in a /vendor directory. That's helpful for bootstrapping a project you're expected to customize. Symfony, for instance, can be installed with:

$ composer create-project symfony/standard-edition mypath

Which will then download the "standard edition" project (which is a repository), then run composer install on it to get all of its dependencies. The symfony/standard-edition is a starter-kit for projects based on the Symfony framework (one of many), and provides pre-set code that you can and should then modify and add your own code to. (And in fact all of the Symfony libraries themselves are considered a 3rd party dependency of your project, then.) Other, more complete applications can use the same tool but then you don't really modify them.

The result is that when you start a project, you get the most-recent-version of everything. As you develop a project, you only change the version of a dependency you have when you specifically want to.

In all of those cases, the composer.json file is human-friendly and human-editable if you are so inclined, but composer.lock is not.

If you're developing several of those libraries in concert, Composer lets you download full git checkouts rather than just snapshot tarballs, making pushing changes back upstream quite easy.

Composer's dependency resolution logic began life as a PHP port of OpenSUSE's package management library, so it has a fairly solid pedigree. Distribution maintainers have been dealing with dependencies far longer than languages have.

All of that relies on the assumption that a package == a repository, and is versioned as a single unit. Technically you can have any number of namespaces within that package/repository. It's conventional for the package vendor to also be the top-level namespace of the code it contains, but that's not at all required by the tools. So if you want to cluster multiple related pieces of code into a single package, but in different namespaces, you can do that. But they are still all versioned and tagged and downloaded as a single unit. You cannot download "just this subdirectory of package foo/bar". Actually, I think that's fine given Go's smart compiler as even if you download a package with 5 different clusters of code in it only the one you're using will end up in the binary, so at worst you're using some extra disk space for devleopment. Boo hoo. :smile:

In the (little) Go I've done, I've had several files that were in the same namespace because it made sense to organize the code that way, logically. I don't know if that's idiomatic or not, but I favor that over a single 10k line file if it happens to be a larger package.

I think one subtle difference between Go's current conventions and PHP's is that in Go, the package name is the namespace and is also the URI of the repository where the code lives. With Composer in PHP, there's an extra layer of indirection where a package's identifier is specified in the composer.json file, and then indexed on Packagist.org. So the foo/bar package above is the bar package in the foo vendor-space, which usually coincidentally corresponds to a GitHub user of "foo" and a "bar" repository. That's not at all required, though, just a convenient common pattern. So no, Composer can't locate all dependencies just from source parsing, but it can just from build-file parsing.

There's probably a lengthy debate that could be had about whether it's better to have that extra indirection or to use GitHub repo names as package names, always. I don't want to get too far into that right now, but I will offer the following advantages of that extra layer of indirection:

If the maintainer of a package changes, you don't necessarily need to rename it. Vis, the most popular HTTP client for PHP is called Guzzle, and its Composer name is guzzlehttp/guzzle. That happens to live on GitHub at https://github.com/guzzle/guzzle. (Note the slight difference in name.) It could also move to BitBucket without breaking anyone's code. The maintainers would just need to update the repository record on Packagist.org and poof, no one else needs to care. (As an extreme, imagine how messy it would be for Go if GitHub went down hill like Sourceforge and Freshmeat before it did and everyone decided to migrate to another platform. How much code would need to be modified?)
It makes it really easy to make personal forks of a 3rd party project, say if you have bugfixes you need that haven't made it upstream yet. By specifying additional overriding repositories in your composer.json file, you tell Composer "use these instead of what Packagist.org says, if relevant". Then when your changes get merged upstream you can just remove your repository overrides and start getting the official upstream version again, no other changes required. Without that, using a custom fork of a project, even temporarily, requires modifying all of your source files.

No doubt some of that is not relevant for Go, but hopefully that paints a better picture of what the split between lock and build files offers. The lack of versioning is, I would say, the number one downside of the go get architecture right now (which is also partially dependent on the use of repository names for namespaces), so any improved packaging system needs to have a good answer for it.

mattfarina commented 8 years ago

@crell Thank you for all your insights into what's happening in PHP and your opinions on this topic. It's good for us to hear an outside view.

I'd like to step back and share two conceptual things to this conversation that I think are going unsaid but may be useful to explicitly state.

There is a difference between observation, such as observing a user need and collecting it in a requirements or a description of what a package manager for another language does, and opinion of what we think is a good idea. If we can focus more on observing user need and solving those with common pattern we could more easily move away from debating opinions which can easily go the color of the bike shed route.
I think (and please note this is my opinion) that a focus should be on enabling users of the output to be successful. That is users of any tooling and those who create packages for others to consume. To set them up for success in the easiest manner possible.

With these two things in mind I'd like to share some observations and opinions (which I will call out separately).

Observations:

The language that have more recently created package management solutions (e.g., PHP, Rust, JavaScript, Elixir) have both a manifest file and a lock file. Some tooling, developed before these modern systems, has the ability to have versions and lock in their manifest.
Every major package manager for every language does version numbers. Their manifest files support specifying acceptable ranges.
I've been collecting requirements for this space. If we are trying to solve a problem it's important to know the requirements and these are some I've collected.

Opinions:

Those those who develop libraries for others to consume should be enabled and encouraged to use good practices for both them and the consumers of the library. I'm using library here instead of package because a library could be one or more related packages to solve a problem. Consumers of a library need to know about API changes (including additions), have responsible handling of security (and know the library maintainer will do that), have documentation on changes over time, and other elements commonly found in libraries. We should setup library creators to have examples of these good practices.
Retrieving a package from within a large monorepo (like the example in vitess @kardianos shared) are a common frustration for many developers. Even before I had a need to use one I was hearing complaints from others about them. They may be acceptable for some. Many others are annoyed by them. For example, to use the SDK to talk to the kubernetes API you need to pull down the entire kubernetes codebase (which includes all its dependencies). If you develop on kubernetes you already have it so that's not a problem. If you are just using the Go pkg to talk to the API it's an annoyance. In open source it's rare to find the two in the same repo (and there's no way to escape cloning the whole repo even if you only use a part).
User experience matters. In this case we often call it developer experience. Go is a young language that's gaining market share. We need a delightful user experience that helps developers approaching it be awesome when they use it. Package management is one of the things many people complain about and is often considered a hiccup in an otherwise delightful experience. We have an opportunity to iron that wrinkle out now.

The Go compiler isn't written in a vacuum from other compilers. In fact those behind the Go compiler have experience with compilers and virtual machines for other languages. That experience and knowledge in the space influenced the Go compiler. Package management (which is lifecycle management) is a well developed space that developers are used to. What's developed for Go should take knowledge and experience from that space into account when being developed here.

I'm thankful so many people are passionate about this space. It would be useful to set aside our own emotional attachments so we can craft something useful.

Note: useful = usability + utility

kardianos commented 8 years ago

I'm hearing two design preferences:

Prefer to copy dependencies you don't control into the local repository and write down the revision you copied.
Prefer to not copy into your repository and split what you write down into two files, a design file and a lock file.

We agree that:

encouraging code authors to release their code in some manner is a good thing.
writing down the final revision you use in an executable is important for reproducible builds.

In your examples, you put the version spec in the design file and the revisions in the lock file. If the tools you used copied the version spec and any other needed information into the lock file, then you would only need the one file when comparing versions.

I have considered adding a version field to the vendor-spec, but I need to get experience with versions, in Go, before doing so. As such I was going to implement remote package fetching and then version parsing in govendor to gain such experience in Go.

The vitess sub-package I use in my db interface doesn't require that downstream developers fetch vitess because I've copied the packages I need locally.

You are correct that the Go specification was written with many past examples to draw from. It also drew from examples of what not to do. In consulting, I often see users define use cases that made sense in their previous paradigm (old software or on paper), but isn't fully valid in the new paradigm (new software). But that doesn't mean the use cases were without reason.

I enjoy hearing specifics of what other package managers do and how you have used them. I am confidant that a reader of this issue will fully see the details of two perspectives presented here.

My question is, is there design room to enable both methods in the same file format?

I have observed that while I usually want to copy packages locally, there are cases where I totally understand the need not to. I think Cockroach DB offers a good example of a need to not copy packages locally. It depends on a few large C++ dependencies it statically links in and are largely developed separately. It also depends on a few other large dependencies it needs to track with upstream. I want to ensure that this is possible with any vendor-spec.

I understand your experience with PHP and would like to assist if I can, but I would like you to understand that I see version ranges as a symptom of a much greater problem. When I see version ranges for puppet and ruby and other systems, I gag. I am (unfortunately) familiar with what you are describing and the experience has not left a nice taste. The alternative is to encourage releases and to encourage package stability. When you update to a new revision, you double and triple check everything is still alright. That might sound trite but I am completely serious.

Let me put this another way. I'm a fan of up-to-date static containers. I'm less of a fan of the current state of Linux package managers where everything depends on everything else (Can I build the latest version of X? No, I'm on an stable release and I don't have the right libs for that.). If Linux dependency managers are modern then containers are neo-modern.

...

As a side note, go already has the ability to offer redirects to go packages:

Go offers a level of indirection not with a central clearing house, but decentralized, the neo-modern way.

Crell commented 8 years ago

When you update to a new revision, you double and triple check everything is still alright. That might sound trite but I am completely serious.

I don't think it's trite at all. Regardless of the package manager in use, you have tests, you run tests with a CI system, and you don't deploy unless everything is green. I don't think that's a controversial statement to make. (That's one reason why everyone here seems to be on board with a machine-readable lock file that references commit hashes as a part of the solution.)

I would like you to understand that I see version ranges as a symptom of a much greater problem.

This is the part I don't get. Why is that a problem? I tested my YAML library, version 1.5.2, with the 2.4.3 version of a file system library, and I know that works. When 2.4.4 comes out, with some bug fixes:

Should I be required to test that and release a new version of my library, 1.5.3, whose only change is that it now depends on 2.4.3 of the FS lib? That means every time a package is released there's a huge ripple effect on any other packages that use it, all on down the line.
Should someone using my YAML library test it with the new 2.4.4 release, then hack the code to make it use the 2.4.4 release instead of the 2.4.3 that it said to use initially? That seems very error prone.
Should we assume that if I declare a compatibility with 2.4.3, then the odds of 2.4.4 working as well are really good and if it's not, that's something for the application instance builder to figure out and report? (This is what PHP and, AFAIK, most languages now do, and why semver is a necessity.)

I think (please correct me if I'm wrong here) your ideal world involves no version tags at all, just commit IDs on all the things. While that does offer a level of predictability, it has two I believe terminal problems:

Without version numbers, there's no way to indicate when a change is "safe" for everyone to update to and when it's not, for BC reasons. BC breaks in code will happen, always, guaranteed, and a good package manager needs to make handling that straightforward. Version numbers that communicate the level of safety are the standard mechanism for that, and I've not found a better one.
As @sdboyer notes, diamond dependencies. I tested my YAML library with FSLib 2.4.3, and it works. @sdboyer's Atom parser library was tested with FSLib 2.4.2, and it works. The latest stable of FSLib is 2.4.5. Now when you install both my YAML parser and Sam's Atom parser in your application... what version of FSLib should be used? If both the YAML and Atom parsers are pinned at a specific commit hash (either directly or via a specific tag), you cannot use them at the same time. The package manager will report a conflict. If Sam and I both specify a range of "2.4.x", then it's clear 2.4.5 should be installed and everything should be fine. (And yes you still run tests to make sure that's still the case.)

How do you solve those issues without version ranges?

Re redirects, nifty, I didn't realize that. I'll have to look into that more the next time I'm messing with Go more regularly.

kardianos commented 8 years ago

@crell, I'm not against version numbers, just ranges. Releasing to the default branch is still releasing and doesn't need to be associated with version numbers even. But I do like version numbers. I'm think it is a good idea to include them if available in a vendor-spec.

Diamond dependencies: You example 2.4.{x,y,z}, which version do we use? When you go to update or add a package, right now my answer is to prompt the developer what to do. That may not be the right answer, but I would need to see data in go that that isn't.

All this said, I'm not against helping enable you use version ranges. It could be we have sufficient commonality we can have a net win here.

sdboyer commented 8 years ago

I'm hearing two design preferences:

I'm not quite sure that captures it - if "not to copy into your repository" means not use vendor/, then that's definitely not what I'm pushing for. If it means "not to commit in the vendor dir", then while that's my personal preference, a tool should be agnostic on that front.

I think the use of the word "copy" here, though, points at the core of our disagreement. It presupposes a repository has been fetched and is available locally somewhere (presumably in the GOPATH). If I understand your perspective correctly, you think this is fine because you consider the fetching more or less irrelevant. You don't care where it comes from, so long as you can get the desired revision in the right place.

I disagree with this perspective, because I think retrieval is an integral part of the responsibilities of a package manager (which are basically: specification, retrieval, and on-disk arrangement, managed with respect to changes over time). These responsibilities end up being tied together not because it is impossible to do it a different way, but because joining them together lets the package manager control all the relevant state at every step of the process. Playing with other peoples' state is the source of most complexity fractals I've encountered.

In your examples, you put the version spec in the design file and the revisions in the lock file. If the tools you used copied the version spec and any other needed information into the lock file, then you would only need the one file when comparing versions.

Yes, that's a possible approach. It's also one that I've learned to avoid the hard way, because it creates ambiguities that are hard to think about in the abstract, but quite painful as soon as you try to implement. You end up forcing the user to resolve the issues with arcane command switches, when a better design could have just avoided the problem in the first place. (Again, more specifics in the article I'm working on - an out-of-context example is not very useful)

In consulting, I often see users define use cases that made sense in their previous paradigm (old software or on paper), but isn't fully valid in the new paradigm (new software). But that doesn't mean the use cases were without reason.

Sure, of course, that's always an important perspective to consider. But the reasonable capabilities of tooling based on the lockfile-only spec you're proposing are a strict subset of those that can be achieved by a more robust, project-oriented, versioning-aware package management tool (like glide). For this to hold, you have to be able to cover those use cases at least roughly as easily as a more robust tool would. @mattfarina has mentioned his collected, albeit still probably incomplete set of use cases repeatedly, but I've yet to see you address them. (Unless I've missed it? In which case, sorry)

My question is, is there design room to enable both methods in the same file format?

Well, since yours can be a strict subset of a fuller package manager, I don't really see the point in that? And also, see my earlier comment about playing with other peoples' state, and my earlier earlier comment about how poorly shared formats worked for ruby's environment managers. But most importantly, no, lock files without manifests is harmful, so it's a non-starter.

So, basically...probably not.

When I see version ranges for puppet and ruby and other systems, I gag.

They're not a panacea. And yes, people often use them wrong. But that doesn't make them inappropriate as part of a solution.

Diamond dependencies: You example 2.4.{x,y,z}, which version do we use? When you go to update or add a package, right now my answer is to prompt the developer what to do. That may not be the right answer, but I would need to see data in go that that isn't.

It's not the right answer. The right answer is providing enough flexibility in a specification, via a version range, that the system can try to resolve it for you automatically, tell you that it did, and then you can verify and make sure that it's correct. By giving up and asking the user to resolve it, you're requiring the person likely to have the least relevant knowledge to find the correct, or at least an acceptable, answer. Version ranges are far, far from perfect, but they at least allow the intermediate package authors to encode some of their knowledge about what versions of the conflicting dep are likely to work. The end user still has to verify that the choice made was acceptable and correct, but with version ranges, they can a) get some help and b) send a patch back to the intermediate dependency if the range turns out to be wrong.

This is the approach that most package managers that take a stab at this problem have arrived at. The onus is not on us to provide an example in Go; the onus is on you to demonstrate that Go is substantively different in the ways that make it not apply here.

Crell commented 8 years ago

If the manifest/build file supports ranges, then specific version tags are a trivial degenerate case. (Actually composer supports specific commit hashes, too, but their use is generally discouraged.) So if your application instance wants to require my YAML lib version 1.5.3, specifically, then that's the only thing it will download or be compatible with even if I release a new version. Similarly, if you specify FSLib 2.4.4 specifically in your manifest file then there's now 3 dependencies on FSLib: 2.4.2 or higher, 2.4.3 or higher, and 2.4.4. So the build tool can resolve that to 2.4.4 and every package is happy.

That's actually a not at all unreasonable thing to do for an application instance, as it does give you more precise control. It's libraries (ie, something without a main() function in it) that really benefit from ranges the most, and their downstream users, by minimizing diamond dependency issues. If the build tool asks the user "it looks like these are not compatible, do you want to override me and install it anyway and take your chances?" rather than outright denying it I'm not against that myself. I can't speak for @mattfarina and @technosophos, of course.

If the manifest file supports ranges, which by nature can include a specific tag, then I think that would give everyone what they're looking for. Application instances can have a very tightly controlled set of legal dependency versions, libraries can be more liberal, and those who want more tight control even on libraries (eg, @kardianos) can still do that. If that ends up causing issues for downstream users, well, that's a social pressure "let the market decide" question, not a technical one, so I think is fine to punt on.

mattfarina commented 8 years ago

@kardianos You noted the "two design preferences". I think it's important that the design not be a matter of preference but one that solves the needs captured in writing (commonly called requirements, use cases, etc). Requirements should be able to be solved in more than one way (hopefully). What I have not seen is you talk about the requirements I've collected and how your design can be used to meet those. Or, that a requirement is invalid and a reasoned case why. If I believed the need was clearly defined and met in this design I would not push back and ask do many questions.

In this discussion numerous technical questions have been raised that need to be solved in one form or another but gone unanswered. If you have a way to do it utilizing this spec please share. For example, how do you do vulnerability analysis and reporting (e.g., CVEs) without version numbers?

You noted things that are not valid in a new paradigm when it comes to requirements. I'm quite familiar with the idea of paradigm shifts and helping people overcome them. What new paradigm is here over the previous ways and how does something here overcome an old problem, old solution, or old need and why?

To do this well we need to have a technical design that meets needs. If you're not interested in discussing needs, how to handle situations, and listening to the needs other have how can you craft or advocate for a spec that affects this space?

I'm giving you every opportunity to answer technical concerns, show me a new concept, share how a requirement is no longer or need, or anything else. If you have something new, innovative, or an insight that no one else has found please teach us. If you can't show us how to solve the issues, fill the needs, and handle cases beyond those you may personally encounter than we have a problem. The group who is here listening is open to new ideas, to having their minds changed, and is often on the forefront of those kinds of things. We're listening.