Move to TOML for manifest and lock

sdboyer commented 7 years ago

DECISION IS MADE - we'll be using TOML.

At the time dep went public, the committee was generally in agreement that we would prefer to use TOML or YAML over JSON for the manifest and lock files. There are a few reasons for this, but the primary one is that those formats, unlike JSON, allow users to include arbitrary comments. Because the manifest is the locus of dependency decisionmaking for teams, such comments can add a lot of value:

# we pinned this to a rev because of <link to bug>. remember to unpin when that's fixed!

However, in looking through the existing TOML and YAML implementations in Go, we found that none of them actually support preserving arbitrary comments when regenerating the files. dep is constantly rewriting the manifest and lock files; without support for comment preservation, each rewrite would blow away any user comments. So, without this feature, TOML and YAML lose much of their advantage over JSON.

There are other arguments for using TOML or YAML, and it is possible that the we may still elect to use them even without support for comment preservation. However, we are putting a call out - if anyone feels particularly itchy at the thought of using JSON for these files, then please, contribute comment preservation support to an existing implementation, and we'll make the switch! The deadline for this change is the relevant milestone, after which the format and structure of the manifest and lock must be stable.

Alternatively, if we've missed an implementation of TOML or YAML that does support comment preservation, or there's some other crafty way around this problem, please tell us here!

UPDATE: we've decided to generally not have the tool update the manifest, which relieves us of the need for a comment-preserving implementation.

kardianos commented 7 years ago

In what instances would users be expected to manually modify any of the files, manifest or lock?

freeformz commented 7 years ago

FWIW: I don't think it should generally be necessary with the exception of adding comments or the possibility that there are some seldom used features that are (at least to start) implemented through directly modifying the manfiest file. That's my 2c anyway.

jessfraz commented 7 years ago

agreed, don't think there should be much reason other than comments to directly edit the manifest, but if there is we should think about if that is the user experience we desire

adg commented 7 years ago

I vote for sticking with JSON.

sdboyer commented 7 years ago

In what instances would users be expected to manually modify any of the files, manifest or lock?

echo what @freeformz and @jessfraz said. it's also possible that, when we start looking at direct integration into the go toolchain, we have to trim the command set down.

kardianos commented 7 years ago

when we start looking at direct integration into the go toolchain, we have to trim the command set down.

@sdboyer Is there a design document for a go tool integration? Are there docs from internal discussions? Are you just hypothesizing?

jessfraz commented 7 years ago

somewhere around here: https://github.com/golang/dep/issues/25#issuecomment-270460908

sdboyer commented 7 years ago

Is there a design document for a go tool integration?

Nope, there are no more design docs hiding anywhere.

Are there docs from internal discussions? Are you just hypothesizing?

I'm basing this on some email chains with @rsc. He set out some desired constraints for how he wants the overall go toolchain to work, and the first thought I had was, "it would be easier to meet these if we weren't trying to provide a command for ever possible manifest change." I shared that thought in an email about two weeks ago, and there's been no further discussion since. That's where we're at.

kevinburke commented 7 years ago

Funny, I wrote about this recently - this is out of left field, but the only Go parsers I found that preserved comments were for XML, HCL and Go source files themselves, and they are about equally difficult to work with. https://kev.inburke.com/kevin/more-comment-preserving-configuration-parsers/

Here's a parser that finds a version number in a Go source file and bumps it: https://github.com/shyp/bump_version. I've also written code to manipulate HCL files and it's a little finicky but the resulting file is recognizable. https://github.com/hashicorp/hcl

nathany commented 7 years ago

There is also JSON5, which does allow comments, though I don't think it's anywhere near a standard.

Though I prefer the readability of TOML > YAML > JSON, another consideration is the licensing and CLA if dep is later incorporated into the go toolchain. As far as I understand, the TOML or YAML libraries would need to be contributed under the Google CLA or rewritten (with comment preservation as well). /cc @willnorris

kevinburke commented 7 years ago

I'll just put this out there: I'm available for hire as a contractor, and would love to work on a CLA-compliant, Go TOML parser that can preserve comments and allow AST manipulation, if anyone's interested. https://burke.services

andrew commented 7 years ago

+1 for a non-executable format for manifest+lockfile, this will allow third party services (like https://libraries.io) to analyse the dependencies without needing to execute arbitrary code download from the internet.

Also YAML is basically a superset of JSON, so you could mix and match, but just because you can doesn't mean you should,

Aside from that, allowing comments to explain why you switched to a particular fork or pinned to a weird tag would be ✨

kr commented 7 years ago

There is a format that supports arbitrary comments, with regeneration preserving the comment positions, is supported by the Go standard library, and has syntax familiar to all Go programmers: Go source code.

Imagine a manifest file (with no .go filename extension) in a subset of Go syntax:

dependencies = {
        "github.com/Masterminds/semver": {
            "branch": "2.x",
        },
        "github.com/Masterminds/vcs": {
            "version": "^1.8.0",
        },
        "github.com/pkg/errors": {
            "version": ">=0.8.0, <1.0.0",
        },
        "github.com/sdboyer/gps": {
            "branch": "master",
        },
}

Something along these lines wouldn't be too hard to put through ParseExpr and get an AST out, which can then be modified and pretty-printed.

Is that too bizarre?

jessfraz commented 7 years ago

Your level of bizarre reminds me of myself :)

On Tue, Jan 24, 2017, 11:33 Keith Rarick notifications@github.com wrote:

There is a format that supports arbitrary comments, with regeneration preserving the comment positions, is supported by the Go standard library, and has syntax familiar to all Go programmers: Go source code.

Imagine a manifest file (with no .go filename extension) in a subset of Go syntax:

dependencies = { "github.com/Masterminds/semver": { "branch": "2.x", }, "github.com/Masterminds/vcs": { "version": "^1.8.0", }, "github.com/pkg/errors": { "version": ">=0.8.0, <1.0.0", }, "github.com/sdboyer/gps": { "branch": "master", }, }

Something along these lines wouldn't be too hard to put through ParseExpr and get an AST out, which can then be modified and pretty-printed.

Is that too bizarre?

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/golang/dep/issues/119#issuecomment-274912263, or mute the thread https://github.com/notifications/unsubscribe-auth/ABYNbITdrboTAIPBES5QJg7-hwW-jZikks5rVlINgaJpZM4LrVlt .

pquerna commented 7 years ago

FWIW, I had the same issues with preserving comments in config files for a project, and ended up using HCL with some wrapper code for my use case.

freeformz commented 7 years ago

@kr Something similar was discussed early on, here was my attempt at the time (which was a little more aggressive than yours)....

package main

// +build deps

import (
    "deps"
)

var (
    // RootPackagName of the Project
    RootPackageName = "github.com/freeformz/tdeps"

    // IgnoreTags that should be ignored when analyzing deps
    IgnoreTags = []string{"appengine", "test"}

    // Dependencies stated for the package tree starting at RootPackage
    Dependencies = []deps.Dependency{
        {
            Name:    "github.com/Sirupsen/logrus",
            Version: ">= v0.9.0",
        },
        {
            Name:     "github.com/heroku/private",
            Version:  "1.0.0",
            Location: "git-hub-enterprise.herokai.com/privatething",
        },
    }

    // Lock of resolved dependencies which may be in vendor/
    Lock = []deps.Dependency{
        {
            Name:    "github.com/Sirupsen/logrus",
            Version: "v0.9.0",
        },
        {
            Name:    "golang.org/x/net/context",
            Version: "someVer",
        },
        {
            Name:    "golang.org/x/oauth2",
            Version: "someVer",
        },
        {
            Name:    "golang.org/x/oauth2/internal",
            Version: "someVer",
        },
    }
)

I don't remember why we didn't go down this route, possibly because my example wasn't a limited subset, but trying to do way more.

I do think @kr suggestion would be interesting to explore.

kardianos commented 7 years ago

@kr @freeformz I'm skeptical of that route, but if it were to be pursued, make those top level vars part of a type so the fields can be looked up / autocomplete type things can assist.

mitsuhiko commented 7 years ago

I hate toml but I must admit that the choice of Toml was good in Rust land. It diffs very well which makes resolving conflicts much nicer than a json file would allow.

typeless commented 7 years ago

The composability with other existing tools does make JSON desirable, which also makes it easier to be 'scriptable' (with the help of jq for example).

Not sure if that really matters in the end though.

vasi commented 7 years ago

An awful but effective workaround I've used in the past: Just have a key that is specced to be used for comments, and nothing else. Eg:

dependencies:
  github.com/Masterminds/semver:
    comment: "2.x is incompatible for reasons XYZ"
    branch: 1.x

This comment will clearly be preserved even if the structure is programmatically modified.

Caveats:

Can get awkward if keys are reordered. But usually it's easier to preserve key order than to preserve comments.
Only one comment is possible per dictionary. Extensions (comment2, commentXYZ…) can be allowed, but that's ugly.
Comments can't be added to arrays, only dictionaries. Not a big problem in practice. Arrays are typically small, or contain dictionaries—either of which allows comments to be close to where they should be.

4ad commented 7 years ago

@kr I like your suggestion (although I have no interest in Go dependency management) because it would encourage and make it easy for other projects to use the same configuration language.

kegsay commented 7 years ago

I would love to see TOML used instead of JSON or YAML.

Hand-rolling JSON files is incredibly fiddly: I find I always miss a , or a :. The lack of being able to justify your versions (as mentioned by the OP) are infuriating when working with package managers like NPM which use package.json. It does parse easily though.

YAML is a lovely format to write in, but extremely complicated to parse. I think devising a completely compliant YAML implementation is very tricky, and even if dep manages to use one, what about 3rd parties who want to parse these files?

TOML gets the balance right IMHO: easy to parse, easy to write and easy to read. The most popular TOML parser for Golang AFAIK is https://github.com/BurntSushi/toml - so perhaps @BurntSushi or @cespare may have some thoughts on adding comment preservation support? I'd be interested in helping out.

BurntSushi commented 7 years ago

@Kegsay I'd expect comment preserving transformations to be a pretty significant change. I haven't done well with keeping the TOML implementation up to date with the latest spec either. I would personally support a replacement fork (i.e., we retire or move BurntSushi/toml) if you found it useful to start with the existing code.

jrick commented 7 years ago

It may be too late in the design phase to change something like this, but I am completely ok with dropping the requirement of being able to modify the manifest through the tool instead of just hand editing it. Especially if the manifest is in an easily readable and writable format such as TOML. If the manifest is never rewritten by the tool, then there's no need to make it preserve comments.

kegsay commented 7 years ago

@BurntSushi thanks for the quick reply! I feel that your TOML parser is the most "battle-worn" out of the available parsers in that it superficially seems to have the most usage, and therefore has been tested in a large amount of real-world scenarios. For that reason, I'd be reluctant to do a complete rewrite unless the change was sufficiently large that you'd end up doing a rewrite in order to implement comment preservation. Given you wrote the parser, do you believe that we'd need to start from scratch?

BurntSushi commented 7 years ago

Given you wrote the parser, do you believe that we'd need to start from scratch?

It's not just the parser that needs attention. It's also the printer too. You need some way to carry comments through both parsing and commenting. There may be other issues, like preserving the order of keys, but I can't quite remember. I can't really be any more specific than "significant change" as to what's required. Comments were given precisely zero thought other than dropping them at lexing time.

groob commented 7 years ago

Has anyone considered HCL yet? https://github.com/hashicorp/hcl It's a bit like JSON but supports human readable comments.

jbrodriguez commented 7 years ago

dep is awesome !

FWIW, I think the current manifest syntax is too verbose.

if the comments requirement is dropped, I'd prefer something more minimalistic

Node (package.json)

  "dependencies": {
    "classnames": "2.2.5",
    "font-awesome": "4.6.3",
    "history": "3.0.0",
    "isomorphic-fetch": "2.2.1",
    "react": "15.2.1",
    "react-dom": "15.2.1",
    "react-router": "2.6.0",
    "react-tree-menu": "jbrodriguez/react-tree-menu#683858e",
    "reactorx": "0.6.5"
  }

cespare commented 7 years ago

Hi! I help maintain github.com/BurntSushi/toml and I use TOML in a fair number of projects. I also have a strong interest in Go dependency management (I even wrote a few tools in the past, though these days I just use govendor + the vendor directory).

I support using TOML as the configuration language for this tool, but if dep needs to be able to rewrite the config file while supporting comments it will make things hard.

I have experience writing a parser+formatter (in Go, for another language) that supports comments. Handling them requires coming up with a bunch of heuristics about how to group comments and attach them to AST nodes so that they will be preserved in sensible ways in the output.

You can also see all the logic in the go/* packages (everything gofmt uses) to implement similar heuristics.

The main issue here is that most languages (including TOML and, I think, YAML) don't define how comments fit into the AST structure. (Total off-topic aside: nim is the only language I'm aware of that does.) So you're left with writing a bunch of semi-arbitrary rules and you keep coming across corner cases that don't feel right to the end user (they still crop up in gofmt from time to time).

All that said, I'd like to push back a bit on the premise:

dep is constantly rewriting the manifest and lock files

In other two-file dependency management systems I've used (mainly Ruby's bundler, but I think cargo and others work this way as well), the manifest file is only touched by humans and the lock file is only generated by machine. These properties make this problem go away: the manifest is only ever parsed, not generated, so comments are left untouched, and the lock file doesn't contain comments and isn't altered by hand.

Is it really impossible to restrict this manifest rewriting in dep? Here are two ways to avoid regenerating the manifest:

Only ever generate a manifest initially. Future dependency changes must be done by hand.

Only ever append new dependencies to the manifest. For example, if the format looks like this:

someglobalconfig = "blah"

[[dep]]
path = "github.com/fsnotify/fsnotify"
branch = "master"

[[dep]]
# This is a comment.
path = "golang.org/x/sys/unix"
branch = "master" # some other comment

Then it is easy to append a new dependency programmatically by adding a few lines while leaving the rest of the file untouched:

[[dep]]
path = "github.com/kr/pty"
branch = "master"

I've read the dep help text and played around with it briefly; I apologize if I missed some fundamental design reason why it must be able to fully rewrite the manifest.

Finally, as much as I like TOML, I think that avoiding the need to programmatically rewrite hand-written TOML is important enough that JSON may still be the better choice if manifest-rewriting is a hard requirement. Even just allowing for a single comment string attached to each dependency gets a lot of the value here:

{
    "dependencies": {
        "github.com/fsnotify/fsnotify": {
            "comment": "we pinned this to a rev because of <link to bug>. remember to unpin when that's fixed!",
            "commit": "abc123"
        }
    }
}

Sorry for the wall of text. I will gladly pitch in to help out if TOML changes (or even a new TOML library) would be useful.

seh commented 7 years ago

I agree with @cespare here: the manifest is my artifact; the lock file is derived from it. No tool should touch my manifest without me explicitly requesting that it do so as a convenience.

For most operations Glide works this way. When I decide to invoke an operation that might have it change my manifest, I have to manually restore the comments that it drops (not to mention reordering entries to match my subjective preference of what belongs where). Out of such suffering comes avoidance of those mutative operations in favor of manual editing.

freeformz commented 7 years ago

FWIW: dep only alters the manifest based on commands the user issues as a convenience, aside from dep init anyway which does some discovery of existing deps and writes out the initial manifest file. You could alter it manually and run dep ensure.

My personal opinion is that I'd much rather have the tool do the alteration when I run a command and the work to enact the alteration instead of having to edit a file and then run a command to enact the alteration.

BurntSushi commented 7 years ago

In case it's not clear: I completely defer to @cespare with respect to the fate of BurntSushi/toml.

kegsay commented 7 years ago

The problem with the JSON-style comments of the form:

{
    "dependencies": {
        "github.com/fsnotify/fsnotify": {
            "comment": "we pinned this to a rev because of <link to bug>. remember to unpin when that's fixed!",
            "commit": "abc123"
        }
    }
}

is that it breaks down a lot the moment you want to have any kind of extended prose (you can't have literal new lines), and you need to escape characters (e.g "), making it rather frustrating to put anything more than a quick one-liner. One can argue that is a good thing since it encourages people to only put comments if they really have to, but discouraging documentation isn't a good thing IMO.

punya commented 7 years ago

If we used YAML but with a comment field as opposed to an actual comment, we'd get preservation for free as well as support for multi-line prose

dependencies:
  "github.com/fsnotify/fsnotify":
    comment: >
      we pinned this to a rev because of <link to bug>.
      remember to unpin when that's fixed!
    commit: abc123

The downside is that this will confuse people who are used to the native comment syntax.

sdboyer commented 7 years ago

catching up...

A new/custom format

I think it would be unwise to invent any new format for this purpose, a la @kr's proposal. Comment preservation is nice, but it's not required, and IMO is less important than any one of the following:

This tool already involves creating a lot of new things that are essential to the problem. New things are risky. We should not make new things for non-essential problems.
It's hard to predict who else might want to read this data for what purpose, but @andrew already showed up with a use case. Creating a custom format would make that harder to do.
The information we need to keep in a manifest is easily expressed declaratively. Comment preservation isn't worth sacrificing that, or even the appearance of it, with a [pseudo]-imperative format.

Manifest r/w

Brought up by a few people, but quoting from @cespare:

In other two-file dependency management systems I've used (mainly Ruby's bundler, but I think cargo and others work this way as well), the manifest file is only touched by humans and the lock file is only generated by machine. These properties make this problem go away: the manifest is only ever parsed, not generated, so comments are left untouched, and the lock file doesn't contain comments and isn't altered by hand.

This is also my preferred approach. I even have a cute graphic more or less to this effect: hot-mess (humans to the left, machines to the right)

We opted for this approach, though, because the dominant opinion amongst those who showed up was to have the tool handle updating the files. And I can't really argue with @freeformz's point that it's clearly more of a pain to edit a file THEN run a command, rather than simply run a command. If we were to go back to hand-editing the file, though, then this would be mostly solved - init can generate the manifest to start with, and from there on, it's hand-modified.

...with one exception: we also want to nudge users towards semver, and specifically carat range constraints, as a sane default. Without that sane default, users who don't care will have to provide their own default, which I worry could lead us to a project ecosystem with lots of unnecessary version constraint conflicts. So, being able to write new deps to the manifest is pretty key...but as @cespare noted, is probably workable just by appending the information.

So, if we were to drop commands that basically mutate the manifest for you, we might be able to get away without comment preservation.

Comment preservation

Again, @cespare:

The main issue here is that most languages (including TOML and, I think, YAML) don't define how comments fit into the AST structure. [..] So you're left with writing a bunch of semi-arbitrary rules and you keep coming across corner cases that don't feel right to the end user (they still crop up in gofmt from time to time).

On the one hand, I'm loathe to "meh" at problems like this, and I don't have any sense of what the corner cases look like, or how absurd the unexpected outcomes can be. For our needs here, though, perhaps we'd be able to tolerate it? My thinking on that:

We won't be encoding any real semantics within comments, so there's no functional need for precision.
The only truly intended audience is other contributors who are looking at the file. There isn't even something like godoc; even if we do end up creating a registry, we can just show the manifest file in its entirety. So, if a comment gets misgrouped and moved around, then it's still fine as long as a human is able to sorta figure out what happened.
Manifest changes generally tend to be small and come a few at a time. If the machine did something weird, the user (or reviewer) is more likely to notice it and take corrective action than with an oddity buried in some chunky diff of actual code.

kr commented 7 years ago

I agree with @sdboyer's response. (But note that I was not proposing an imperative format.)

nathany commented 7 years ago

@kr If I saw that the manifest looked like Go code, I would expect to be able to use imperative code as I could with Ruby Bundler or Elixir Mix. Then be disappointed when it didn't work, or worse, confuse what the dep tool doesn't allow in the manifest with what Go allows. The familiarity is nice, but probably best to avoid (IMO).

sdboyer commented 7 years ago

@kr glad you agree :) and yeah, sorry, I know you weren't proposing imperative - I was lumping things together for brevity. As @nathany points out, it's much more about the expectation. That the conversation after your comment slid so quickly towards real .go files/syntax is evidence of how strong that expectation would be.

nathany commented 7 years ago

If the manifest is intended to be human-editable or not, the location (#207) and name (#168) could help suggest which is the case, even without reading the documentation.

nathany commented 7 years ago

we also want to nudge users towards semver, and specifically carat range constraints, as a sane default

With a human-editable format, I'd most likely specify constraints similar to the existing ones, either provided by dep init or others working on the project.

If the commands are simple, the CLI wins, but editing a file has the advantage of being self-documenting. I can see the existing constraints and copy-pasta to get things working, rather than memorizing the UI to the CLI. That, and the ability to add/see comments such as "keep at version x.y due to incompatibility with z".

I admit I also like organizing the deps into logical groups with short descriptions of what they do. Of course dep status or similar could probably do that for me.

nathany commented 7 years ago

After more discussion and consideration, my opinion is that the "manifest" (#168) should be a human-editable format (preferably not JSON) that can be generated on init but shouldn't be modified or mucked with by the tool.

That means comments and order is preserved because the file is only ever parsed.

nathany commented 7 years ago

This issue is a big deal for how dep operates in practice.

Is the dep UI primarily a CLI that maintains a machine-readable manifest for you?

Or is the primary UI a human-editable manifest file with a few simple CLI commands that enforce it?

In both cases, the third and currently most important input into dep is the import graph derived from the code it is operating on. I ponder this more in #213.

peterbourgon commented 7 years ago

Great, foundational questions! They deserve answers:

Is the dep UI primarily a CLI that maintains a machine-readable manifest for you?

Yes.

Or is the primary UI a human-editable manifest file with a few simple CLI commands that enforce it?

No.

nathany commented 7 years ago

@peterbourgon Well, that's almost what it is now. But is that what it should be?

Why have two separate .json files not intended for the user to touch?

Using a CLI requires memorizing syntax and consulting help, whereas a file-based UI provides example usage just by having constraints already specified, not to mention comments for documentation right there.

Personally, I think a simpler CLI has advantages. Fewer commands could allow easier integration into the Go toolchain.

If it's a CLI-only UI, then fine, but push further that way and remove the whole notion of a separate manifest that someone might expect to edit. Then a single dep.json should be enough.

peterbourgon commented 7 years ago

There are reasonable technical arguments for every position you've enumerated. We've gone through a lot of deliberation internally, and settled on separate manifest and lock files, and the tool as the primary mechanism of interaction.

For the record, the most important justification for separate files, as I recall, is to have a very clear separation between user declaration of intent (manifest) and computed declaration of reality (lock), especially for git diffs.

It's on me for not documenting that reasoning process sufficiently, and I'll work on that in the coming weeks in a Wiki.

nathany commented 7 years ago

@peterbourgon Having a separate file for the user to express their intent is great.

What I'm really not so sure about is having that be a machine-modified manifest file modified exclusively by CLI commands.

Please see #213 for some example workflows that utilize a human-editable file.

ukiahsmith commented 7 years ago

We can not be sure, with 100% certainty, that a a manifest file, the goal of which is user intent, actually expresses user intent when that file is updated by CLI commands.

It is far easier to accidentally run an unwanted CLI command, or to run a command twice, than it is to manually edit the manifest file.

Having a manifest file untouched by automation means that there is a very clear separation between user declaration of intent and computed declaration of reality.

peterbourgon commented 7 years ago

Your point is taken, but putting manifest files in unexpected or undesired states is just as easy with bad edits as it is with bad commands, especially given the state of config file formats.

sdboyer commented 7 years ago

Big update here: as described in https://github.com/golang/dep/issues/213#issuecomment-278831924, we're going to start moving away from having the CLI UI be the main manipulator of the manifest, and instead move the ongoing experiment that is dep towards hand-editing of the manifest (outside of dep init) and a terser command set.

In other words, the new direction mostly obviates the need for comment preservation :)

nathany commented 7 years ago

Hand-editing the manifest gets around the need for comment preservation while allowing comments, which is great.

Comments are handy for several reasons. One is to explain why a particular user intent is declared. For example, locking "foo/bar" at an older version due to an incompatibility, LICENSE change, etc.

Comments also could be used by dep init to provide some basic documentation or URLs to further information for the manifest format.

While the lock file is perfectly fine as JSON without any comments, I personally think it would be worth exploring alternatives for the manifest file.

golang / dep