golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

x/vgo: go.mod format should not have a bespoke syntax #23966

Closed robpike closed 6 years ago

robpike commented 6 years ago

It's a mistake to create a private syntax for a configuration file when there are existing, perfectly fine formats available that are well understood and have publicly available parsers.

robpike commented 6 years ago

P.S. This issue is made thornier by the peculiar assertion that the format is fixed before anyone has a chance to comment on it.

ericlagergren commented 6 years ago

My vote is for XML.

(On a more serious note: it'd be nice to have something that's easy for humans to write. JSON is easy to read, annoying to write. YAML is nice both ways.)

jmank88 commented 6 years ago

Perhaps some of this dep thread from last year is relevant: https://github.com/golang/dep/issues/119

as commented 6 years ago

https://github.com/hashicorp/hcl/blob/master/README.md

Mentions YAML being confusing and not well understood. I don't particularly understand it either, considering the standard disallows tabs as separators, which is unusual and awkward for a whitespace-agnostic language like Go.

http://www.yaml.org/faq.html

jimmyfrasche commented 6 years ago

I'm having a very hard time reconciling "yaml" and "perfectly fine format". It's not the description that springs to mind based on my experience.

A benefit of a custom format here is that only what's allowed is legal. Another is that everything can be given a nice expressive syntax. Error messages can be more easily tailored.

The similarity to Go syntax means it shouldn't be hard for anyone to learn it and syntax highlighters and the like should be easy to adapt from their Go counterparts.

The only major downside I see is that, as a new format, its implementation will require a certain amount of fuzzing and additional testing that an existing format's parsers would (hopefully) already have had. (And if the parser is put in the stdlib, no one else will have to worry about that either.)

ericlagergren commented 6 years ago

...why not just have it be written Go?

I mean, we all know how to write it. We have well-tested lexers and parsers. We have syntax highlighting. We have formatters and tools that can vet the code. We even have a framework for parsing and running sets of files in go test.

This go.mod file

// My hello, world.

module "rsc.io/hello"

require (
    "golang.org/x/text" v0.0.0-20180208041248-4e4a3210bb54
    "rsc.io/quote" v1.5.2
)

could become something like

package foobar

import "vgo" // hypothetical package

func ModuleHelloWorld(v *vgo.V) {
    v.Module("rsc.io/hello")
    v.Require("golang.org/x/text", "v0.0.0-20180208041248-4e4a3210bb54")
    v.Require("rsc.io/quote", "v1.5.2")
}

It'd end up being similar to how go test recognizes xxx_test.go files.

vgo could recognize a go.mod, module, xxx_module.go, or whatever file in the root of the project and run the top-level function similar to ModuleXXX kinda like TestXXX.
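
To make the idea concrete, here is a minimal runnable sketch of what such a runner could look like. Everything in it is an assumption for illustration: the V type, its Module and Require methods, and the way the function is invoked are stand-ins defined locally, not an API that vgo provides.

```go
package main

import "fmt"

// V is a hypothetical stand-in for the proposed vgo.V type.
// It simply records what the module function declares.
type V struct {
	module   string
	requires map[string]string
}

// Module records the module path.
func (v *V) Module(path string) { v.module = path }

// Require records a dependency and its version.
func (v *V) Require(path, version string) {
	if v.requires == nil {
		v.requires = make(map[string]string)
	}
	v.requires[path] = version
}

// ModuleHelloWorld is the module description written as ordinary Go.
func ModuleHelloWorld(v *V) {
	v.Module("rsc.io/hello")
	v.Require("golang.org/x/text", "v0.0.0-20180208041248-4e4a3210bb54")
	v.Require("rsc.io/quote", "v1.5.2")
}

func main() {
	// A real tool would discover ModuleXXX functions the way go test
	// discovers TestXXX; here the call is hard-wired.
	var v V
	ModuleHelloWorld(&v)
	fmt.Println("module:", v.module)
	for path, version := range v.requires {
		fmt.Println("require:", path, version)
	}
}
```

The trade-off is visible even in the sketch: the tool has to compile and run code before it knows the dependency list.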

Yeah, it doesn't feel like a set of directives as much as runnable code, but since when has Go done something just so it feels good as opposed to the practical option?

Theoretically, this could also take care of https://github.com/golang/go/issues/23972

andybons commented 6 years ago

Let’s avoid bikeshedding on which existing format is best and wait for a response on why a custom syntax was chosen in the first place. It may have been an arbitrary decision, or it may not have. If it wasn’t, then understanding the decision will help inform future choices.

ghost commented 6 years ago

Most folks have settled on TOML. We don't really need another custom format or a format embedded in JSON or YAML.

davecheney commented 6 years ago

Given that the go.mod file looks very similar to a go source file, why not add module and require as top level declarations and then we can write module syntax inline with our source code?


flibustenet commented 6 years ago

Indeed, while following the tour I immediately caught myself trying mod.go instead of go.mod!

ngrilly commented 6 years ago

@davecheney I guess the go.mod file makes it easy to find the project root. If module and require become top-level declarations in "normal" .go files, then it would be more difficult to find the project root (you would basically have to look for a .go file containing a module declaration, which requires parsing).

rogpeppe commented 6 years ago

The problem with TOML and YAML is that no-one has written code (AFAIK) that can read those formats (including comments) and write them back out again, gofmt style. See https://github.com/BurntSushi/toml/issues/213 for example. Also, YAML is a terrible format. Please no YAML.

I think I quite like the choice of a custom format as long as there is some straightforward way to convert to/from a well known format, because it can be exactly as simple as necessary, and as clean as possible.

robpike commented 6 years ago

Whatever the format is, it must be well defined and well documented, with a canonical formatting and non-internal libraries to read and write it. None of that exists at the moment.

ecowden commented 6 years ago

I'm very reluctant to jump in here. I'm liking what I see from vgo so far, and I don't want to bikeshed on what might feel like a trivial topic.

However, I feel that part of the friction I'm feeling from my initial vgo experiments comes from the rest of the tools that I use to write Go and work with code in general. I think this is an opportunity to make adoption a little easier.

Here’s why I think we should consider adopting an existing common data format:

Motivations

Requirements

I don’t have strong feelings about YAML vs JSON vs whatever else. I’ve used JSON fine with npm and YAML fine with Kubernetes, Helm, and Ansible. They both work, and I’m long past the point in my career where I care about arguments like that. (And for what it’s worth, I’ve never been bugged by the lack of inline comments — READMEs and Issues worked for the rare cases we needed to communicate about dependencies.) From where I’m sitting, the requirements are:

Apologies in advance if I'm off base. I'm fairly new to Go myself, and I confess that I don't yet understand some of the original motivations for a bespoke file format. There may be good reasons to go another direction that I'm overlooking!

huguesb commented 6 years ago

@ecowden

Hierarchical. For instance, .properties files are too restricting to future extension.

@rsc states in his blog post that vgo is meant to be a streamlining/simplification of the general-purpose dep tool. Given that dep made an explicit decision to go with TOML partly because it wasn't hierarchical, it seems unlikely that vgo would reverse that requirement.

From https://github.com/golang/dep/issues/119#issuecomment-287781062

The one thing that does stick out with TOML is, being not tree-structured, it's possible for us to append constraints to the manifest without rewriting it. That may turn out to be a very important factor in applying sane defaults that help guard us (that is, the entire public Go ecosystem) against nasty exponential growth in solver running time.

@ericlagergren While I like the simplicity of reusing go syntax, using the .go extension for the module file makes it likely that some projects will run into a conflict and have to rename some of their files to switch to vgo, which goes against the goal of making the migration as painless as possible.

ericlagergren commented 6 years ago

The file name is not super central to the idea, IMO.

davidpope commented 6 years ago

Other concerns aside, JSON does not allow comments, which is sufficient to disqualify it IMO.

josharian commented 6 years ago

Whatever format is in use, I certainly hope (with Rob) that there are good public manipulation libraries.

Here are some comments from years of working on goimports:

rsc commented 6 years ago

This issue is made thornier by the peculiar assertion that the format is fixed before anyone has a chance to comment on it.

My point, which was arguably phrased too strongly, is that go.mods people write today will be understood by the eventual official tooling. I want to make clear that people will not have to throw them away and start over. Given that vgo already supports reading nine different legacy file formats (GLOCKFILE, Godeps/Godeps.json, Gopkg.lock, dependencies.tsv, glide.lock, vendor.conf, vendor.yml, vendor/manifest, vendor/vendor.json), I am confident it won't be a burden to read this one too, if we move to something new. And the tooling already rewrites go.mod in place when needed, so updating to a new format will be easy if that's what we decide. I was not attempting to lock this in place.

rsc commented 6 years ago

It's a mistake to create a private syntax for a configuration file when there are existing, perfectly fine formats available that are well understood and have publicly available parsers.

I obviously agree with this in principle. In practice I spent a while looking at all the existing formats and found them not "perfectly fine" for this job. In particular, look at how much shorter and clearer a go.mod is compared to the equivalent Gopkg.toml. I'm happy to return to this question once we're happy with all the other higher-level details.

And to answer @josharian's concern, if we keep the custom format then yes there would be public tooling, probably along the lines of x/vgo/vendor/cmd/go/internal/modfile.

andradei commented 6 years ago

I like the suggestions of @ericlagergren and @davecheney. They leverage the entirety of the Go compiler and its guarantees. But since go.mod is good for detecting the package root, I have a couple of suggestions to keep that advantage while moving towards modules in the source code:

Suggestion 1

Have inline module information on main.go for binaries and lib.go for libraries.

Rust uses main.rs vs lib.rs to differentiate binaries from libraries, and has a Cargo.toml at the project root. The difference is that, in this suggestion, the module info would use Go syntax inside a Go source file.

Suggestion 2

Have mod.go for both binaries and libraries, then add vgo.Product = "binary" // or "library" or some sort of const iota instead of strings.

Swift has a Package.swift, which is valid Swift code, at the project root, which specifies whether it is a binary or a library with the Package.products type, which can be .library or .executable

EDIT: Added comparison to suggestions above.

ecowden commented 6 years ago

@huguesb

rsc states in his blog post that vgo is meant to be a streamlining/simplification of the general-purpose dep tool...

I’m curious: why does “hierarchical” imply “complex?”

Stepping back, I probably misphrased that last requirement. I was looking for an intersection of the familiar and the extensible, and doing so with YAML and JSON on my mind. “Hierarchical” isn’t really the goal here, and I’m happy to scratch it off the list.

I’m surprised to see the reaction about it being complex, though, and I'm wondering if I'm missing something. When I look at example go.mod files like this one...

module "rsc.io/hello"

require (
    "golang.org/x/text" v0.0.0-20180208041248-4e4a3210bb54
    "rsc.io/quote" v1.5.2
)

...personally, I see a “hierarchical” data structure. By that, I mean a list of key-value pairs, where values can be primitives, lists, or other lists of key-value pairs. Changing nothing but formatting and punctuation, it becomes:

module: rsc.io/hello
require: 
- golang.org/x/text: v0.0.0-20180208041248-4e4a3210bb54
- rsc.io/quote: v1.5.2

…And it’s even 4 characters shorter! (That’s a joke, if it’s not obvious. :grin:)

When I jumped in here, I was thinking about extending an existing git repo dependency analyzer written in Node.js to recognize vgo modules. (Well, that, and how I missed the pretty colors my editor makes highlighting files...) Then I realized how much I didn’t want to create and maintain a custom parser, and how much easier it would be with a “standard” data format.

By all means, put this question on the back burner. There are waaay more important things to figure out with vgo, and I like what I’m seeing so far! :+1:

nilebox commented 6 years ago

Even if YAML is considered to be too complex or confusing, I would still prefer it (or JSON, or TOML, or whatever other standard declarative format) over a bespoke format, supporting only the subset of it that we are happy with.

In other words, if go.mod is valid YAML/TOML/JSON (not necessarily supporting all features of these formats), it would be immediately familiar to both users and any platform that you want to use for parsing.

@ecowden's example above makes it immediately clear to me which format I would prefer.

Another concern with go.mod is that it doesn't even look declarative or standardized, it looks like imperative code. Is there any reason for that? Do we actually want to make it extensible and support imperative constructions there, e.g. functions?

ngrilly commented 6 years ago

@nilebox go.mod doesn't look more "imperative" or "declarative" than an nginx configuration file, for example.

lunny commented 6 years ago

Maybe put the go.mod contents in a comment in a Go file. For example:

/*
+require "golang.org/x/text" v0.0.0-20180208041248-4e4a3210bb54
+require "rsc.io/quote" v1.5.2
*/
package main // import "rsc.io/hello"

lunny commented 6 years ago

Or

package main // import "rsc.io/hello"

import (
    "golang.org/x/text" // require v0.0.0-20180208041248-4e4a3210bb54
    "rsc.io/quote" // require v1.5.2
)

komuw commented 6 years ago

amiga-sound

Too bad my OS (Ubuntu) thinks the go.mod file is an audio file. This means I can't just double-click and edit the file; I have to go through the hassle of letting my OS know that *.mod files should open in an editor.

cznic commented 6 years ago

You can, the file associations are fully user modifiable. However, using any well-established extension for the vgo module file is a rather unfortunate choice.

rsc commented 6 years ago

I think we should continue to use the very simple go.mod format, after the further simplification of making quotes optional (#24641). Once the dust settles, we should also publish a package like x/vgo/vendor/cmd/go/internal/modfile so that other tools can parse and edit mod files too.


As I wrote originally, I do understand the appeal of a standard file format, but I am still unable to find one that works well for this task. My main concern is ease of editing, for both people and programs.

The files have to be easy for people to edit. For example, the hacked-up blog post system I built stores a JSON blob at the top of each file, above the post text, because it was very easy to implement that. But I am sick of needing to leave out the comma after the last key-value pair, because it makes adding a new key-value mean editing the previous one too. This is exactly why we allow trailing commas in Go literals. Those annoyances add up.
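
A small sketch of the contrast, using only the standard library. The postMeta type and its field names are invented for this illustration; json.Valid is the real encoding/json function.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// postMeta is a made-up metadata record, similar in spirit to the
// JSON blob at the top of a blog post.
type postMeta struct {
	Title string
	Date  string
}

// validJSON reports whether s is syntactically valid JSON.
func validJSON(s string) bool {
	return json.Valid([]byte(s))
}

func main() {
	// In a multi-line Go composite literal every element ends with a
	// comma, so adding a field never touches the line before it.
	m := postMeta{
		Title: "Hello",
		Date:  "2018-02-20",
	}
	fmt.Println(m.Title, m.Date)

	fmt.Println(validJSON(`{"Title": "Hello", "Date": "2018-02-20"}`))  // true
	fmt.Println(validJSON(`{"Title": "Hello", "Date": "2018-02-20",}`)) // false: trailing comma
	fmt.Println(validJSON(`{"Title": "Hello" /* note */}`))             // false: comment
}
```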

The files also have to be easy for programs to edit, without mangling them. Think about all the benefit we’ve gotten from gofmt and tools being able to collaborate with people to work on Go source files. People and programs working together on go.mod will be similarly beneficial. In fact this is a key part of the design. If you read through the Tour of Versioned Go you’ll see repeated alternation between the developer editing go.mod and vgo itself editing go.mod. That has to run very smoothly.
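
To show what "editing without mangling" can mean for a line-oriented format, here is a deliberately simplified sketch. It is not the modfile implementation; bumpRequire is a hypothetical helper, and it assumes module paths never contain "//" and that each requirement sits on its own line.

```go
package main

import (
	"fmt"
	"strings"
)

// bumpRequire sets the required version of path in go.mod-style text.
// Every other line, including comments and indentation, is left
// byte-for-byte intact. It reports whether anything changed.
func bumpRequire(src, path, version string) (string, bool) {
	lines := strings.Split(src, "\n")
	inRequire, changed := false, false
	for i, line := range lines {
		// Split off a trailing // comment so it can be preserved.
		code, comment := line, ""
		if j := strings.Index(line, "//"); j >= 0 {
			code, comment = line[:j], line[j:]
		}
		fields := strings.Fields(code)
		switch {
		case len(fields) == 2 && fields[0] == "require" && fields[1] == "(":
			inRequire = true
			continue
		case len(fields) == 1 && fields[0] == ")":
			inRequire = false
			continue
		case len(fields) == 3 && fields[0] == "require":
			fields = fields[1:] // single-line form: require path version
		case len(fields) == 2 && inRequire:
			// entry inside a require ( ... ) block
		default:
			continue
		}
		if strings.Trim(fields[0], `"`) != path {
			continue
		}
		// Replace only the version token, keeping surrounding whitespace.
		k := strings.LastIndex(code, fields[1])
		lines[i] = code[:k] + version + code[k+len(fields[1]):] + comment
		changed = true
	}
	return strings.Join(lines, "\n"), changed
}

func main() {
	src := "// My hello, world.\nmodule \"rsc.io/hello\"\n\nrequire (\n\t\"rsc.io/quote\" v1.5.2 // pinned\n)\n"
	out, ok := bumpRequire(src, "rsc.io/quote", "v1.5.3")
	fmt.Print(out)
	fmt.Println("changed:", ok)
}
```

The point of the sketch is that comments survive the round trip, which is the property the existing TOML and YAML libraries mentioned in this thread reportedly lack.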

All the “generalized key-value pair” formats become awkward when there’s more than a single key-value pair to express. It’s true that we could use a YAML-like notation:

module: rsc.io/hello
require: 
- golang.org/x/text: v0.0.0-20180208041248-4e4a3210bb54
- rsc.io/quote: v1.5.2

but that nice one-line-at-a-time structure breaks when we get to replace "rsc.io/quote" v1.5.2 => "../quote". Perhaps the best encoding would be:

replace:
- rsc.io/quote: v1.5.2
  with: ../quote

But then what does replace "rsc.io/quote" v1.5.2 => "github.com/you/quote" v0.0.0-myfork encode as? Maybe this?

replace:
- rsc.io/quote: v1.5.2
  with: github.com/you/quote
  at: v0.0.0-myfork

The awkwardness here is not much, but it’s still quite annoying: three lines instead of one, with corresponding reduced readability and ability to use line-based tools like grep, sort, diff.
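
For comparison, the one-line form can be split with nothing more than whitespace tokenizing. The sketch below is an illustration under that assumption; the Replace type and parseReplace function are hypothetical names, not part of vgo.

```go
package main

import (
	"fmt"
	"strings"
)

// Replace holds one replace directive. NewVersion is empty when the
// replacement is a local directory such as "../quote".
type Replace struct {
	OldPath, OldVersion string
	NewPath, NewVersion string
}

// unquote strips optional surrounding double quotes.
func unquote(s string) string { return strings.Trim(s, `"`) }

// parseReplace parses a line of the form
//
//	replace OLDPATH OLDVERSION => NEWPATH [NEWVERSION]
func parseReplace(line string) (Replace, error) {
	f := strings.Fields(line)
	if len(f) < 5 || len(f) > 6 || f[0] != "replace" || f[3] != "=>" {
		return Replace{}, fmt.Errorf("malformed replace directive: %q", line)
	}
	r := Replace{OldPath: unquote(f[1]), OldVersion: f[2], NewPath: unquote(f[4])}
	if len(f) == 6 {
		r.NewVersion = f[5]
	}
	return r, nil
}

func main() {
	for _, line := range []string{
		`replace "rsc.io/quote" v1.5.2 => "../quote"`,
		`replace "rsc.io/quote" v1.5.2 => "github.com/you/quote" v0.0.0-myfork`,
	} {
		r, err := parseReplace(line)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%+v\n", r)
	}
}
```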

The fundamental problem is that not everything a developer needs to say is best expressed as key-value pairs. We don’t use shells that require us to write:

cmd:
- prog: echo
- arg1: hello
- arg2: world

Yet somehow many developers accept this in config files. Why? Because, as Rob said, existing formats “are well understood and have publicly available parsers.” At least, we think that’s true. The more I look at these formats the less convinced I become. And even assuming it's true, that benefit has to outweigh the disadvantages imposed by the format itself.

JSON is too picky (for example, about commas) and has no support for comments. It’s out.

XML is equally picky about closing tags and is too noisy in general. It’s out.

TOML and YAML are at least easier for people to edit, but they both have the general key-value problem.

Additionally, TOML requires quotes around both module paths as keys (because they have slashes) and all values ("rsc.io/quote" = "v1.5.2"). Experience with go.mod suggests we want to move in the opposite direction, toward no quotes. (See #24641.)

Both TOML and YAML also turn out to be more complex than they first appear, a detail that’s very important if you need not just a parser but a mechanical editor that can parse, edit, and reprint the file. TOML’s complexity starts to show once you move away from key-value pairs: you have to learn the distinction between [x] and [[x]] and then start thinking about regular key-value pair lines versus inline tables. Of course, that’s nothing compared to YAML. Here’s an illuminating exercise: flip through http://yaml.org/spec/1.2/spec.pdf and try to find out what syntactic restrictions are placed on unquoted keys and values in key-value pairs. I’m still not completely sure. YAML embeds JSON as a subset but they didn’t stop there. As far as I can tell from the document, instead of writing:

module: rsc.io/hello
require: 
- golang.org/x/text: v0.0.0-20180208041248-4e4a3210bb54
- rsc.io/quote: v1.5.2

it appears to be equally valid to write:

%YAML 1.2
---
!!map {
 ? !!str "module" : !!str "rsc.io/hello"
 ? !!str "require" 
 : !!seq [
   !!map { ? !!str "golang.org/x/text" : !!str "v0.0.0-20180208041248-4e4a3210bb54" },
   !!map { ? !!str "rsc.io/quote" : !!str "v1.5.2" },
 ],
}

and it also appears the two forms can be blended arbitrarily. Something as simple as

module: !!str rsc.io/hello

appears to be valid YAML yet mean something different from what our “subset” parser would understand. There would be constant pressure to give up the insistence on using a subset of YAML, and yet it becomes more difficult to write a good mechanical editor (parse+edit+reprint) the more complexity is introduced.

If we had to pick some existing format, I’d pick TOML, but even that seems wrong:

module = "rsc.io/hello"

[require]
"golang.org/x/text" = "v0.0.0-20180208041248-4e4a3210bb54"
"rsc.io/quote" = "v1.5.2"

[[replace]]
"rsc.io/quote" = "v1.5.2"
with = "github.com/you/quote"
at = "v0.0.0-myfork"

The [[ ]] are necessary here because [require] is a single table (of key-value pairs each of which stands alone) while [[replace]] is an array of tables, in which each table is one replacement, with three keys: the path being replaced and the special keys “with” and “at”. If you wanted to reserve any possible future expansion you’d have to use [[require]] too, making it:

[[require]]
"golang.org/x/text" = "v0.0.0-20180208041248-4e4a3210bb54"

[[require]]
"rsc.io/quote" = "v1.5.2"
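
One way to picture the [x] versus [[x]] distinction is through the Go values a decoder would have to produce. The sketch below builds those values by hand rather than calling any TOML library, and the manifest type is invented for this illustration.

```go
package main

import "fmt"

// manifest mirrors the shape of the TOML example above.
type manifest struct {
	Module  string
	Require map[string]string   // [require]: a single table
	Replace []map[string]string // [[replace]]: an array of tables
}

// example returns the data the TOML above would describe.
func example() manifest {
	return manifest{
		Module: "rsc.io/hello",
		Require: map[string]string{
			"golang.org/x/text": "v0.0.0-20180208041248-4e4a3210bb54",
			"rsc.io/quote":      "v1.5.2",
		},
		Replace: []map[string]string{
			{
				"rsc.io/quote": "v1.5.2",
				"with":         "github.com/you/quote",
				"at":           "v0.0.0-myfork",
			},
		},
	}
}

func main() {
	m := example()
	fmt.Println("requirements:", len(m.Require))
	fmt.Println("replacements:", len(m.Replace))
}
```

Switching to [[require]] would turn Require into a slice of one-entry tables as well, which is the extra structure the comment objects to.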

All in all, it doesn’t seem like these file formats are actually helping advance our goal of making the file easy for people and programs to edit. We’d probably have to write a custom parser+reprinter anyway, so the only real benefit would be syntax highlighting in editors. I think that benefit is easily outweighed by the awkwardness of shoehorning our semantics into these files in the first place. If your configuration is a few basic key-value pairs, they make a lot of sense. Ours is not just key-value pairs, so those files don’t make sense.

P.S. I wondered for a long time why it was that “dep ensure -add” did not modify existing constraints in Gopkg.toml. The answer is that Dep can’t reliably modify hand-written TOML, preserving comments and the like. Dep sometimes appends to Gopkg.toml but otherwise imposes the rule that Gopkg.toml is owned by people and Gopkg.lock is owned by programs. This seems to be an artifact of the available libraries as much as it is a design choice.

rsc commented 6 years ago

Based on (1) discussion with Rob, (2) no one replying to my last comment, and (3) the emoji counters on that comment, I'm going to close this issue and keep the bespoke syntax in go.mod (subject to further refinement like dropping quotes).