cue-lang / cue

The home of the CUE language! Validate and define text-based and dynamic configuration
https://cuelang.org
Apache License 2.0

Proposal: package management #851

Closed cueckoo closed 1 year ago

cueckoo commented 3 years ago

Originally opened by @myitcv in https://github.com/cuelang/cue/issues/851

With extensive inputs from @mpvl.

Proposal summary

We propose adding package management to CUE, using an approach analogous to Go using Minimum Version Selection and semantic versioning. The changes are limited to cmd/cue and the cue/load package, and hence have no bearing on CUE the language.

As an interim measure, we propose using proxy.golang.org as a module mirror, and the checksum database sum.golang.org for authentication, until such time as the CUE project can host such services itself. Use of these services will be enabled by default for cmd/cue and cue/load, but entirely configurable, just like cmd/go.

The proposed approach is broadly identical to the approach followed by Go modules. Indeed this proposal borrows heavily from Russ Cox's blog posts introducing vgo, the core Go modules documentation, and interactions with Bryan Mills and Jay Conrod from the Go team.

For those less familiar with Go, and with Go modules in particular, the relevant parts of the Go module reference have been copied and adapted to the CUE context for the more significant parts of this proposal, to save jumping between multiple documents. For less significant parts of the proposal, or those where the concept is truly identical in both approaches, we generally link to the relevant Go documentation.

I understand Go modules; what's the TL;DR?

Here is an abridged version for those who have a working knowledge of Go modules, presented from the user perspective with various implementation considerations thrown in where relevant.

What will be different?

What is the same?

Background

The module and package concepts of CUE are directly inspired by their equivalent in Go. It is natural therefore to consider package versioning following the same model as Go.

One important and pleasant fact to note is that because CUE has already established the concept of a module, it does not have to manage a large legacy pre-modules world, unlike GOPATH in the Go project. This is reflected in some of the decisions in this proposal. As a result, we do not need to distinguish between module-aware mode and non-module-aware mode: every cmd/cue command is, and will remain, module-aware by definition, and the same is true for cue/load.

The issues

As noted above, CUE has already established the concept of a module. The root of a module is denoted by a directory that itself contains a cue.mod directory. The contents of this directory are mostly managed by the cue tool. In that sense, cue.mod is analogous to the .git directory that marks the root directory of a repository, whose contents are mostly managed by the git tool.

Here is a minimal example that declares a CUE module example.com/blah, a package blah within the root of that module, such that blah imports a third-party CUE package acme.com/quote that is vendored within cue.mod/pkg (written in the txtar format):

-- cue.mod/module.cue --
module: "example.com/blah"

-- cue.mod/pkg/acme.com/quote/quote.cue --
package quote

Hello: "hello"

-- blah.cue --
package blah

import "acme.com/quote"

x: quote.Hello

From the root of the module we can cue eval:

$ cue eval
x: "hello"

Because no arguments are passed to cue eval, the implied argument is ., the package in the current directory. cmd/cue then hands off to cue/load to resolve and load that package.

In this example, example.com/blah is referred to as the main module. As we can see from the package example.com/blah, it has a dependency on acme.com/quote, a package that is not part of the main module. cue/load currently uses a simple rule to find such dependencies: it searches for every package outside the main module within the cue.mod/{pkg,gen,usr} directories, unifying the results.

In this instance, the acme.com/quote package is not part of a module, but it could well be.

So whilst cmd/cue automatically takes care of resolving imports or package paths for us (via cue/load), everything else is left to the CUE developer. Vendoring packages within cue.mod/pkg has to be done by hand; there is no mechanism by which dependencies can be fetched from source code hosting sites or remote repositories.

For an initial implementation of cmd/cue this was more than sufficient. Indeed, users have adapted shell scripts to help with creating minimal vendors, and the hof tool has support for building such a vendor via hof mod vendor.

But such an approach will neither scale for a larger CUE user base, nor an ecosystem of tools built on top of/with CUE and the cuelang.org/go/... APIs. Specifically such an approach:

Reimagining our example above, we want to:

Requirements

The following requirements have driven our thoughts on why and how to add package versioning to CUE.

Package versioning in CUE must:

For the remainder of this document we borrow slightly adjusted definitions of the following terms from the vgo proposal:

One important difference from the Go modules implementation is that the CUE implementation should not live entirely in cmd/cue (as it does in cmd/go). Instead the bulk of the implementation will lie within cue/load, with a change to cmd/cue effectively being the means by which the main module and its dependencies are controlled from the command line. That way, existing users of the cuelang.org/go/... APIs can continue to load and work with CUE instances, enjoying the same benefits that a seamless module experience will bring to cmd/cue. Tool authors who use the API but also require cmd/cue-style control over CUE modules and dependencies can create and run cuelang.org/go/cmd/cue/cmd command instances without requiring the users of their tool to have also installed CUE, as is the case today.

A key building block in CUE is the Go compatibility promise:

Packages intended for public use should try to maintain backwards compatibility as they evolve. The Go 1 compatibility guidelines are a good reference here: don't remove exported names, encourage tagged composite literals, and so on. If different functionality is required, add a new name instead of changing an old one. If a complete break is required, create a new package with a new import path.

Correspondingly, this proposal adopts exactly the same concept of semantic import versioning introduced with Go modules, and with it the import compatibility rule for CUE:

If an old package and a new package have the same import path, the new package must be backwards compatible with the old package.

Detailed Proposal

This section gives a brief overview of the proposal. Details are presented in the next section.

Throughout the rest of the proposal, the term "the loader" generally refers to the loading of modules and packages that happens through the use of cmd/cue or cue/load.

Rename cue get go to cue import go

In preparation for full module support in CUE, we will need to repurpose an existing cmd/cue command, renaming cue get go to cue import go. The detail of this is covered in https://github.com/cuelang/cue/issues/646.

This change fits nicely with a recent change to the current semantics of cue get go. Prior to a6169255, cue get go attempted to resolve its package arguments that were not fully satisfied by go.{mod,sum} automatically, via the use of go/packages (which itself uses cmd/go). This would result in changes to go.{mod,sum}.

With a6169255 we have instead shifted to a model that requires a Go dependency of cue get go to be fully resolvable via go.{mod,sum} without further changes to either. This aligns with Go 1.16, in which cmd/go build commands default to -mod=readonly.

This also aligns well with the new name cue import go: unlike cue get go, the command does not imply any fetching or resolution. cue import go will therefore fail in case any of its arguments cannot be fully resolved via the current go.{mod,sum}, and the user will need to run go get -d (or equivalent) to ensure full resolution is possible.

As part of this rename, cue import go will be changed to generate into the cue.mod/imp hierarchy: cue get go incorrectly generates files within the cue.mod/pkg hierarchy.

Declaring module dependencies

At the core of this proposal is the ability for a CUE module to declare dependencies on other CUE modules through versions. This will be done by extending the schema of cue.mod/module.cue files. Expanding our example from earlier:

-- cue.mod/module.cue --
module: "example.com/blah/v2"

require: {
    "acme.com/quote": "v1.1.0"
}
-- cue.mod/sum.cue --
[
{ path: "acme.com/quote", what: "v1.1.0", sum: "h1:3LFP3629v+1aKXU5Q37mxmRxX/pIu1nijXydLShEq5I="},
{ path: "acme.com/quote", what: "v1.1.0/go.mod", sum: "h1:8Sl8LxpKi29FqWXR16WEFZRNSz3SoPzUzeMeY4+DwBQ=" },
{ path: "acme.com/quote", what: "v1.1.0/cue.mod/module.cue", sum: "h1:ed2f49a15b1743a9c1216ce5355698dc8a9f0b6aef44" },
]
-- go.mod --
module example.com/blah/v2

require (
    acme.com/quote v1.1.0
)
-- blah.cue --
package blah

import "acme.com/quote"

x: quote.Hello

Note:

Using cmd/cue to control/list module dependencies

Whilst it would be possible to maintain cue.mod/module.cue, cue.mod/sum.cue and go.mod by hand, it would be an incredibly frustrating and error-prone process. Instead, cmd/cue will be modified to support controlling and listing module dependencies, and updating the files that declare those dependencies.

For example, in the course of developing example.com/blah/v2 we might well have run:

cue get acme.com/quote

This resolves acme.com/quote to the latest version of a module that provides the package acme.com/quote and modifies cue.mod/module.cue, cue.mod/sum.cue and go.mod accordingly to record the dependency. cue get is therefore directly analogous to go get in terms of its role in dependency management. cue get will, like cmd/go, support various version queries to control the dependencies being added, removed, upgraded or downgraded. For example @latest is the version query implied when, as above, no @$version is specified.

The cue list command, directly analogous to go list, will be added to provide information about CUE modules and packages. For example we could request information about the acme.com/quote package in JSON format as follows:

{
        "Dir": "/home/cueckoo/cue/modcache/acme.com/quote@v1.1.0",
        "ImportPath": "acme.com/quote",
        "Name": "quote",
        "Doc": "So fun CUE-related quotes",
        "Root": "/home/cueckoo/cue/modcache/acme.com/quote@v1.1.0",
        "Module": {
                "Path": "acme.com/quote",
                "Version": "v1.1.0",
                "Time": "2018-05-06T08:33:45Z",
                "Dir": "/home/cueckoo/cue/modcache/acme.com/quote@v1.1.0",
                "GoMod": "/home/cueckoo/cue/modcache/cache/download/acme.com/quote/@v/v1.1.0.mod",
                "CUEMod": "/home/cueckoo/cue/modcache/cache/download/acme.com/quote/@v/v1.1.0.cue"
        },
        "CUEFiles": [
                "quote.cue"
        ],
        "Imports": [
                "string"
        ],
        "Deps": [
                "string"
        ]
}

The cue mod command will be expanded with subcommands for more fine-grained control of cue.mod/module.cue, cue.mod/sum.cue (and go.mod indirectly). For example:

$ cue mod edit --replace=acme.com/quote=/path/to/acme.com/quote

would add a replace "directive" to cue.mod/module.cue, which tells the loader to load all versions of acme.com/quote from the directory /path/to/acme.com/quote. The resulting cue.mod/module.cue and go.mod would then look like this:

-- cue.mod/module.cue --
module: "example.com/blah/v2"

require: {
    "acme.com/quote": "v1.1.0"
}

replace: {
    {mod: path: "acme.com/quote", target: "/path/to/acme.com/quote"},
}
-- go.mod --
module example.com/blah/v2

require (
    acme.com/quote v1.1.0
)

replace (
    acme.com/quote => /path/to/acme.com/quote
)

The --mod flag will, for all cmd/cue commands that load/resolve package import paths, control the behaviour of resolution. For example cue list --mod=vendor $pkg will limit the resolution of the package pattern $pkg to the contents of the cue.mod/pkg directory, matching the behaviour of today. The default will be --mod=readonly, with cue get being the exception because its whole purpose is to modify the main module's dependencies.

Using cue/load and the cuelang.org/go/... API

The Go project entirely abstracts the loading of Go package and module information within cmd/go. go/packages exists as a wrapper for cmd/go to load Go packages for inspection and analysis.

CUE has a dedicated package for loading CUE instances, cue/load. This package is used by cmd/cue and users of the cuelang.org/go/... API. As such, there is no need for a go/packages equivalent, at least not equivalent functionality that wraps cmd/cue.

Users of the cue/load package will benefit from seamless module support, with the cue/load.Config type being expanded with options relevant to controlling module resolution. The following example demonstrates how a custom proxy serving private modules would be set during the load process, with a specification that the sumdb should not be consulted for those private import paths:

package main

import (
    "fmt"

    "cuelang.org/go/cue"
    "cuelang.org/go/cue/load"
)

func main() {
    cfg := &load.Config{
        // Proxy and NoSumDB are the proposed new load.Config options; they
        // take the same list syntax as CUEPROXY and CUENOSUMDB.
        Proxy:   "https://mycompany.com,https://proxy.golang.org,direct",
        NoSumDB: "*.mycompany.com",
    }
    bps := load.Instances([]string{"mycompany.com/quote"}, cfg)
    is := cue.Build(bps)
    fmt.Printf("%v\n", is[0].Value())
}

Use of a trusted module mirror and checksum database

In August 2019, the Go team at Google launched the Go module mirror and checksum database. With the Go 1.13 release, cmd/go began using both by default. There are some important points to note about this setup:

The use of a checksum database is a key component of verified and verifiable builds, or evaluations in CUE terms, and hence trusted reproducible builds (evaluations).

As an interim measure, we propose using proxy.golang.org as a module mirror, and the checksum database sum.golang.org for authentication, until such time as the CUE project can host such services itself. Use of these services will be enabled by default for cmd/cue and cue/load, but entirely configurable, just like cmd/go. We have permission from the Go team for this proposed use of these services.

As covered above, this places one noticeable constraint on the CUE module implementation: we must include a go.mod file at the root of a CUE module. This is required for our use of proxy.golang.org and sum.golang.org to satisfy the GOPROXY and sumdb protocols, but also, crucially, to indicate to the proxy (which runs cmd/go) that a repository is Go module-aware rather than a pre-modules repository (remember, we don't have such a consideration in CUE). This last point is significant for multi-module repositories, which we intend to support in CUE.

Use of the proxy and checksum database will largely fade into the background for users of cmd/cue and cue/load.

For example, under a default configuration and assuming that acme.com/quote is public, the following command:

$ cue get acme.com/quote

would:

Using cue env to understand the current cmd/cue configuration

The cue env command will be added to show the current configuration to the user, for example:

CUEENV="/home/cueckoo/.config/cue/env"
CUEFLAGS=""
CUEMODCACHE="/home/cueckoo/cue/modcache"
CUENOPROXY=""
CUENOSUMDB=""
CUEPRIVATE=""
CUEPROXY="https://proxy.golang.org,direct"
CUESUMDB="sum.golang.org"
CUETMPDIR=""
CUEVCS=""
CUEVERSION="v0.3.0-beta.6"
CUEMOD="/path/to/my/module"

Much like cmd/go, an explicitly set environment variable has the highest precedence; if a variable is not set, a default can instead be configured in the file located at cue env CUEENV, using the cue env -w command. See the cmd/go environment variable documentation for more details.

Detailed design

Finding a module for a module path

The details in this section largely follow the pattern established with Go module and package resolution.

When the loader needs to resolve an import path to a package and/or module (the concept of a module proxy is fully introduced below), it starts by locating the repository that contains the module.

If the module path has a VCS qualifier (one of .bzr, .fossil, .git, .hg, .svn) at the end of a path component, the loader will use everything up to that path qualifier as the repository URL. For example, for the module example.com/foo.git/bar, the loader will download the repository at example.com/foo.git using git, expecting to find the module in the bar subdirectory. The loader will guess the protocol to use based on the protocols supported by the version control tool.
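The qualifier rule might look like this in Go. As with cmd/go, the sketch assumes the qualifier must appear on a path component after the host name; splitVCSQualifier is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

var vcsQualifiers = []string{".bzr", ".fossil", ".git", ".hg", ".svn"}

// splitVCSQualifier finds the first path component (after the host) that
// ends in a VCS qualifier. It returns the repository path up to and
// including that component, the VCS name, and the remaining subdirectory.
func splitVCSQualifier(modPath string) (repo, vcs, subdir string, ok bool) {
	elems := strings.Split(modPath, "/")
	for i := 1; i < len(elems); i++ { // element 0 is the host
		for _, q := range vcsQualifiers {
			if len(elems[i]) > len(q) && strings.HasSuffix(elems[i], q) {
				repo = strings.Join(elems[:i+1], "/")
				subdir = strings.Join(elems[i+1:], "/")
				return repo, q[1:], subdir, true
			}
		}
	}
	return "", "", "", false
}

func main() {
	repo, vcs, subdir, ok := splitVCSQualifier("example.com/foo.git/bar")
	fmt.Println(repo, vcs, subdir, ok)
}
```

For example.com/foo.git/bar this yields the repository example.com/foo.git, the VCS git and the subdirectory bar, matching the example above.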

If the module path does not have a qualifier, the loader sends an HTTP GET request to a URL derived from the module path with a ?cue-get=1 query string. For example, for the module acme.com/quote, the loader will send the following request:

https://acme.com/quote?cue-get=1

The loader follows redirects but otherwise ignores response status codes, so the server may respond with a 404 or any other error status. CUE will not support an equivalent of GOINSECURE.

The server must respond with an HTML document containing a <meta> tag in the document's <head>. The <meta> tag should appear early in the document to avoid confusing the loader's restricted parser. In particular, it should appear before any raw JavaScript or CSS. The <meta> tag must have the form:

<meta name="cue-import" content="root-path vcs repo-url">

root-path is the repository root path, the portion of the module path that corresponds to the repository's root directory. It must be a prefix or an exact match of the requested module path. If it's not an exact match, another request is made for the prefix to verify the <meta> tags match.

vcs is the version control system. It must be one of bzr, fossil, git, hg, svn, mod. The mod scheme instructs the loader to download the module from the given URL using the CUEPROXY protocol (see later). This allows developers to distribute modules without exposing source repositories.

repo-url is the repository's URL. If the URL does not include a scheme (either because the module path has a VCS qualifier or because the <meta> tag lacks a scheme), the loader will try each protocol supported by the version control system. For example, with Git, the loader will try https:// then git+ssh://. Insecure protocols (like http:// and git://) are not supported.

As an example, consider acme.com/quote again. The loader sends a request to https://acme.com/quote?cue-get=1. The server responds with an HTML document containing the tag:

<meta name="cue-import" content="acme.com/quote git https://github.com/acme.com/quote">

From this response, the loader will use the Git repository at the remote URL https://github.com/acme.com/quote.
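Checking the content attribute of such a tag can be sketched as follows. HTML parsing is omitted, as is the second request used to verify a prefix root-path; repoRoot and parseImportMeta are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

type repoRoot struct {
	Root, VCS, RepoURL string
}

var allowedVCS = map[string]bool{
	"bzr": true, "fossil": true, "git": true, "hg": true, "svn": true, "mod": true,
}

// parseImportMeta validates "root-path vcs repo-url" against modPath.
func parseImportMeta(content, modPath string) (repoRoot, error) {
	f := strings.Fields(content)
	if len(f) != 3 {
		return repoRoot{}, fmt.Errorf("malformed cue-import content %q", content)
	}
	r := repoRoot{Root: f[0], VCS: f[1], RepoURL: f[2]}
	if !allowedVCS[r.VCS] {
		return repoRoot{}, fmt.Errorf("unsupported vcs %q", r.VCS)
	}
	// root-path must equal modPath or be a prefix of it at a path boundary.
	if modPath != r.Root && !strings.HasPrefix(modPath, r.Root+"/") {
		return repoRoot{}, fmt.Errorf("root %q does not match module path %q", r.Root, modPath)
	}
	// Insecure protocols are not supported.
	if strings.HasPrefix(r.RepoURL, "http://") || strings.HasPrefix(r.RepoURL, "git://") {
		return repoRoot{}, fmt.Errorf("insecure repository URL %q", r.RepoURL)
	}
	return r, nil
}

func main() {
	r, err := parseImportMeta("acme.com/quote git https://github.com/acme.com/quote", "acme.com/quote")
	fmt.Printf("%+v %v\n", r, err)
}
```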

As the CUE modules implementation is based heavily on the Go modules implementation, if a server fails to respond with an appropriate <meta> tag using the query ?cue-get=1, the loader will then attempt to fall back to a query with ?go-get=1. This means that any server that needs to distinguish the hosting information for Go and CUE modules can do so, whilst not placing an undue burden on existing infrastructure to add support for ?cue-get=1 queries from day one (the assumption being that hosting of CUE and Go modules will generally be aligned in terms of VCS systems). For example, GitHub and other popular hosting services respond to ?go-get=1 queries for all repositories, so no server reconfiguration is necessary for CUE modules hosted at those sites. Over time it is envisaged that such code hosting sites would add support for ?cue-get=1 queries.
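The discovery order can be sketched with the network fetch abstracted away. The sketch assumes the ?go-get=1 response carries a go-import tag (as Go-aware hosts serve today) and treats any failure on the first query as a reason to fall back; fetchMeta and discoverRepo are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// discoverRepo queries https://<modPath>?cue-get=1 for a cue-import tag and
// falls back to ?go-get=1 with a go-import tag. fetchMeta stands in for an
// HTTPS GET plus <meta> extraction and returns "" if no tag is found.
func discoverRepo(modPath string, fetchMeta func(url, metaName string) (string, error)) (string, error) {
	queries := []struct{ param, meta string }{
		{"cue-get", "cue-import"},
		{"go-get", "go-import"},
	}
	for _, q := range queries {
		url := "https://" + modPath + "?" + q.param + "=1"
		if content, err := fetchMeta(url, q.meta); err == nil && content != "" {
			return content, nil
		}
	}
	return "", errors.New("no cue-import or go-import <meta> tag found for " + modPath)
}

func main() {
	// Simulate a host that only answers ?go-get=1 queries.
	content, err := discoverRepo("acme.com/quote", func(url, meta string) (string, error) {
		if meta == "go-import" {
			return "acme.com/quote git https://github.com/acme.com/quote", nil
		}
		return "", nil
	})
	fmt.Println(content, err)
}
```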

After the repository URL is found, the loader will clone the repository into the module cache. In general, the loader tries to avoid fetching unneeded data from a repository. However, the actual commands used vary by version control system and may change over time. For Git, the loader can list most available versions without downloading commits. It will usually fetch commits without downloading ancestor commits, but doing so is sometimes necessary.

Versions of modules

Much like Go, a version identifies an immutable snapshot of a module, which may be either a release or a pre-release. Each version starts with the letter v, followed by a semantic version. See the Go module reference for more detail, and Semantic Versioning 2.0.0 for details on how versions are formatted, interpreted, and compared.

Therefore, as we have seen in cue.mod/module.cue examples above, a module path and version together form the basis of a declared dependency.

Module authors explicitly release new versions by defining a semantic version tag within the repository that hosts the module, a tag that indicates which revision should be checked out for that version. For example, as the author of the example.com/blah/v2 module, we would, in a local clone of the repository behind example.com/blah/v2, do something like:

$ pwd
/path/to/example.com/blah
$ git log --oneline -1
7c5d28e6 (HEAD) deps: upgrade to latest acme.com/quote version
$ cue list -m 
example.com/blah/v2
$ git tag v2.0.1
$ git push origin v2.0.1

Another module looking to depend on example.com/blah/v2 would then be able to run:

$ cue get example.com/blah/v2@latest

and that would resolve to v2.0.1, specifically the revision 7c5d28e6.

CUE modules also adopt the concept of a pseudo-version. A pseudo-version is a specially formatted pre-release version that encodes information about a specific revision in a version control repository. For example, v0.0.0-20191109021931-daa7c04131f5 is a pseudo-version. Pseudo-versions are used when no canonical semantic version tag is available, for example when a user wants to depend on a recent fix pushed to the main branch of a project. Pseudo-versions follow exactly the same model as implemented for Go modules: see the explanation of pseudo-versions and the details of how pseudo-versions map to commits for more specific information.

Declaring dependencies on other modules

The earlier example of the main module example.com/blah/v2 showed a sketch of the schema of cue.mod/module.cue. This is now presented more fully:

#ModuleDef: {
    module: string

    // The cue directive sets the expected CUE version for the module
    cue: string

    // A require directive declares a minimum required version of a given module dependency
    require: [...#Require]

    // An exclude directive prevents a module version from being loaded by the loader.
    exclude: [string]: string

    // A replace directive replaces the contents of a specific version of a module, or all 
    // versions of a module, with contents found elsewhere.
    replace: [...#Replace]

    // A retract directive indicates that a version or range of versions of the module 
    // defined by this cue.mod/module.cue file should not be depended upon.
    retract: [...#Retract]
}

#Require: {
    path:     string
    version:  string
    indirect: bool  
}

#Replace: {
    old: #Module
    new: #Module
}

#Module: {
    path:    string
    version: string
}

#Retract: {
    low:       string
    high:      string
    rationale: string
}

With the obvious exception that the representation is different (in CUE a module is defined in CUE itself, whereas in Go go.mod files have their own syntax), each of the directives supported in go.mod files has a corresponding field, with the same meaning, in a cue.mod/module.cue file:

Building on the short descriptions in the schema above, the links above cover in more detail what each means and when they should be used.

To support the retract directive we will provide a builtin semver package that allows for the specification of ranges:

import "semver"

module: "example.com/blah"

retract: [
    "v1.1.0", 
    semver.GreaterThanEqual("v1.2.0") & semver.LessThan("v1.3.0"), 
]

Similarly, the cue.mod/sum.cue file would have the following schema:

#Sum: [...#SumEntry]

#SumEntry: {
    // path is the module path
    path: string 

    // what is the aspect of the module for which we have a cryptographic sum
    // e.g. "v1.1.0" means the sum represents the sum of the module itself, 
    // "v1.1.0/go.mod" means the sum refers to the go.mod file only
    what: string  

    // sum is the cryptographic sum. The format of the sum is described in 
    // https://golang.org/ref/mod#go
    sum: string
}

Question: do we really need/want to have cue.mod/sum.cue in CUE format? Or would the go.sum format suffice?

The format of the go.mod file is described in the Go module reference documentation.

Module proxy

As discussed above, as an interim measure we propose using proxy.golang.org as a module mirror, and the checksum database sum.golang.org for authentication, until such time as the CUE project can host such services itself. Use of these services will be enabled by default in the loader. The following CUE environment variables will control use of the proxy and checksum database (they closely follow the Go module environment variables):

CUENOPROXY

Comma-separated list of glob patterns (in the syntax of Go's path.Match) of module path prefixes that should always be fetched directly from version control repositories, not from module proxies.

If CUENOPROXY is not set, it defaults to CUEPRIVATE.

CUENOSUMDB

Comma-separated list of glob patterns (in the syntax of Go's path.Match) of module path prefixes for which the loader should not verify checksums using the checksum database.

If CUENOSUMDB is not set, it defaults to CUEPRIVATE.

CUEPRIVATE

Comma-separated list of glob patterns (in the syntax of Go's path.Match) of module path prefixes that should be considered private. CUEPRIVATE is a default value for CUENOPROXY and CUENOSUMDB. CUEPRIVATE also determines whether a module is considered private for CUEVCS (see below).
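Matching these lists against a module path can be sketched as below, modelled on our understanding of Go's module.MatchPrefixPatterns: each glob is compared with a prefix of the module path that has the same number of path elements. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchPrefixPatterns reports whether any comma-separated glob in globs
// matches a prefix of target with the same number of path elements.
func matchPrefixPatterns(globs, target string) bool {
	for _, glob := range strings.Split(globs, ",") {
		glob = strings.TrimSpace(glob)
		if glob == "" {
			continue
		}
		// Truncate target to as many path elements as glob has.
		n := strings.Count(glob, "/")
		prefix := target
		for i := 0; i < len(target); i++ {
			if target[i] == '/' {
				if n == 0 {
					prefix = target[:i]
					break
				}
				n--
			}
		}
		if n > 0 {
			continue // target has fewer path elements than glob
		}
		if ok, _ := path.Match(glob, prefix); ok {
			return true
		}
	}
	return false
}

// withDefault applies "if CUENOPROXY (or CUENOSUMDB) is not set, it
// defaults to CUEPRIVATE".
func withDefault(value, private string) string {
	if value == "" {
		return private
	}
	return value
}

func main() {
	noSumDB := withDefault("", "*.mycompany.com")
	fmt.Println(matchPrefixPatterns(noSumDB, "git.mycompany.com/quote")) // true
	fmt.Println(matchPrefixPatterns(noSumDB, "acme.com/quote"))          // false
}
```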

CUEPROXY

List of module proxy URLs, separated by commas (,) or pipes (|). When the loader looks up information about a module, it contacts each proxy in the list in sequence until it receives a successful response or a terminal error. A proxy may respond with a 404 (Not Found) or 410 (Gone) status to indicate the module is not available on that server.

The loader's error fallback behaviour is determined by the separator characters between URLs. If a proxy URL is followed by a comma, the loader falls back to the next URL after a 404 or 410 error; all other errors are considered terminal. If the proxy URL is followed by a pipe, the loader falls back to the next source after any error, including non-HTTP errors like timeouts.

CUEPROXY URLs may have the schemes https or file. If a URL has no scheme, https is assumed. A module cache may be used directly as a file proxy:

CUEPROXY=file://$(cue env CUEMODCACHE)/cache/download

Two keywords may be used in place of proxy URLs:

CUEPROXY defaults to https://proxy.golang.org,direct. Under that configuration, the loader first contacts the Go module mirror run by Google, then falls back to a direct connection if the mirror does not have the module. See https://proxy.golang.org/privacy for the mirror's privacy policy. The CUEPRIVATE and CUENOPROXY environment variables may be set to prevent specific modules from being downloaded using proxies.

CUESUMDB

Identifies the name of the checksum database to use and optionally its public key and URL. For example:

CUESUMDB="sum.golang.org"
CUESUMDB="sum.golang.org+<publickey>"
CUESUMDB="sum.golang.org+<publickey> https://sum.golang.org"

The loader knows the public key of sum.golang.org and also that the name sum.golang.google.cn (available inside mainland China) connects to the sum.golang.org database; use of any other database requires giving the public key explicitly. The URL defaults to https:// followed by the database name.

CUESUMDB defaults to sum.golang.org, the Go checksum database run by Google. See https://sum.golang.org/privacy for the service's privacy policy.

If CUESUMDB is set to off the checksum database is not consulted, and all unrecognised modules are accepted, at the cost of giving up the security guarantee of verified repeatable downloads for all modules. A better way to bypass the checksum database for specific modules is to use the CUEPRIVATE or CUENOSUMDB environment variables.

Module versioning

(This follows directly from the Go module reference.)

Starting with major version 2, module paths must have a major version suffix like /v2 that matches the major version. For example, if a module has the path example.com/mod at v1.0.0, it must have the path example.com/mod/v2 at version v2.0.0.

Major version suffixes are not allowed at major versions v0 or v1. There is no need to change the module path between v0 and v1 because v0 versions are unstable and have no compatibility guarantee. Additionally, for most modules, v1 is backwards compatible with the last v0 version; a v1 version acts as a commitment to compatibility, rather than an indication of incompatible changes compared with v0.
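These two rules amount to a simple consistency check between a module path and a version. The sketch below ignores special cases such as gopkg.in paths; checkMajorSuffix and its helpers are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorOf extracts N from a version vN.x.y.
func majorOf(version string) (int, error) {
	if !strings.HasPrefix(version, "v") {
		return 0, fmt.Errorf("version %q must start with v", version)
	}
	major := version[1:]
	if i := strings.IndexAny(major, ".-+"); i >= 0 {
		major = major[:i]
	}
	return strconv.Atoi(major)
}

// pathSuffixMajor reports N if the last path element is vN.
func pathSuffixMajor(modPath string) (int, bool) {
	elem := modPath[strings.LastIndexByte(modPath, '/')+1:]
	if len(elem) < 2 || elem[0] != 'v' {
		return 0, false
	}
	for _, c := range elem[1:] {
		if c < '0' || c > '9' {
			return 0, false
		}
	}
	n, err := strconv.Atoi(elem[1:])
	return n, err == nil
}

// checkMajorSuffix enforces: no suffix at v0/v1, a matching /vN at N >= 2.
func checkMajorSuffix(modPath, version string) error {
	major, err := majorOf(version)
	if err != nil {
		return err
	}
	suffix, hasSuffix := pathSuffixMajor(modPath)
	switch {
	case major <= 1 && hasSuffix:
		return fmt.Errorf("%s: major version suffix not allowed at %s", modPath, version)
	case major >= 2 && !hasSuffix:
		return fmt.Errorf("%s: version %s requires a /v%d suffix", modPath, version, major)
	case major >= 2 && suffix != major:
		return fmt.Errorf("%s: suffix /v%d does not match version %s", modPath, suffix, version)
	}
	return nil
}

func main() {
	fmt.Println(checkMajorSuffix("example.com/mod", "v1.0.0"))    // <nil>
	fmt.Println(checkMajorSuffix("example.com/mod/v2", "v2.0.0")) // <nil>
	fmt.Println(checkMajorSuffix("example.com/mod", "v2.0.0"))
}
```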

Minimal version selection

It is proposed that, like Go, minimal version selection (MVS) will be used as the algorithm to select a set of module versions to use when evaluating packages. MVS is described in detail in Minimal Version Selection by Russ Cox. The detail of the algorithm is not covered here.

MVS operates on a directed graph of modules, specified with cue.mod/module.cue files. Each vertex in the graph represents a module version. Each edge represents a minimum required version of a dependency, specified using a require directive. replace and exclude directives in the main module's cue.mod/module.cue file modify the graph.
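The core of MVS, as described in Russ Cox's article, can be sketched briefly: visit every module version reachable from the main module through require edges, and for each module path keep the highest version seen. The sketch omits replace and exclude handling and uses a simplified version comparison; the names are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// modVer is one vertex in the requirement graph.
type modVer struct{ Path, Version string }

// parseVersion extracts MAJOR.MINOR.PATCH. Pre-release and build suffixes
// are ignored for brevity; a real implementation needs full semver
// ordering (for example golang.org/x/mod/semver.Compare).
func parseVersion(v string) [3]int {
	var out [3]int
	v = strings.TrimPrefix(v, "v")
	if i := strings.IndexAny(v, "-+"); i >= 0 {
		v = v[:i]
	}
	for i, p := range strings.SplitN(v, ".", 3) {
		out[i], _ = strconv.Atoi(p)
	}
	return out
}

// versionLess reports whether a sorts before b.
func versionLess(a, b string) bool {
	pa, pb := parseVersion(a), parseVersion(b)
	for i := range pa {
		if pa[i] != pb[i] {
			return pa[i] < pb[i]
		}
	}
	return false
}

// buildList returns the selected version for every module path reachable
// from root: the maximum of all required versions of that path.
func buildList(root modVer, reqs map[modVer][]modVer) map[string]string {
	selected := map[string]string{}
	seen := map[modVer]bool{}
	queue := []modVer{root}
	for len(queue) > 0 {
		m := queue[0]
		queue = queue[1:]
		if seen[m] {
			continue
		}
		seen[m] = true
		if cur, ok := selected[m.Path]; !ok || versionLess(cur, m.Version) {
			selected[m.Path] = m.Version
		}
		queue = append(queue, reqs[m]...)
	}
	return selected
}

func main() {
	root := modVer{"example.com/blah/v2", ""}
	reqs := map[modVer][]modVer{
		root:                        {{"acme.com/quote", "v1.1.0"}, {"example.com/b", "v1.0.0"}},
		{"example.com/b", "v1.0.0"}: {{"acme.com/quote", "v1.2.0"}},
	}
	fmt.Println(buildList(root, reqs))
}
```

Here the main module requires acme.com/quote v1.1.0, but a dependency requires v1.2.0, so MVS selects v1.2.0: the minimum version that satisfies every requirement.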

Resolving a package to a module

When the loader loads a package using a package path, it needs to determine which module provides the package. CUE will follow exactly the same model as Go for resolving a package to a module, with the obvious substitutions of "the loader" for "the go command" and of CUE* for GO* module-related environment variables. Whilst CUE does not have an equivalent of GOOS or GOARCH, the loader will similarly ignore file-level build constraints of the form @if during this resolution.
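In Go's model, a package resolves to a module only if exactly one module in the build list both has a path that is a prefix of the package path and actually contains the package; zero or several candidates is an error. A minimal sketch, assuming each selected module's package list is already known (the map representation and resolvePackage are hypothetical):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// resolvePackage returns the single module in modPkgs that provides pkgPath.
// modPkgs maps each selected module path to the set of full package import
// paths it contains.
func resolvePackage(pkgPath string, modPkgs map[string]map[string]bool) (string, error) {
	var found []string
	for mod, pkgs := range modPkgs {
		inModule := pkgPath == mod || strings.HasPrefix(pkgPath, mod+"/")
		if inModule && pkgs[pkgPath] {
			found = append(found, mod)
		}
	}
	switch len(found) {
	case 0:
		return "", fmt.Errorf("no module provides package %s", pkgPath)
	case 1:
		return found[0], nil
	default:
		sort.Strings(found)
		return "", fmt.Errorf("ambiguous import: %s found in %s", pkgPath, strings.Join(found, ", "))
	}
}

func main() {
	mods := map[string]map[string]bool{
		"acme.com/quote": {"acme.com/quote": true},
	}
	fmt.Println(resolvePackage("acme.com/quote", mods))
}
```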

Changes to cmd/cue

This section introduces changes that will be made to cmd/cue. Most of these changes are effectively a front-end to changes that will be made to cue/load; those changes are discussed in the next section.

All cmd/cue commands that load information about packages will become module-aware:

The --mod flag is understood by module-aware commands and controls the resolution of packages in the following way (like cmd/go):

By default, if a cue.mod/pkg vendor directory is present at the module root, the loader acts as if --mod=vendor were used. Otherwise, the loader acts as if --mod=readonly were used.

Module-aware commands will also understand --modpath as a means of specifying an alternative path at which a cue.mod directory can be found (and correspondingly read cue.mod/module.cue and associated files from).

Like cmd/go, the --modcacherw flag instructs the loader to create new directories in the module cache with read-write permissions instead of making them read-only.

We now move on to discuss the changes to cmd/cue commands in more detail. For users of the cuelang.org/go/... APIs, programmatically creating and running command instances via cuelang.org/go/cmd/cue/cmd will remain possible and will be fully module-aware.

For any command described as modifying cue.mod/module.cue, it should be assumed that an identical change will be made to the module's go.mod file, unless specified otherwise. The same generally applies to any action of the loader that would modify any of these files.

Much of the proposal regarding cmd/cue commands is, unsurprisingly, heavily based on the cmd/go-equivalent commands.

cue get

cue get [-u] [packages]

cue get updates module dependencies in the cue.mod/module.cue file for the main module.

Unlike cmd/go, we have no need for the -d flag, because there is no concept of building or installing the module/package we have just fetched. Nor do we need -t until there is a cue test command. Otherwise the command will behave like go get:

Once cue get has resolved its arguments to specific modules and versions, cue get will add, change, or remove require directives in the main module's cue.mod/module.cue file to ensure the modules remain at the desired versions in the future. Note that required versions in cue.mod/module.cue files are minimum versions and may be increased automatically as new dependencies are added. See Minimal version selection (MVS) for details on how versions are selected and conflicts are resolved by module-aware commands.

cue get then proceeds along the lines described in the go get documentation.

cue list

cue list [-f format] [-json] [-m] [list flags] [packages]

cue list lists the named packages, one per line. The most commonly-used flags are -f and -json, which control the form of the output printed for each package. Other list flags, documented below, control more specific details.

The default output shows the package import path:

$ cue list acme.com/quote
acme.com/quote

The --f flag specifies an alternate format for the list, using the syntax of package text/template. The default output is equivalent to --f '{{.ImportPath}}'. The struct being passed to the template is:

type Package struct {
    Dir           string   // directory containing package sources
    ImportPath    string   // import path of package in dir
    ImportComment string   // path in import comment on package statement
    Name          string   // package name
    Doc           string   // package documentation string
    Target        string   // install path
    Builtin       bool     // is this package a builtin?
    Module        *Module  // info about package's containing module, if any (can be nil)

    // Source files
    CUEFiles        []string   // .cue source files
    IgnoredCUEFiles []string   // .cue source files ignored due to build constraints

    // Dependency information
    Imports      []string          // import paths used by this package
    ImportMap    map[string]string // map from source import to ImportPath (identity entries omitted)
    Deps         []string          // all (recursively) imported dependencies

    // Error information
    Incomplete bool            // this package or a dependency has an error
    Error      *PackageError   // error loading package
    DepsErrors []*PackageError // errors loading dependencies
}

With error information defined as:

type PackageError struct {
    ImportStack   []string // shortest path from package named on command line to this one
    Pos           string   // position of error (if present, file:line:col)
    Err           string   // the error itself
}
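Since --f uses text/template syntax, a format string is simply executed against each Package value. The sketch below shows that mechanism with the standard library; the Package type is trimmed to three fields and `render` is a hypothetical helper, not cmd/cue's implementation.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Package keeps only the fields of the Package struct above that this
// example uses.
type Package struct {
	ImportPath string
	Name       string
	Imports    []string
}

// render applies a --f style format string to one package.
func render(format string, p Package) (string, error) {
	tmpl, err := template.New("list").Parse(format)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, p); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	p := Package{ImportPath: "acme.com/quote", Name: "quote", Imports: []string{"strings"}}
	for _, format := range []string{
		"{{.ImportPath}}", // equivalent to the default output
		"{{.Name}}:{{range .Imports}} {{.}}{{end}}",
	} {
		out, err := render(format, p)
		if err != nil {
			panic(err)
		}
		fmt.Println(out)
	}
}
```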

The Module struct type is defined below, under cue list -m.

cue list -m

cue list -m [-u] [-retracted] [-versions] [list flags] [modules]

cue list -m lists information about CUE modules: the --m flag causes cue list to report on modules rather than packages.

The --json flag prints JSON-encoded output according to the struct type:

type Module struct {
    Path       string       // module path
    Version    string       // module version
    Versions   []string     // available module versions (with -versions)
    Replace    *Module      // replaced by this module
    Time       *time.Time   // time version was created
    Update     *Module      // available update, if any (with -u)
    Indirect   bool         // is this module only an indirect dependency of main module?
    Dir        string       // directory holding files for this module, if any
    GoMod      string       // path to go.mod file for this module
    CUEVersion string       // CUE version used in module
    Error      *ModuleError // error loading module
}

type ModuleError struct {
    Err string // the error itself
}

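As an alternative to --json, the --f flag specifies a format template, using the syntax of package text/template, that is applied to the Module struct.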

The --u flag adds information about available upgrades.

The --versions flag causes list to set the module's Versions field to a list of all known versions of that module, ordered according to semantic versioning, lowest to highest.

The --retracted flag instructs list to show retracted versions in the list printed with the --versions flag, and to consider retracted versions when resolving version queries.

cue mod download

cue mod download [-json] [-x] [modules]

The cue mod download command downloads the named modules into the module cache. Arguments can be module paths or module patterns selecting dependencies of the main module or version queries of the form path@version. With no arguments, download applies to all dependencies of the main module.

The loader will automatically download modules as needed during ordinary execution. The cue mod download command is useful mainly for pre-filling the module cache or for loading data to be served by a module proxy.

By default, download writes nothing to standard output. It prints progress messages and errors to standard error.

The --json flag causes download to print a sequence of JSON objects to standard output, describing each downloaded module (or failure), corresponding to this Go struct:

type Module struct {
    Path     string // module path
    Version  string // module version
    Error    string // error loading module
    Info     string // absolute path to cached .info file
    GoMod    string // absolute path to cached .mod file
    Zip      string // absolute path to cached .zip file
    Dir      string // absolute path to cached source root directory
    Sum      string // checksum for path, version (as in go.sum)
    GoModSum string // checksum for go.mod (as in go.sum)
}

The --x flag causes download to print the commands download executes to standard error.

cue mod edit

cue mod edit [editing flags] [-fmt|-print|-json] [file]

Example:

# Add a replace directive.
$ cue mod edit -replace example.com/a@v1.0.0=./a

# Remove a replace directive.
$ cue mod edit -dropreplace example.com/a@v1.0.0

# Set the go version, add a requirement, and print the file
# instead of writing it to disk.
$ cue mod edit -go=1.14 -require=example.com/m@v1.0.0 -print

# Format the cue.mod/module.cue file.
$ cue mod edit -fmt

# Format and print a different .mod file.
$ cue mod edit -print tools.mod

# Print a JSON representation of the cue.mod/module.cue file.
$ cue mod edit -json

The cue mod edit command provides a command-line interface for editing and formatting cue.mod/module.cue files, for use primarily by tools and scripts. cue mod edit reads only one cue.mod/module.cue file; it does not look up information about other modules. By default, cue mod edit reads and writes the cue.mod/module.cue file of the main module, but a different target file can be specified after the editing flags. All changes to cue.mod/module.cue files made via cue mod edit will be reflected in a CUE module's go.mod file.

The editing flags specify a sequence of editing operations.

The editing flags may be repeated. The changes are applied in the order given.

cue mod graph

cue mod graph

The cue mod graph command prints the module requirement graph (with replacements applied) in text form. For example:

example.com/main example.com/a@v1.1.0
example.com/main example.com/b@v1.2.0
example.com/a@v1.1.0 example.com/b@v1.1.1
example.com/a@v1.1.0 example.com/c@v1.3.0
example.com/b@v1.1.0 example.com/c@v1.1.0
example.com/b@v1.2.0 example.com/c@v1.2.0

Each vertex in the module graph represents a specific version of a module. Each edge in the graph represents a requirement on a minimum version of a dependency.

cue mod graph prints the edges of the graph, one per line. Each line has two space-separated fields: a module version and one of its dependencies. Each module version is identified as a string of the form path@version. The main module has no @version suffix, since it has no version.

See Minimal version selection (MVS) for more information on how versions are chosen. See also cue list -m for printing selected versions and cue mod why for understanding why a module is needed.

cue mod init

cue mod init [name]

The cue mod init command initialises and writes a new cue.mod/module.cue file in the current directory, in effect creating a new module rooted at the current directory. The cue.mod directory must not already exist. A go.mod file will also be written to the current directory.

Per the current module docs, the use of a module is optional, but required if one wants to import files. The module name is required if a package within the module needs to import another package within the main module.

cue mod tidy

cue mod tidy [-e] [-v]

cue mod tidy ensures that the cue.mod/module.cue file (and by extension the go.mod file) matches the source code in the module. It adds any missing module requirements necessary to build the current module's packages and dependencies, and it removes requirements on modules that don't provide any relevant packages. It also adds any missing entries to cue.mod/sum.cue and removes unnecessary entries.

The -e flag causes cue mod tidy to attempt to proceed despite errors encountered while loading packages.

The -v flag causes cue mod tidy to print information about removed modules to standard error.

cue mod tidy works by loading all of the packages in the main module and all of the packages they import, recursively. cue mod tidy acts as if all build constraints are enabled, so it will consider @if constrained files even if those source files wouldn't normally be evaluated.

TODO: do we need the equivalent of the ignore build constraint exception?

Like the ... package pattern, cue mod tidy will not consider packages in the main module in directories named testdata or with names that start with . or _ unless those packages are explicitly imported by other packages.

Once cue mod tidy has loaded this set of packages, it ensures that each module that provides one or more packages either has a require directive in the main module's cue.mod/module.cue file or is required by another required module. cue mod tidy will add a requirement on the @latest version of each missing module. cue mod tidy will remove require directives for modules that don't provide any packages in the set described above.

cue mod tidy may also add or remove indirect fields on #Require directives. A #Require directive with indirect: true denotes a module that does not provide packages imported by packages in the main module. These requirements will be present if the module that imports packages in the indirect dependency has an incomplete cue.mod/module.cue file. They may also be present if the indirect dependency is required at a higher version than is implied by the module graph; this usually happens after running a command like cue get -u ./....

cue mod vendor

cue mod vendor [-e] [-v]

The cue mod vendor command constructs a directory named cue.mod/pkg that contains copies of all packages needed to support evaluations of packages in the main module. As with cue mod tidy and other module commands, build constraints (except for ignore ???) are not considered when constructing the cue.mod/pkg vendor directory.

When vendoring is enabled, the loader will load packages from the cue.mod/pkg vendor directory instead of downloading modules from their sources into the module cache and using packages from those downloaded copies.

cue mod vendor also creates the file cue.mod/pkg/modules.txt, which contains a list of vendored packages and the module versions they were copied from. When vendoring is enabled, this manifest is used as a source of module version information. When the cue command reads cue.mod/pkg/modules.txt, it checks that the module versions are consistent with cue.mod/module.cue (and go.mod). If either cue.mod/module.cue or go.mod has changed since cue.mod/pkg/modules.txt was generated, cue mod vendor should be run again.

Note that cue mod vendor removes the cue.mod/pkg vendor directory if it exists before re-constructing it. Local changes should not be made to vendored packages. The cue command does not check that packages in the cue.mod/pkg vendor directory have not been modified, but one can verify the integrity of the cue.mod/pkg vendor directory by running cue mod vendor and checking that no changes were made.

The --e flag causes cue mod vendor to attempt to proceed despite errors encountered while loading packages.

The --v flag causes cue mod vendor to print the names of vendored modules and packages to standard error.

cue mod verify

cue mod verify

cue mod verify checks that dependencies of the main module stored in the module cache have not been modified since they were downloaded. To perform this check, cue mod verify hashes each downloaded module .zip file and extracted directory, then compares those hashes with a hash recorded when the module was first downloaded. cue mod verify checks each module in the evaluation list (which may be printed with cue list -m all).

If all the modules are unmodified, cue mod verify prints "all modules verified". Otherwise, it reports which modules have been changed and exits with a non-zero status.

Note that all module-aware commands verify that hashes in the main module's cue.mod/sum.cue file match hashes recorded for modules downloaded into the module cache. If a hash is missing from cue.mod/sum.cue (for example, because the module is being used for the first time), the loader verifies its hash using the checksum database (unless the module path is matched by CUEPRIVATE or CUENOSUMDB).

In contrast, cue mod verify checks that module .zip files and their extracted directories have hashes that match hashes recorded in the module cache when they were first downloaded. This is useful for detecting changes to files in the module cache after a module has been downloaded and verified. cue mod verify does not download content for modules not in the cache, and it does not use cue.mod/sum.cue files to verify module content. However, cue mod verify may download go.mod files in order to perform minimal version selection. It will use cue.mod/sum.cue to verify those files, and it may add cue.mod/sum.cue entries for missing hashes.

cue mod why

cue mod why [-m] [-vendor] packages...

cue mod why shows a shortest path in the import graph from the main module to each of the listed packages.

The output is a sequence of stanzas, one for each package or module named on the command line, separated by blank lines. Each stanza begins with a comment line starting with # giving the target package or module. Subsequent lines give a path through the import graph, one package per line. If the package or module is not referenced from the main module, the stanza will display a single parenthesised note indicating that fact.

The --m flag causes cue mod why to treat its arguments as a list of modules. cue mod why will print a path to any package in each of the modules. Note that even when --m is used, cue mod why queries the package graph, not the module graph printed by cue mod graph.

By default, cue mod why considers the graph of packages matched by the all pattern, which is the same set of packages matched by cue mod vendor.

cue clean -modcache

cue clean [-modcache]

The --modcache flag causes cue clean to remove the entire module cache, including unpacked source code of versioned dependencies.

This is usually the best way to remove the module cache. By default, most files and directories in the module cache are read-only to prevent tests and editors from unintentionally changing files after they've been authenticated. Unfortunately, this causes commands like rm -r to fail, since files can't be removed without first making their parent directories writable.

The --modcacherw flag (accepted by module-aware commands) causes new directories in the module cache to be writable. To pass --modcacherw to all module-aware commands, add it to the CUEFLAGS variable. CUEFLAGS may be set in the environment or with cue env -w.

--modcacherw should be used with caution; developers should be careful not to make changes to files in the module cache. cue mod verify may be used to check that files in the cache match hashes in the main module's cue.mod/sum.cue file.

cue env

This is covered in the "Proposal" section above.

Required changes to cue/load

The main changes required in cue/load are additions to the cue/load.Config type. These are backwards compatible assuming that users of this type are, as advised by go vet, using keyed struct literals. All field additions below have corresponding "front end" flags/environment variables in cmd/cue.

type Config struct {
    // ***************
    // Existing fields
    // ***************

    Context *build.Context
    ModuleRoot string
    Module string
    Package string
    Dir string
    Tags []string
    AllCUEFiles bool
    BuildTags []string
    Tests bool
    Tools bool
    DataFiles bool
    StdRoot string
    ParseFile func(name string, src interface{}) (*ast.File, error)
    Overlay map[string]Source
    Stdin io.Reader

    // *************************
    // New module-related fields
    // *************************

    // Mod defines the module resolution mode. ModModeReadonly will be the 
    // default. 
    //
    // Corresponds to the --mod flag understood by module-aware commands.
    Mod ModMode

    // ModPath specifies an alternative path at which a cue.mod directory 
    // can be found (and correspondingly read cue.mod/module.cue and 
    // associated files from).
    //
    // Corresponds to the --modpath flag understood by module-aware commands.
    ModPath string

    // ModCacheRW instructs the loader to create new directories in the 
    // module cache with read-write permissions instead of making them read-only.
    //
    // Corresponds to the --modcacherw flag understood by module-aware commands.
    ModCacheRW bool

    // Proxy defines a list of module proxy URLs, separated by commas (`,`) or 
    // pipes (`|`). 
    Proxy string

    // NoProxy is a comma-separated list of glob patterns (in the syntax of 
    // Go's path.Match) of module path prefixes that should always be fetched 
    // directly from version control repositories, not from module proxies.
    NoProxy string

    // NoSumDB is a comma-separated list of glob patterns (in the syntax of Go's 
    // path.Match) of module path prefixes for which the loader should not 
    // verify checksums using the checksum database.
    NoSumDB string

    // Private is a comma-separated list of glob patterns (in the syntax of Go's 
    // path.Match) of module path prefixes that should be considered private. 
    // Private is a default value for NoProxy and NoSumDB. Private also determines
    // whether a module is considered private for VCS.
    Private string

    // SumDB identifies the name of the checksum database to use and optionally 
    // its public key and URL
    SumDB string

    // VCS controls the set of version control tools the loader may use to 
    // download public and private modules (defined by whether their paths match a 
    // pattern in CUEPRIVATE) or other modules matching a glob pattern.
    VCS string
}
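The NoProxy, NoSumDB and Private fields all take comma-separated lists of globs matched against module path prefixes. The sketch below assumes CUE adopts the same matching rule as Go's golang.org/x/mod/module.MatchPrefixPatterns: a glob containing N slashes is matched, with path.Match, against the first N+1 path elements of the module path. It also applies the documented fallback of Private as the default for NoSumDB. Both function names are hypothetical.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchPrefixPatterns reports whether any comma-separated glob in globs
// matches a leading run of path elements of target.
func matchPrefixPatterns(globs, target string) bool {
	for _, glob := range strings.Split(globs, ",") {
		glob = strings.TrimSuffix(glob, "/")
		if glob == "" {
			continue
		}
		n := strings.Count(glob, "/")
		prefix := target
		// Truncate target just before its (n+1)'th slash.
		for i := 0; i < len(target); i++ {
			if target[i] == '/' {
				if n == 0 {
					prefix = target[:i]
					break
				}
				n--
			}
		}
		if n > 0 {
			continue // target has fewer path elements than the glob
		}
		if ok, _ := path.Match(glob, prefix); ok {
			return true
		}
	}
	return false
}

// skipSumDB reports whether checksum database verification is skipped for
// modPath, using private as the default when noSumDB is empty.
func skipSumDB(modPath, noSumDB, private string) bool {
	if noSumDB == "" {
		noSumDB = private
	}
	return matchPrefixPatterns(noSumDB, modPath)
}

func main() {
	private := "*.corp.example.com,rsc.io/private"
	for _, m := range []string{"git.corp.example.com/x/y", "rsc.io/private/quux", "rsc.io/public"} {
		fmt.Println(m, skipSumDB(m, "", private))
	}
}
```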

As is the case today, cmd/cue will use cue/load to load CUE instances, deriving the module-related cue/load.Config fields from flags and environment variables (see "Environment variables").

For users of cue/load who want to mimic the behaviour of cmd/cue, a utility function will be provided that sets the module-related fields of cue/load.Config to the values cmd/cue would use, based on environment variables and built-in defaults.

Version queries

We re-use the same concept of version queries as Go: https://golang.org/ref/mod#version-queries

A version query may be one of the following:

Release versions are preferred over pre-release versions. For example, if versions v1.2.2 and v1.2.3-pre are available, the latest query will select v1.2.2, even though v1.2.3-pre is higher. The <v1.2.4 query would also select v1.2.2, even though v1.2.3-pre is closer to v1.2.4. If no release or pre-release version is available, the latest, upgrade, and patch queries will select a pseudo-version for the commit at the tip of the repository's default branch. Other queries will report an error.
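The preference for release versions can be illustrated with a small resolver for the latest and <vX.Y.Z query forms. This is a simplified sketch, not the loader's implementation: it handles only vMAJOR.MINOR.PATCH[-PRERELEASE], compares pre-release strings lexically rather than with full semver precedence, and does not model pseudo-versions. `query`, `parseVersion` and `compareVersions` are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version is a parsed vMAJOR.MINOR.PATCH[-PRERELEASE]; build metadata is
// not supported in this sketch.
type version struct {
	nums [3]int
	pre  string
	raw  string
}

func parseVersion(v string) (version, bool) {
	if !strings.HasPrefix(v, "v") {
		return version{}, false
	}
	core, pre, _ := strings.Cut(v[1:], "-")
	parts := strings.Split(core, ".")
	if len(parts) != 3 {
		return version{}, false
	}
	out := version{pre: pre, raw: v}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return version{}, false
		}
		out.nums[i] = n
	}
	return out, true
}

// compareVersions returns a negative, zero or positive result. A release
// sorts after its own pre-releases; pre-releases are compared lexically.
func compareVersions(a, b version) int {
	for i := range a.nums {
		if d := a.nums[i] - b.nums[i]; d != 0 {
			return d
		}
	}
	switch {
	case a.pre == b.pre:
		return 0
	case a.pre == "":
		return 1
	case b.pre == "", a.pre < b.pre:
		return -1
	default:
		return 1
	}
}

// query resolves "latest" or "<vX.Y.Z", preferring release versions and
// falling back to pre-release versions only if no release matches.
func query(q string, available []string) (string, error) {
	limit, hasLimit := parseVersion(strings.TrimPrefix(q, "<"))
	if q != "latest" && (!strings.HasPrefix(q, "<") || !hasLimit) {
		return "", fmt.Errorf("unsupported query %q", q)
	}
	var releases, pres []version
	for _, s := range available {
		v, ok := parseVersion(s)
		if !ok || (hasLimit && compareVersions(v, limit) >= 0) {
			continue
		}
		if v.pre == "" {
			releases = append(releases, v)
		} else {
			pres = append(pres, v)
		}
	}
	for _, cands := range [][]version{releases, pres} {
		if len(cands) == 0 {
			continue
		}
		best := cands[0]
		for _, c := range cands[1:] {
			if compareVersions(c, best) > 0 {
				best = c
			}
		}
		return best.raw, nil
	}
	return "", fmt.Errorf("no version matches query %q", q)
}

func main() {
	for _, q := range []string{"latest", "<v1.2.4"} {
		v, err := query(q, []string{"v1.2.2", "v1.2.3-pre"})
		fmt.Println(q, v, err)
	}
}
```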

cmd/cue outside of a module context

As today, cmd/cue will continue to operate outside of a module context, and will only fail if its arguments require resolution of non-builtin packages.

GOPROXY protocol

CUE module proxies will implement the GOPROXY protocol. Anyone looking to provide a CUE module proxy alternative to proxy.golang.org should consult the GOPROXY reference.
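For orientation, the GOPROXY protocol is a set of plain GET endpoints under a base URL, with upper-case letters in module paths and versions escaped as "!" followed by the lower-case letter. The sketch below builds those URLs; it does not validate paths (for example, it does not reject a literal "!"), and `escape` and `proxyURLs` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// escape applies the GOPROXY case-encoding: each upper-case ASCII letter
// becomes '!' followed by its lower-case form.
func escape(s string) string {
	var b strings.Builder
	for _, r := range s {
		if 'A' <= r && r <= 'Z' {
			b.WriteByte('!')
			r += 'a' - 'A'
		}
		b.WriteRune(r)
	}
	return b.String()
}

// proxyURLs returns the GOPROXY protocol endpoints for one module version.
func proxyURLs(base, module, version string) map[string]string {
	prefix := strings.TrimSuffix(base, "/") + "/" + escape(module)
	v := escape(version)
	return map[string]string{
		"list":   prefix + "/@v/list",           // known versions, one per line
		"info":   prefix + "/@v/" + v + ".info", // JSON metadata for the version
		"mod":    prefix + "/@v/" + v + ".mod",  // the go.mod file
		"zip":    prefix + "/@v/" + v + ".zip",  // the module zip file
		"latest": prefix + "/@latest",           // JSON metadata for the latest version
	}
}

func main() {
	urls := proxyURLs("https://proxy.golang.org", "github.com/Azure/azure-sdk-for-go", "v1.0.0")
	fmt.Println(urls["zip"])
}
```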

Version control systems

The loader may download module source code and metadata directly from a version control repository. Downloading a module from a proxy is usually faster, but connecting directly to a repository is necessary if a proxy is not available or if a module's repository is not accessible to a proxy (frequently true for private repositories). Git, Subversion, Mercurial, Bazaar, and Fossil are supported. A version control tool must be installed in a directory in PATH in order for the loader to use it.

To download specific modules from source repositories instead of a proxy, set the CUEPRIVATE or CUENOPROXY environment variables (or equivalent options in cue/load.Config). To configure the loader to download all modules directly from source repositories, set CUEPROXY to direct. See Environment variables for more information.
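The Proxy field above accepts entries separated by commas or pipes. Assuming CUEPROXY keeps the GOPROXY semantics (a comma falls back to the next entry only after a 404 Not Found or 410 Gone response, a pipe falls back after any error, and "off" disallows downloads), the sketch below parses such a list and simulates the fallback. The `respond` callback stands in for real HTTP or VCS requests, and all names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// proxyEntry is one element of a CUEPROXY-style list.
type proxyEntry struct {
	URL string
	// AnyError is true if the entry is followed by '|'. After ',' (or at
	// the end of the list) fallback happens only on 404 or 410.
	AnyError bool
}

func parseProxyList(list string) []proxyEntry {
	var out []proxyEntry
	for list != "" {
		entry, anyErr := list, false
		if i := strings.IndexAny(list, ",|"); i >= 0 {
			entry, anyErr, list = list[:i], list[i] == '|', list[i+1:]
		} else {
			list = ""
		}
		if entry = strings.TrimSpace(entry); entry != "" {
			out = append(out, proxyEntry{URL: entry, AnyError: anyErr})
		}
	}
	return out
}

// fetch tries entries in order; respond returns a status code, with 0
// standing for a network error.
func fetch(entries []proxyEntry, respond func(url string) int) (string, bool) {
	for _, e := range entries {
		if e.URL == "off" {
			return "", false
		}
		status := respond(e.URL)
		if status == 200 {
			return e.URL, true
		}
		if status != 404 && status != 410 && !e.AnyError {
			return "", false // after ',' only 404/410 trigger fallback
		}
	}
	return "", false
}

func main() {
	entries := parseProxyList("https://a.example,https://b.example|direct")
	statuses := map[string]int{"https://a.example": 404, "https://b.example": 500, "direct": 200}
	src, ok := fetch(entries, func(url string) int { return statuses[url] })
	fmt.Println(src, ok)
}
```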

See https://golang.org/ref/mod#vcs for more details on the specifics of:

Controlling version control tools with CUEVCS

The loader's ability to download modules with version control commands like git is critical to the decentralized package ecosystem, in which code can be imported from any server. It is also a potential security problem if a malicious server finds a way to cause the invoked version control command to run unintended code.

The CUE module implementation will follow the same model of GOVCS to change the allowed version control systems for specific modules, via the variable CUEVCS. For example:

CUEVCS=github.com:git,evil.com:off,*:git|hg

With this setting, code with a module or import path beginning with github.com/ can only use git; paths on evil.com cannot use any version control command, and all other paths (* matches everything) can use only git or hg.

See the GOVCS reference documentation for more details.

Module zip files

Like Go, CUE module versions are distributed as .zip files. There is rarely any need to interact directly with these files, since the loader creates, downloads, and extracts them automatically from module proxies and version control repositories. However, it's still useful to know about these files to understand cross-platform compatibility constraints or when implementing a module proxy.

The cue mod download command downloads zip files for one or more modules, then extracts those files into the module cache. Depending on CUEPROXY and other environment variables, the loader may either download zip files from a proxy or clone source control repositories and create zip files from them. The --json flag may be used to find the locations of downloaded zip files and their extracted contents in the module cache.

CUE modules will be subject to the same file path and size constraints as Go modules. See https://golang.org/ref/mod#zip-files for more details.

Private modules

We establish the same support model for private modules as Go modules, with the following environment variables as substitutions:

The Go module documentation provides a complete set of scenarios covering the various permutations of public/private modules:

It also provides a comprehensive explanation of how the loader (cmd/go in the case of the Go modules implementation) handles privacy concerns with respect to proxy requests. See the Go modules Privacy section for a comprehensive explanation.

Modules cache

Like Go, CUE will establish and use a user module cache, a directory where the loader stores downloaded module files. The default location of the module cache is $HOME/cue/modcache. To use a different location, set the CUEMODCACHE environment variable.

The cache may be shared by multiple CUE projects developed on the same machine. The loader will use the same cache regardless of the location of the main module. Multiple instances of the loader may safely access the same module cache at the same time.

For more detail on the module cache implementation, see the Go module cache reference.

Authenticating modules

The approach to, and implementation of, module authentication follow exactly from the cmd/go implementation, using the checksum database sum.golang.org by default for public modules.

When the loader downloads a module zip file or go.mod file into the module cache (the go.mod file being required for compatibility with the Go proxy and checksum models), it computes a cryptographic hash and compares it with a known value to verify the file hasn't changed since it was first downloaded. The loader reports a security error if a downloaded file does not have the correct hash.

For go.mod files, the loader computes the hash from the file content. For module zip files, the loader computes the hash from the names and contents of files within the archive in a deterministic order. The hash is not affected by file order, compression, alignment, and other metadata. See golang.org/x/mod/sumdb/dirhash for hash implementation details.

The loader compares each hash with the corresponding entry in the main module's cue.mod/sum.cue file. If the hash is different from the hash in cue.mod/sum.cue, the loader reports a security error and deletes the downloaded file without adding it into the module cache.

If the cue.mod/sum.cue file is not present, or if it doesn't contain a hash for the downloaded file, the loader may verify the hash using the checksum database, a global source of hashes for publicly available modules. Once the hash is verified, the loader adds it to cue.mod/sum.cue and adds the downloaded file in the module cache. If a module is private (matched by the CUEPRIVATE or CUENOSUMDB environment variables) or if the checksum database is disabled (by setting CUESUMDB=off), the loader accepts the hash and adds the file to the module cache without verifying it.
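The decision the loader makes for each downloaded file, as described in the two paragraphs above, reduces to a small function. In this sketch the map keyed by "path@version" merely stands in for cue.mod/sum.cue entries (the real format is described earlier in this proposal), the hash strings are placeholders, and `decide` is a hypothetical name.

```go
package main

import "fmt"

// outcome is what the loader does with a freshly downloaded file.
type outcome string

const (
	accept           outcome = "accept: hash matches cue.mod/sum.cue"
	securityError    outcome = "security error: hash differs from cue.mod/sum.cue"
	consultSumDB     outcome = "verify via checksum database, then record in cue.mod/sum.cue"
	acceptUnverified outcome = "record in cue.mod/sum.cue without verification"
)

// decide: noSumDB is true when the module matches CUEPRIVATE or
// CUENOSUMDB; sumDBOff is true when CUESUMDB=off.
func decide(sums map[string]string, key, hash string, noSumDB, sumDBOff bool) outcome {
	if want, ok := sums[key]; ok {
		if want == hash {
			return accept
		}
		return securityError // the downloaded file is discarded
	}
	if noSumDB || sumDBOff {
		return acceptUnverified
	}
	return consultSumDB
}

func main() {
	sums := map[string]string{"acme.com/quote@v1.1.0": "h1:AAAA"} // placeholder hash
	fmt.Println(decide(sums, "acme.com/quote@v1.1.0", "h1:AAAA", false, false))
	fmt.Println(decide(sums, "acme.com/quote@v1.1.0", "h1:BBBB", false, false))
	fmt.Println(decide(sums, "other.com/blah@v1.5.0", "h1:CCCC", false, false))
}
```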

The module cache is usually shared by all CUE projects on a system, and each module may have its own cue.mod/sum.cue file with potentially different hashes. To avoid the need to trust other modules, the loader verifies hashes using the main module's cue.mod/sum.cue whenever it accesses a file in the module cache. Zip file hashes are expensive to compute, so the loader checks pre-computed hashes stored alongside zip files instead of re-hashing the files. The cue mod verify command may be used to check that zip files and extracted directories have not been modified since they were added to the module cache.

The format of cue.mod/sum.cue files is described above; it follows directly from the go.sum structure, albeit in a different format. For details on the go.sum format and checksum databases, see the corresponding sections in the Go modules reference.

The cue.mod directory

Currently (the CUE world prior to this proposal) the contents of the cue.mod directory have the following function/semantics:

.
└── cue.mod
    ├── module.cue   - the module declaration
    ├── gen          - search path for CUE generated from third-party packages
    ├── pkg          - search path for third-party imports
    └── usr          - search path for user-maintained code to complement third-party packages

Given a non-main module import path acme.com/quote, the loader unifies the contents of the package values cue.mod/{gen,pkg,usr}/acme.com/quote.

Under this proposal, the only change to these semantics is that cue.mod/pkg becomes the equivalent of Go modules' vendor directory:

.
└── cue.mod
    ├── module.cue   - the module declaration
    ├── gen          - search path for CUE generated from third-party packages via cue generate
    ├── imp          - search path for CUE imported from third-party packages via cue import
    ├── pkg          - vendor for third-party imports
    └── usr          - search path for user-maintained code to complement third-party packages

If cue.mod/pkg exists, then the loader will expect to be able to load all non-main module imports from there. This also retains compatibility with the existing loading mechanism, i.e. when no go.mod file exists in the CUE module root. Given a non-main module import path acme.com/quote, the loader will unify the contents of the package values cue.mod/{gen,pkg,usr}/acme.com/quote as it does today.

If cue.mod/pkg does not exist, then the loader will resolve and load package dependencies from the module cache. Given a non-main module import path acme.com/quote, the loader unifies the contents of the package values cue.mod/{gen,usr}/acme.com/quote with, for example, $(cue env CUEMODCACHE)/acme.com/quote@v1.1.0.

Environment variables

Explanations of the flags and environment variables (and config options in the case of cue/load) that control the loader's behaviour are given elsewhere in this proposal. A full list is provided here for reference:

The mapping from these flags and environment variables to cue/load.Config options is covered in "Required changes to cue/load".

Go modules and CUE modules coexisting

It is not unreasonable to imagine a CUE module and Go modules co-existing at the same path, sharing the same VCS repository. Indeed, with the native support for exporting CUE to Go, and importing CUE from Go, it seems very likely that these situations will arise. This proposal supports such a setup.

The versioning of both modules would be intrinsically linked by virtue of each module system sharing the same tagging scheme in the same repository. However, assuming a Go module and CUE module exist at the same root, it would not be possible to version the two separately within the same repository. On the assumption that co-existence of CUE and Go code implies a strong relationship between the two, with breaking changes in one almost certainly corresponding to breaking changes in the other, we don't foresee this being a problem.

Versioning the two modules separately would still be possible, but only in separate VCS repositories. This would be achieved by having the go-import and cue-import meta tags return different repository locations. See "Finding a module for a module path" for more information.

Where a Go and CUE module coexist in the same repository, there would be some redundancy insofar as a CUE module would contain Go code, and vice versa. We don't foresee any specific problems beyond the limited inefficiency in CPU, memory and storage terms of this redundancy.

The post-go.mod future

As indicated in the summary of this proposal, use of proxy.golang.org and sum.golang.org is intended as an interim measure until such time as the CUE project can host such services itself. The requirement for a go.mod file in a CUE module is tied to our use of those services.

When the CUE project can host such services itself, we will need to develop a CUEPROXY protocol, similar to the GOPROXY protocol, and host a service that speaks that protocol. We would then look to create a number of CUE releases where cmd/cue knows how to speak this protocol, at which point projects would be able to start the process of removing go.mod files from their CUE modules. The only challenge here is that a project would only be able to remove its go.mod file if it could be sure there are no consumers relying on a cmd/cue version that does not speak the new protocol. However, adopting something akin to the Go release policy would be sensible in this respect: we would only advise removing go.mod files once there are two major versions of CUE that support the new CUEPROXY protocol.

However, in the context of the previous section - "Go modules and CUE modules co-existing" - splitting out CUE dependencies from Go dependencies does appear to create an issue. Consider the following example, in a post-go.mod world, where CUE dependencies are not added to the go.mod file, and a go.mod file is not required as part of a CUE module:

-- cue.mod/module.cue --
module: "example.com/blah"

require: {
    "acme.com/quote": "v1.1.0"
}
-- go.mod --
module example.com/blah

require (
    acme.com/quote v1.2.0
    other.com/blah v1.5.0
)
-- blah.cue --
package blah

import "acme.com/quote"

x: quote.Hello
-- blah.go --
package main

import (
    "fmt"

    "acme.com/quote"
)

func main() {
    fmt.Println(quote.Hello)
}

(We elide the go.sum and cue.mod/sum.cue files for simplicity.)

Notes:

The problem is that the version of acme.com/quote resolved in go.mod is not guaranteed to be the same version of acme.com/quote resolved in cue.mod/module.cue. We essentially have two different instances of MVS running with different constraints.

Much of the time this skew might be totally innocuous, but sooner or later it will almost certainly lead to issues, for example where code on the Go side relies on an additive API change in acme.com/quote v1.2.0 that is absent from the v1.1.0 selected on the CUE side.
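To make the skew concrete, here is a deliberately simplified Minimum Version Selection in Go. It ignores exclusions, replacements, retractions and pre-release versions, and buildList is a hypothetical helper rather than cmd/go's implementation. Run once with the CUE requirements and once with the Go requirements from the example above, the two resolutions disagree on acme.com/quote. The dependency of other.com/blah on acme.com/quote is invented purely for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Module identifies a module at a specific version.
type Module struct {
	Path, Version string
}

// semver parses "vMAJOR.MINOR.PATCH"; pre-release and build suffixes
// are not handled in this sketch.
func semver(v string) [3]int {
	var n [3]int
	for i, p := range strings.SplitN(strings.TrimPrefix(v, "v"), ".", 3) {
		n[i], _ = strconv.Atoi(p)
	}
	return n
}

// less reports whether version a is lower than version b.
func less(a, b string) bool {
	x, y := semver(a), semver(b)
	for i := range x {
		if x[i] != y[i] {
			return x[i] < y[i]
		}
	}
	return false
}

// buildList is a simplified Minimum Version Selection: visit every
// requirement reachable from the roots and, for each module path,
// select the highest version that anything requires.
func buildList(roots []Module, reqs map[Module][]Module) map[string]string {
	selected := map[string]string{}
	seen := map[Module]bool{}
	var visit func(Module)
	visit = func(m Module) {
		if seen[m] {
			return
		}
		seen[m] = true
		if cur, ok := selected[m.Path]; !ok || less(cur, m.Version) {
			selected[m.Path] = m.Version
		}
		for _, r := range reqs[m] {
			visit(r)
		}
	}
	for _, r := range roots {
		visit(r)
	}
	return selected
}

func main() {
	// Hypothetical: other.com/blah v1.5.0 itself requires acme.com/quote v1.1.0.
	reqs := map[Module][]Module{
		{Path: "other.com/blah", Version: "v1.5.0"}: {{Path: "acme.com/quote", Version: "v1.1.0"}},
	}
	cueRoots := []Module{{Path: "acme.com/quote", Version: "v1.1.0"}}
	goRoots := []Module{
		{Path: "acme.com/quote", Version: "v1.2.0"},
		{Path: "other.com/blah", Version: "v1.5.0"},
	}
	fmt.Println("cue:", buildList(cueRoots, reqs)["acme.com/quote"]) // cue: v1.1.0
	fmt.Println("go: ", buildList(goRoots, reqs)["acme.com/quote"])  // go:  v1.2.0
}
```

Each resolution is individually correct; the problem is only that nothing forces the two requirement sets to agree.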

Nor is this problem unique to the combination of Go and CUE. We envisage that languages other than Go will be supported via cue import and cue export, and each language has its own package versioning system (or indeed several). The scenario is therefore more widespread: a main module that is both a Language X module (or Language X's equivalent) and a CUE module, depending on modules that are likewise both, where the version resolution algorithm of Language X is not guaranteed to arrive at the same revision as CUE's MVS algorithm.

Notable details/exceptions

Here is a general list of points that don't naturally fit under any other heading:

CUE & A

Why have you chosen to use {proxy,sum}.golang.org?

The Go module specification and implementation do not depend upon either proxy.golang.org or sum.golang.org. Indeed the GOPROXY and checksum protocols provide the necessary abstraction. Therefore the next phase of CUE module support does not need to be tied to either the module mirror at proxy.golang.org or checksum database at sum.golang.org. However, there are some significant advantages to doing so:

These seem to outweigh the disadvantages:

The crucial point, however, is that the default of using proxy.golang.org and sum.golang.org within the loader is just that: a default. It is entirely possible to turn off use of both via CUEPROXY=off and CUESUMDB=off.
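As a sketch of how such defaults might be resolved, assuming CUEPROXY and CUESUMDB mirror the semantics of cmd/go's GOPROXY and GOSUMDB, consider the helpers below. proxyList and sumDB are hypothetical and simplify GOPROXY's rules: they ignore the `|` separator and treat "off" only as a whole value.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// Assumed defaults, copied from cmd/go's GOPROXY and GOSUMDB defaults.
const (
	defaultProxy = "https://proxy.golang.org,direct"
	defaultSumDB = "sum.golang.org"
)

// errProxyOff is returned when module downloads are disabled.
var errProxyOff = errors.New("module downloads disabled by CUEPROXY=off")

// proxyList interprets a CUEPROXY value as an ordered, comma-separated
// list of proxy URLs, where "direct" means fetch from the origin VCS.
// An empty value selects the default; "off" disables downloads.
func proxyList(env string) ([]string, error) {
	switch env {
	case "":
		env = defaultProxy
	case "off":
		return nil, errProxyOff
	}
	var list []string
	for _, p := range strings.Split(env, ",") {
		if p = strings.TrimSpace(p); p != "" {
			list = append(list, p)
		}
	}
	return list, nil
}

// sumDB returns the checksum database named by a CUESUMDB value, or ""
// when verification against a checksum database is turned off.
func sumDB(env string) string {
	switch env {
	case "":
		return defaultSumDB
	case "off":
		return ""
	}
	return env
}

func main() {
	proxies, err := proxyList(os.Getenv("CUEPROXY"))
	fmt.Println("proxies:", proxies, "err:", err)
	fmt.Println("sumdb:", sumDB(os.Getenv("CUESUMDB")))
}
```

Keeping the two settings independent matches cmd/go, where a private proxy can be used while still verifying against a checksum database, or vice versa.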

What about cue test?

As is covered elsewhere in this proposal, we have not concluded a design for cue test (this falls under https://github.com/cuelang/cue/issues/209). However, this proposal is entirely compatible with being extended to support _test.cue files, again following the pattern of _test.go files in Go.

What about a //go:embed CUE equivalent?

As of Go 1.16, cmd/go supports including static files and file trees in the final executable, using the new //go:embed directive. See the documentation for the new embed package for details.

We are working on a proposal for how to support the concept of embedding in CUE. As with cue test, that proposal is entirely compatible with (and indeed depends upon) this proposal, the next phase of modules.

What about a gorelease equivalent?

gorelease is an experimental tool that helps module authors avoid common problems before releasing a new version of a module.

Examples:

# Compare with the latest version and suggest a new version.
gorelease

# Compare with a specific version and suggest a new version.
gorelease -base=v1.2.3

# Compare with the latest version and check a specific new version for compatibility.
gorelease -version=v1.3.0

# Compare with a specific version and check a specific new version for compatibility.
gorelease -base=v1.2.3 -version=v1.3.0

gorelease analyzes changes in the public API and dependencies of the main module. It compares a base version with the currently checked out revision. Given a proposed version to release, gorelease reports whether the changes are consistent with semantic versioning.

Given the very nature of the problems that CUE looks to solve, it will be entirely possible to provide such a command to help CUE module authors. Much like gorelease is intended to become go release, the equivalent in the CUE world would likely be spelled cue release.
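The last step of such a tool is mechanical: once the API diff has been classified, semantic versioning dictates the next version. The hard part, classifying the diff, is exactly what CUE's semantics should make tractable. Assuming a hypothetical cue release has already done that classification, a minimal sketch of the version suggestion looks like this. It is not gorelease's code, and it deliberately skips v0 and pre-release versions, which carry no compatibility promise under semantic versioning.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Change classifies how a module's public API differs from the base version.
type Change int

const (
	Internal     Change = iota // no API change: patch release
	Additive                   // backwards-compatible additions: minor release
	Incompatible               // removals or breaking changes: major release
)

// suggest returns the next release version after base, which must have
// the form vMAJOR.MINOR.PATCH with MAJOR >= 1. Note that under Go-style
// modules a major bump also implies a new module path (e.g. a /v2 suffix).
func suggest(base string, c Change) (string, error) {
	parts := strings.Split(strings.TrimPrefix(base, "v"), ".")
	if !strings.HasPrefix(base, "v") || len(parts) != 3 {
		return "", fmt.Errorf("unsupported version %q", base)
	}
	var n [3]int
	for i, p := range parts {
		v, err := strconv.Atoi(p)
		if err != nil || v < 0 {
			return "", fmt.Errorf("unsupported version %q", base)
		}
		n[i] = v
	}
	if n[0] < 1 {
		return "", fmt.Errorf("v0 versions are not handled in this sketch: %q", base)
	}
	switch c {
	case Incompatible:
		n = [3]int{n[0] + 1, 0, 0}
	case Additive:
		n = [3]int{n[0], n[1] + 1, 0}
	default:
		n[2]++
	}
	return fmt.Sprintf("v%d.%d.%d", n[0], n[1], n[2]), nil
}

func main() {
	v, err := suggest("v1.2.3", Additive)
	fmt.Println(v, err) // v1.3.0 <nil>
}
```

Checking a proposed version (the -version flag in the gorelease examples) then reduces to comparing it with the suggestion.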

Will CUE packages start to pollute pkg.go.dev?

Whilst we have permission from the Go team to use proxy.golang.org and sum.golang.org until such time as the CUE project starts to host its own module mirror and checksum database, we should look to limit any unintended side effects. One such side effect is that pkg.go.dev (a Go module and package documentation and discovery site) uses index.golang.org (an index which serves a feed of new module versions as they become available at proxy.golang.org). CUE modules would therefore start to "leak" into pkg.go.dev results. We will work with the pkg.go.dev team to ensure that pkg.go.dev has the heuristics needed to identify CUE-only modules.
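What such a heuristic looks like is for the pkg.go.dev team to decide. Purely as an illustration, one plausible signal is a module that declares cue.mod/module.cue but contains no .go files. looksCUEOnly below is hypothetical and is not how pkg.go.dev works.

```go
package main

import (
	"fmt"
	"path"
)

// looksCUEOnly is a hypothetical heuristic: given the slash-separated
// file paths in a module (relative to its root), it reports whether the
// module declares cue.mod/module.cue and contains no Go source files.
func looksCUEOnly(files []string) bool {
	hasCUEMod := false
	for _, f := range files {
		switch {
		case path.Ext(f) == ".go":
			return false
		case f == "cue.mod/module.cue":
			hasCUEMod = true
		}
	}
	return hasCUEMod
}

func main() {
	// During the interim period a CUE module also carries a go.mod file.
	fmt.Println(looksCUEOnly([]string{"go.mod", "cue.mod/module.cue", "blah.cue"}))            // true
	fmt.Println(looksCUEOnly([]string{"go.mod", "cue.mod/module.cue", "blah.cue", "blah.go"})) // false
}
```

A module where Go and CUE coexist is deliberately not classified as CUE-only, since its Go packages belong on pkg.go.dev.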

How does this proposal relate to io/fs.FS?

Go 1.16 introduced a new io/fs package that defines the fs.FS interface, an abstraction for read-only trees of files. This package largely exists to support the new //go:embed feature, but has other uses too.

https://github.com/cuelang/cue/issues/607 raises the question of how the existing cue/load.Config.Overlay field might be used to supply an entire file system as input to cue/load. https://github.com/cuelang/cue/issues/607#issuecomment-797878959 clarifies that the current Overlay field exists to complement the operating system file system, rather than replace it. As outlined in that comment, however, adding an io/fs.FS field to cue/load.Config would allow the intended semantics. This modules proposal is fully compatible with that proposal, but necessarily orthogonal to it.

What about verifiable evaluations?

One of the main differences between Go and CUE from a "build" perspective is that Go has complete control over its build artefacts, specifically the binaries that result from compiling a main package. cmd/go embeds sufficient module-related information (module path, version and checksum) in those binaries to enable verifiable builds. runtime/debug.ReadBuildInfo() gives runtime access to that information; go version -m /path/to/binary allows it to be inspected. See Russ Cox's blog post for more information.

CUE does not have such control over its many output formats (JSON, YAML, JSON Schema, etc.). One immediate idea would be to use comments to encode similar module-related information. However, JSON, for one, does not support comments.

The Required fields and related issues proposal includes a section on cue export, and how that would be repurposed to be the inverse of cue import. The example presented there is as follows.

Given the CUE file:

a: 2 + 3
baz: {
    @export("baz.json")
    b: a
}
bar: {
    @export("jsonschema:/foo/bar/bar.json")
    string
}

cue export would then produce the following txtar output:

// File
import baz ":/foo/bar/baz.json"
import bar "jsonschema:/foo/bar/bar.json"

a: 5
"baz": { baz, @export("baz.json") }
"bar": { bar, @export("jsonschema:bar.yaml") }
-- baz.json --
b: 5
-- bar.yaml --
type: string

One option, therefore, would be to take a step towards verifiable evaluations by including sufficient module-related information (like that included in Go binaries) in the txtar output of cue export.

Are there any alternatives to requiring a go.mod?

If we want to utilise and leverage the existing module mirror and checksum database, we don't see a way around the requirement of declaring a go.mod file in the root of a CUE module. The GOPROXY protocol requires that a regular go.mod file (i.e. a symlink is not sufficient) denote the root of a Go module, and that that file declares the module's requirements, retractions etc.

As noted above however, this is an interim measure until such time as the CUE project can host such services itself.

What about anonymous modules? Will they need a go.mod?

Anonymous modules can be created today via:

cue mod init

This creates a cue.mod/module.cue file as follows:

module: ""

Anonymous modules are useful for end users: where you know that no package within the module will ever be a dependency of another module, having to come up with a module name is an annoying problem.

Go modules do not support anonymous modules: every module must have a path. Hence we simply could not maintain a parallel go.mod file, matching the requirements listed in cue.mod/module.cue.

Therefore, for anonymous modules, cmd/cue and cue/load will not create or maintain a go.mod file.

Why is the module cache in the user's home directory?

Russ Cox provided excellent motivation for this decision in a GitHub discussion about the GOMODCACHE environment variable:

The module cache ($GOPATH/pkg/mod, defaulting to $HOME/go/pkg/mod) is for storing downloaded source code, so that every build does not redownload the same code and does not require the network or the original code to be available. The module cache holds entries that are like "if you need to download mymodule@v1.2.3, here are the files you'd get." If the answer is not in the cache, you have to go out to the network. Maybe you don't have a network right now. Maybe the code has been deleted. It's not anywhere near guaranteed that you can redownload the sources and also get the same result. Hopefully you can, but it's not an absolute certainty like for the build cache. (The go.sum file will detect if you get a different answer on re-download, but knowing you got the wrong bits doesn't help you make progress on actually building your code. Also these paths end up in file-line information in binaries, so they show up in stack traces, and the like and feed into tools like text editors or debuggers that don't necessarily know how to trigger the right cache refresh.)

I expect there are cron jobs or other tools that clean $HOME/.cache periodically. If part of the build cache got deleted, it would be no big deal, so it's fine to store the build cache there. But if downloaded source code got deleted unasked, I think that would potentially be quite surprising and problematic in various ways. That's why we store the source code in $GOPATH/pkg/mod, to keep it away from more expendable data.

What is the timeline for CUE modules?

Once the CUE community has had a chance to consider and respond to this proposal, and if there is broad agreement with the direction, implementation would start immediately. As mentioned elsewhere in this proposal, we hope to reuse much of the cmd/go/internal/... implementation, as well as learning from and leveraging the vast experience of the Go team.

Coming up with a rough timeline and priority ordered list of work will be the first thing we do when starting work on this next phase of CUE module support.

Are submodules and multi-module repositories supported?

Yes, although the advice that applies to both in the Go world will also apply in the CUE world. As a starting point, the following advice from Russ Cox will likely hold true for the vast majority of users:

For all but power users, you probably want to adopt the usual convention that one repo = one module. It's important for long-term evolution of code storage options that a repo can contain multiple modules, but it's almost certainly not something you want to do by default.

For more details see https://github.com/golang/go/wiki/Modules#faqs--multi-module-repositories

What strategies exist for supporting multiple major versions of a module in parallel?

A corollary of the import compatibility rule:

If an old package and a new package have the same import path, the new package must be backwards compatible with the old package

is that any breaking changes in a module with major version number >=1 must be accompanied by an increase in major version number. This raises the question of how to support users of the now old major version - is it possible to support both at the same time?
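Go satisfies the rule by giving each major version from v2 onwards its own import path (semantic import versioning), so the old and new major versions are, as far as the tooling is concerned, different modules that can be used side by side. Assuming CUE modules follow the same convention, as a Go-modelled design suggests, the mapping is simple enough to sketch. modulePathForMajor is a hypothetical helper and ignores special cases such as gopkg.in paths.

```go
package main

import "fmt"

// modulePathForMajor returns the module path for a given major version
// under Go-style semantic import versioning: v0 and v1 use the base
// path, while v2 and above append a /vN suffix.
func modulePathForMajor(base string, major int) string {
	if major <= 1 {
		return base
	}
	return fmt.Sprintf("%s/v%d", base, major)
}

func main() {
	for _, m := range []int{1, 2, 3} {
		fmt.Println(modulePathForMajor("acme.com/quote", m))
	}
}
```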

As with Go modules, developers will have two options when it comes to maintaining multiple major versions of a module in parallel:

See the Go modules wiki entry on the topic as well as a mention in the Go modules reference.

Should the CUE version tags be namespaced?

Some of the early discussions about vgo (the Go modules prototype) questioned whether Go should distinguish the VCS tags used to indicate versions of Go modules, the thinking being that v1.1.0 says nothing about the fact that the tag corresponds to a version of a Go module. Alternatives included namespacing those tags, e.g. go:v1.1.0.

Link to those discussions

It is natural and appropriate to consider the same question in the context of designing CUE package versioning and modules.

However, if we choose to base our implementation on Go modules, using proxy.golang.org and sum.golang.org, then by definition we adopt the same approach to creating module versions: tagging with semantic versions, e.g. v1.1.0. Tags indicating Go module versions and CUE module versions will therefore be indistinguishable.
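For reference, this is how Go maps a module version to a VCS tag, a convention CUE would inherit by building on Go modules: the bare semantic version for a module at the repository root, prefixed with the module's subdirectory for a module elsewhere in a multi-module repository. Nothing in the tag names the language. versionTag is a hypothetical helper, the "schemas" subdirectory is an invented example, and the sketch ignores the special handling of major-version subdirectories.

```go
package main

import (
	"fmt"
	"strings"
)

// versionTag returns the VCS tag that marks version v of a module whose
// root is subdir (slash-separated, relative to the repository root).
func versionTag(subdir, v string) string {
	subdir = strings.Trim(subdir, "/")
	if subdir == "" || subdir == "." {
		return v
	}
	return subdir + "/" + v
}

func main() {
	fmt.Println(versionTag("", "v1.1.0"))        // v1.1.0
	fmt.Println(versionTag("schemas", "v1.1.0")) // schemas/v1.1.0
}
```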

But as we cover under "Go modules and CUE modules coexisting", we don't see a problem with this "conflict" - indeed it very much aligns with the intentions of the author.

What about CUE that coexists with non-Go module aware Go code?

jbcpollak commented 2 years ago

to be clear, I'd like to be able to do something like cue vet $pkg@$version my-json-data.json and have it vet the json against the module, regardless of the current working directory.

verdverm commented 2 years ago

Modules will have a local cache, much like Golang does, so it should work in any directory. I would expect that removing the need to be in an existing module is a matter of code paths and arg parsing. This seems to be in line with go run $pkg@version which ignores the local go.mod for the duration of that call.

If the package argument has a version suffix (like @latest or @v1.0.0),
"go run" builds the program in module-aware mode, ignoring the go.mod file in
the current directory or any parent directory, if there is one. This is useful
for running programs without affecting the dependencies of the main module.

helderco commented 2 years ago

@myitcv What’s the current status on this proposal? At Dagger we want to prioritize package management since it’s one of the most requested features. How can we help?

BradleyChatha commented 1 year ago

:wave: Just doing another checkup. I'd really love to expand the usage of Cue within the company I work for, however having to brew up a manual install script to get our shared packages setup just right is not going to be viable in the long term.

A builtin solution would make Cue so much more frictionless to use :)

myitcv commented 1 year ago

We have created a revised package management and modules proposal that is linked to and discussed at https://github.com/cue-lang/cue/discussions/2330.