cue-lang / cue

The home of the CUE language! Validate and define text-based and dynamic configuration
https://cuelang.org
Apache License 2.0
5.16k stars 297 forks source link

Imports break encapsulation #1525

Open kumorid opened 2 years ago

kumorid commented 2 years ago

The intent behind importing packages in CUE seems to be to import packages within some module, be it the module in which the importing package is, or some other module.

The mechanism the loader uses to import from outside the current module is to interpret the path of the import statement as a path within the cue.mod/pkg folder (for simplicity we ignore the gen and usr subfolders).

In the context of redistributable modules, this means that importing a foreign module implies placing it within the cue.mod/pkg dir of the importing module.

Proceeding recursively, for example, if module A imports module B that, in turn, imports module C, module A will expect module B to be within A/cue.mod/pkg/B, and module B would expect module C to be withinB/cue.mod/pkg/C(finally, withinA/cue.mod/pkg/B/cue.mod/pkg/C.

The CUE loader, however behaves differently. Instead of looking for C (for B) within B/cue.mod/pkg/Cit looks for it in A/cue.mod/pkg/C.

OK, a package manager could somehow accomodate this behavir with some sort of flattened module storing strategy. However, this apparently innocent behavior would have, as a consequence, the following:

Clearly, this behavior breaks the principle of encapsulation, and cannot be easily solved unless the loader behavior is changed.

If the loader honored what seems to be implied by the spec about modules/packages, it would look at the pkg dir of nested dependencies instead of looking at the pkg of further up directories.

Alternatively, the loader could be able to understand configuration about dependencies (maybe in the cue.mod/module.cue file) to perform adequate translations from module dependencies to particular module versions. This could be made work easily.

The way it stands now makes it very cumbersome to implement any sort of module redistribution with encapsulation with the desirable reproducible effects.

After looking at the go code for the loader (go is not even my second language ☺) it does not seem that difficult to achieve the desirable behavior. One only needs to make sure that the context of import resolution is not the context you started from in the first package to be processed... contexts should be nested.

In my opinion this would be a big improvement towards implementing any sort of workable redistribution scheme of CUE modules (package management), whatever the version resolution mechanism or name resolution strategy/protocol/mechanism happened to be agreed upon in the (near?) future.

verdverm commented 2 years ago

To understand where CUE will land with dependency management and modules, you can read up on how Go does this. CUE's module system will be very close to that. Note #851 (which will be replaced by a newer proposal hopefully soon)

https://go.dev/ref/mod

I have written a very minimal MVS tool which you can use for CUE modules in the meantime. (hof mod https://cuetorials.com/first-steps/modules-and-packages/#dependency-management)

kumorid commented 2 years ago

Well, thanks @verdverm , I am aware of the discussion around dependency management, but I have no visibility on the status of that subject right now.

Your link did not address the problem I tried to point out in my post.

We do have a need right now to have a way of encapsulating and distributing cue modules, as that is the way we define our artifacts.

We would love to use whatever approach CUE would have, but it does not have it yet, and I am not privy to the discussion around this subject.

What I tried to explain, perhaps not too successfully, in my original post is the status the loader is in today, which makes it difficult (impossible?) without additional tooling to properly organize a dependency management scheme for CUE modules.

This is due to the decision not completely explained in the doc, that the loader makes, when loading an import to pay NO ATTENTION whatsoever to the module structure of that import, picking secondary imports (imports from imports) from the pkg/gen/usr dirs of the top level module...

What's more, imports from you own module are dealt with similarly (going to the top level module context), as the loader pays no attention to the cue.mod structure of the deep module... Very quirky and unworkable. I am not going to extend myself on the negative effects this has for organizing your cue code for distribution.

We are finishing our own tool that will work with the current loader to allow flexible imports maintaining dependent module encapsulation to arbitrary depths of import. However, to achieve our goals, we need to perform alterations on the cue files themselves when they are installed to be used by other modules.

This is insatisfactory, and we would like to align with whatever the strategy is in CUE. We would also like to have the chance to influence how dependency management is going to finally operate.

BTW, I have some experience around the subject of dependency management and would love to contribute to the discussion, wherever that discussion is happening right now, as we are fully invested in CUE.

verdverm commented 2 years ago

the loader pays no attention to the cue.mod structure of the deep module... Very quirky and unworkable. I am not going to extend myself on the negative effects this has for organizing your cue code for distribution.

This is by design. Much has been said in the opposite direction, so there is a difference of opinion here. Go flattens dependency trees using the Minimum Version Selection algorithm. (see the go.dev link). There are many issues with having multiple versions of the same dependency in a program.

I would add that CUE introduces an extra dimension of dependency complexity. Consider a .NET module which uses CUE for defining its schema. How do we ensure that the fetched versions for .NET and CUE are the same? Do they always need to be? What if I am using CUE only for my CI system (i.e. completely separate from the language of my application)? I'm looking forward to the discussion around this issue in the next version of the pkg mgmt proposal. Personally, I'm advocating for linting over CUE managing the dependencies in other language modules.

we would like to align with whatever the strategy is in CUE

I would be very surprised if CUE deviated from the Go model and chose the deep nesting. CUE draws from Go significantly. I currently use hof mod to align with what I understand to be the solution CUE is aiming for. It was largely pulled out of the Go source tree, which I believe is the plan for actually implementing cue mod as well.

The discussion have not been so much privy as it is being around when it is discussed. If I recall correctly, cue mod was mentioned in the first Town Hall video. #851 is the best place to comment while we wait for the second draft for dep mgmt. It sometimes comes up in Slack as well, which is a good place to generally chat about CUE.

mpvl commented 2 years ago

@kumorid @verdverm As Tony indicated, this behavior is intentional. Granted, there is too much context missing in documentation to help understand the rational behind this. So let me clarify a bit.

The behavior mimics that of Go modules. Of course, in an imperative language like Go it is obvious how mixing multiple versions of the same import within one program can lead to big problems.

For CUE, this may not seem so obvious at first, but the problems still exist. Suppose we have

flowchart TD
A -- imports --> B
A -- imports --> C
B -- imports --> C

Both A and B have fields in structs that are of type C.#Def. Now, suppose that the C imported by A and B can be of different versions, as you suggested. In this case, it is very well possible that the set of fields of C.#Def is different for the two versions. Suppose Package B has a top-level declaration bar: C.#Def. Allowing C to be of different versions could break the following:

package A
import "path/to/B"
import "path/to/C"

foo:  C.#Def
foo: B.bar

because closedness rules would error out if C.#Def were to have different field sets for the two different versions. If we ever attribute some kind of identity to definitions (which seems very useful for the query extension), things get even worse.

Why should uniquing a version work?

Of course sharing the same version can also cause issues, but we think this is generally less serious. This is especially true if we assume certain properties between versions. In Go, the assumption is that newer versions of any package with the same import path are backwards compatible (modulo bug fixes). If a new version of a package makes backwards incompatible changes it is assumed to be published under a new import path.

In Go this is by convention. In CUE this is actually enforceable. A package that is backwards compatible with another package satisfies the subsumption relation, disregarding closedness. Consider what this means in the above example. The way package management is envisioned to work, package A is forced to use a version of C equal or newer to the one used by B. Suppose package A imports package C' while package B imports C. Also assume that package Cc is defined such that C&Cc equals C'. In other words, Cc represents the constraints on the older version of C so that it becomes equal to C'.

So basically, in a world where importing followed your approach, the current semantics would be equivalent to adding the constraints ofCc to the cue.mod/usr/path/to/C directory of B.

Wait, your talking about growing field sets, so what is closedness for?

In general APIs can be expected to evolve and therefore be open to extension. It is also really bad practice to implicitly make backwards incompatible changes to APIs.

Definitions in CUE seem to defy that principle, by closing structs to disallow additional fields. The mechanism of closedness was solely added to allow for catching typos, however. It is still assumed that APIs are evolving and will be extended. The closedness merely applies to the current version of a package. This is part of the reason why the concept of closedness is so intertwined with the concept of a definition, which actually has several functions.

The easiest analogy is perhaps that of Protobuffers. A server that receives a proto message is assumed to ignore any unknown fields. It is assumed that in practice different APIs will exist concurrently in practice and servers will just have to deal with that.

However, when one compiles a Protobuf definition to a typed language like Go or C, looking up an undefined field will result in a compiler error.

A CUE "program" is expected to be typed for only one version of an API. The proposed downcast operator (#454) can mimic the equivalent of receiving a message of different types.

It is important to see closedness in this light when thinking of defining package management in CUE.

A note on encapsulation

Granted, the way external packages are currently defined (vendored into cue.mod) makes this very unintuitive. Also, a few things are currently clearly broken: 1) there is currently no enforcement that the version of A is newer than the one used in B (the user needs to do this manually). 2) at the very least the constraints added in cue.mod/usr should be added to at least the version used in B, if not the one used by all. 3) all this won't work if backwards compatibility is not observed.

Alternatives

Of course one can think of many alternatives, and we are open to suggestions (now is the time). For instance, CUE could be smarter than it is now and recognize that in the code snippet above the C.#Def of A is really a newer version the C.#Def that constraints B.bar and disable the closedness check and thus preventing the breakage discussed above. We would have to closely consider what other breakage scenarios could occur, though. It would probably still be good to enforce that "downstream" packages use newer versions.

This could even be carried through further by allowing packages to define transformations from previous versions that CUE could automatically apply. This would allow backwards incompatible changes.

I haven't really thought about the implications of these alternatives, though, but it may be a worthwhile exercise in coming to a conclusion on this.

What I would be open to do now is to allow for cue.mod/module.cue to define an option to switch between these two behaviors of package importing to see what kind of trouble people run into with either. This can potentially give us more data to make the right design choices in this matter.

rogpeppe commented 2 years ago

For the record, here's a testscript example that illustrates @mpvl's reasons for not wanting to use different versions of the same package in the same build:

cd cue.mod/pkg/foo.example/b
exec cue vet -c

cd $WORK
exec cue vet -c

-- cue.mod/pkg/foo.example/a/a-v1.cue --
// This is version 1 of the foo.example/a package.
// This is the version used by foo.example/c, but
// foo.example/b uses a later version that it has vendored directly
// for itself.
package a

#Foo: {
    a: int
}

-- cue.mod/pkg/foo.example/a/cue.mod/module.cue --
module: "foo.example/a"

-- cue.mod/pkg/foo.example/b/b.cue --
package b
import "foo.example/a"

#Foo: a.#Foo & {
    b: "blah"
}

-- cue.mod/pkg/foo.example/b/cue.mod/module.cue --
module: "foo.example/b"

-- cue.mod/pkg/foo.example/b/cue.mod/pkg/foo.example/a/cue.mod/module.cue --
module: "foo.example/a"

-- cue.mod/pkg/foo.example/b/cue.mod/pkg/foo.example/a/a-v2.cue  --
package a

#Foo: {
    a: int
    b: string
}

-- cue.mod/module.cue --
module: "foo.example/c"

-- c.cue --
package c

import (
    "foo.example/a"
    "foo.example/b"
)
// These should all be valid, because we know that
// b.#Foo "is" also a.#Foo.
x: a.#Foo
x: b.#Foo
x: a: 1234

This fails for me with the following output:

> cd cue.mod/pkg/foo.example/b
$WORK/cue.mod/pkg/foo.example/b
> exec cue vet -c
> cd $WORK
$WORK
> exec cue vet -c
[stderr]
x: field not allowed: b:
    ./c.cue:9:4
    ./c.cue:10:4
    ./c.cue:11:4
    ./cue.mod/pkg/foo.example/a/a-v1.cue:7:7
    ./cue.mod/pkg/foo.example/b/b.cue:4:7
    ./cue.mod/pkg/foo.example/b/b.cue:5:2
[exit status 1]
FAIL: /tmp/testscript117421585/x.txtar/script.txt:5: unexpected command failure
error running /tmp/x.txtar in /tmp/testscript117421585/x.txtar

That is, foo.example/b vets fine on its own, but not when imported in a module context along with an earlier version of foo.example/a.