Open kumorid opened 2 years ago
To understand where CUE will land with dependency management and modules, you can read up on how Go does this. CUE's module system will be very close to that. Note #851 (which will be replaced by a newer proposal hopefully soon)
I have written a very minimal MVS tool which you can use for CUE modules in the meantime. (hof mod
https://cuetorials.com/first-steps/modules-and-packages/#dependency-management)
Well, thanks @verdverm , I am aware of the discussion around dependency management, but I have no visibility on the status of that subject right now.
Your link did not address the problem I tried to point out in my post.
We do have a need right now to have a way of encapsulating and distributing cue modules, as that is the way we define our artifacts.
We would love to use whatever approach CUE would have, but it does not have it yet, and I am not privy to the discussion around this subject.
What I tried to explain, perhaps not too successfully, in my original post is the status the loader is in today, which makes it difficult (impossible?) without additional tooling to properly organize a dependency management scheme for CUE modules.
This is due to the decision not completely explained in the doc, that the loader makes, when loading an import to pay NO ATTENTION whatsoever to the module structure of that import, picking secondary imports (imports from imports) from the pkg/gen/usr dirs of the top level module...
What's more, imports from you own module are dealt with similarly (going to the top level module context), as the loader pays no attention to the cue.mod structure of the deep module... Very quirky and unworkable. I am not going to extend myself on the negative effects this has for organizing your cue code for distribution.
We are finishing our own tool that will work with the current loader to allow flexible imports maintaining dependent module encapsulation to arbitrary depths of import. However, to achieve our goals, we need to perform alterations on the cue files themselves when they are installed to be used by other modules.
This is insatisfactory, and we would like to align with whatever the strategy is in CUE. We would also like to have the chance to influence how dependency management is going to finally operate.
BTW, I have some experience around the subject of dependency management and would love to contribute to the discussion, wherever that discussion is happening right now, as we are fully invested in CUE.
the loader pays no attention to the cue.mod structure of the deep module... Very quirky and unworkable. I am not going to extend myself on the negative effects this has for organizing your cue code for distribution.
This is by design. Much has been said in the opposite direction, so there is a difference of opinion here. Go flattens dependency trees using the Minimum Version Selection algorithm. (see the go.dev link). There are many issues with having multiple versions of the same dependency in a program.
I would add that CUE introduces an extra dimension of dependency complexity. Consider a .NET module which uses CUE for defining its schema. How do we ensure that the fetched versions for .NET and CUE are the same? Do they always need to be? What if I am using CUE only for my CI system (i.e. completely separate from the language of my application)? I'm looking forward to the discussion around this issue in the next version of the pkg mgmt proposal. Personally, I'm advocating for linting over CUE managing the dependencies in other language modules.
we would like to align with whatever the strategy is in CUE
I would be very surprised if CUE deviated from the Go model and chose the deep nesting. CUE draws from Go significantly. I currently use hof mod
to align with what I understand to be the solution CUE is aiming for. It was largely pulled out of the Go source tree, which I believe is the plan for actually implementing cue mod
as well.
The discussion have not been so much privy
as it is being around when it is discussed. If I recall correctly, cue mod
was mentioned in the first Town Hall video. #851 is the best place to comment while we wait for the second draft for dep mgmt. It sometimes comes up in Slack as well, which is a good place to generally chat about CUE.
@kumorid @verdverm As Tony indicated, this behavior is intentional. Granted, there is too much context missing in documentation to help understand the rational behind this. So let me clarify a bit.
The behavior mimics that of Go modules. Of course, in an imperative language like Go it is obvious how mixing multiple versions of the same import within one program can lead to big problems.
For CUE, this may not seem so obvious at first, but the problems still exist. Suppose we have
flowchart TD
A -- imports --> B
A -- imports --> C
B -- imports --> C
Both A and B have fields in structs that are of type C.#Def
. Now, suppose that the C imported by A and B can be of different versions, as you suggested. In this case, it is very well possible that the set of fields of C.#Def
is different for the two versions. Suppose Package B has a top-level declaration bar: C.#Def
. Allowing C
to be of different versions could break the following:
package A
import "path/to/B"
import "path/to/C"
foo: C.#Def
foo: B.bar
because closedness rules would error out if C.#Def
were to have different field sets for the two different versions. If we ever attribute some kind of identity to definitions (which seems very useful for the query extension), things get even worse.
Of course sharing the same version can also cause issues, but we think this is generally less serious. This is especially true if we assume certain properties between versions. In Go, the assumption is that newer versions of any package with the same import path are backwards compatible (modulo bug fixes). If a new version of a package makes backwards incompatible changes it is assumed to be published under a new import path.
In Go this is by convention. In CUE this is actually enforceable. A package that is backwards compatible with another package satisfies the subsumption relation, disregarding closedness. Consider what this means in the above example. The way package management is envisioned to work, package A is forced to use a version of C equal or newer to the one used by B. Suppose package A imports package C' while package B imports C. Also assume that package Cc
is defined such that C&Cc
equals C'
. In other words, Cc
represents the constraints on the older version of C
so that it becomes equal to C'
.
So basically, in a world where importing followed your approach, the current semantics would be equivalent to adding the constraints ofCc
to the cue.mod/usr/path/to/C
directory of B
.
In general APIs can be expected to evolve and therefore be open to extension. It is also really bad practice to implicitly make backwards incompatible changes to APIs.
Definitions in CUE seem to defy that principle, by closing structs to disallow additional fields. The mechanism of closedness was solely added to allow for catching typos, however. It is still assumed that APIs are evolving and will be extended. The closedness merely applies to the current version of a package. This is part of the reason why the concept of closedness is so intertwined with the concept of a definition, which actually has several functions.
The easiest analogy is perhaps that of Protobuffers. A server that receives a proto message is assumed to ignore any unknown fields. It is assumed that in practice different APIs will exist concurrently in practice and servers will just have to deal with that.
However, when one compiles a Protobuf definition to a typed language like Go or C, looking up an undefined field will result in a compiler error.
A CUE "program" is expected to be typed for only one version of an API. The proposed downcast operator (#454) can mimic the equivalent of receiving a message of different types.
It is important to see closedness in this light when thinking of defining package management in CUE.
Granted, the way external packages are currently defined (vendored into cue.mod
) makes this very unintuitive. Also, a few things are currently clearly broken:
1) there is currently no enforcement that the version of A
is newer than the one used in B
(the user needs to do this manually).
2) at the very least the constraints added in cue.mod/usr
should be added to at least the version used in B
, if not the one used by all.
3) all this won't work if backwards compatibility is not observed.
Of course one can think of many alternatives, and we are open to suggestions (now is the time). For instance, CUE could be smarter than it is now and recognize that in the code snippet above the C.#Def
of A
is really a newer version the C.#Def
that constraints B.bar
and disable the closedness check and thus preventing the breakage discussed above. We would have to closely consider what other breakage scenarios could occur, though. It would probably still be good to enforce that "downstream" packages use newer versions.
This could even be carried through further by allowing packages to define transformations from previous versions that CUE could automatically apply. This would allow backwards incompatible changes.
I haven't really thought about the implications of these alternatives, though, but it may be a worthwhile exercise in coming to a conclusion on this.
What I would be open to do now is to allow for cue.mod/module.cue
to define an option to switch between these two behaviors of package importing to see what kind of trouble people run into with either. This can potentially give us more data to make the right design choices in this matter.
For the record, here's a testscript
example that illustrates @mpvl's reasons for not wanting to use different versions of the same package in the same build:
cd cue.mod/pkg/foo.example/b
exec cue vet -c
cd $WORK
exec cue vet -c
-- cue.mod/pkg/foo.example/a/a-v1.cue --
// This is version 1 of the foo.example/a package.
// This is the version used by foo.example/c, but
// foo.example/b uses a later version that it has vendored directly
// for itself.
package a
#Foo: {
a: int
}
-- cue.mod/pkg/foo.example/a/cue.mod/module.cue --
module: "foo.example/a"
-- cue.mod/pkg/foo.example/b/b.cue --
package b
import "foo.example/a"
#Foo: a.#Foo & {
b: "blah"
}
-- cue.mod/pkg/foo.example/b/cue.mod/module.cue --
module: "foo.example/b"
-- cue.mod/pkg/foo.example/b/cue.mod/pkg/foo.example/a/cue.mod/module.cue --
module: "foo.example/a"
-- cue.mod/pkg/foo.example/b/cue.mod/pkg/foo.example/a/a-v2.cue --
package a
#Foo: {
a: int
b: string
}
-- cue.mod/module.cue --
module: "foo.example/c"
-- c.cue --
package c
import (
"foo.example/a"
"foo.example/b"
)
// These should all be valid, because we know that
// b.#Foo "is" also a.#Foo.
x: a.#Foo
x: b.#Foo
x: a: 1234
This fails for me with the following output:
> cd cue.mod/pkg/foo.example/b
$WORK/cue.mod/pkg/foo.example/b
> exec cue vet -c
> cd $WORK
$WORK
> exec cue vet -c
[stderr]
x: field not allowed: b:
./c.cue:9:4
./c.cue:10:4
./c.cue:11:4
./cue.mod/pkg/foo.example/a/a-v1.cue:7:7
./cue.mod/pkg/foo.example/b/b.cue:4:7
./cue.mod/pkg/foo.example/b/b.cue:5:2
[exit status 1]
FAIL: /tmp/testscript117421585/x.txtar/script.txt:5: unexpected command failure
error running /tmp/x.txtar in /tmp/testscript117421585/x.txtar
That is, foo.example/b
vets fine on its own, but not when imported in a module context along with an earlier version of foo.example/a
.
The intent behind importing packages in CUE seems to be to import packages within some module, be it the module in which the importing package is, or some other module.
The mechanism the loader uses to import from outside the current module is to interpret the path of the import statement as a path within the
cue.mod/pkg
folder (for simplicity we ignore thegen
andusr
subfolders).In the context of redistributable modules, this means that importing a foreign module implies placing it within the
cue.mod/pkg
dir of the importing module.Proceeding recursively, for example, if module A imports module B that, in turn, imports module C, module A will expect module B to be within
A/cue.mod/pkg/B, and module B would expect module C to be within
B/cue.mod/pkg/C(finally, within
A/cue.mod/pkg/B/cue.mod/pkg/C.The CUE loader, however behaves differently. Instead of looking for C (for B) within
B/cue.mod/pkg/C
it looks for it inA/cue.mod/pkg/C
.OK, a package manager could somehow accomodate this behavir with some sort of flattened module storing strategy. However, this apparently innocent behavior would have, as a consequence, the following:
Clearly, this behavior breaks the principle of encapsulation, and cannot be easily solved unless the loader behavior is changed.
If the loader honored what seems to be implied by the spec about modules/packages, it would look at the
pkg
dir of nested dependencies instead of looking at thepkg
of further up directories.Alternatively, the loader could be able to understand configuration about dependencies (maybe in the
cue.mod/module.cue
file) to perform adequate translations from module dependencies to particular module versions. This could be made work easily.The way it stands now makes it very cumbersome to implement any sort of module redistribution with encapsulation with the desirable reproducible effects.
After looking at the go code for the loader (go is not even my second language ☺) it does not seem that difficult to achieve the desirable behavior. One only needs to make sure that the context of import resolution is not the context you started from in the first package to be processed... contexts should be nested.
In my opinion this would be a big improvement towards implementing any sort of workable redistribution scheme of CUE modules (package management), whatever the version resolution mechanism or name resolution strategy/protocol/mechanism happened to be agreed upon in the (near?) future.