cue-lang / cue

The home of the CUE language! Validate and define text-based and dynamic configuration
https://cuelang.org
Apache License 2.0

cue: native support for associative lists #14

Open cueckoo opened 3 years ago

cueckoo commented 3 years ago

Originally opened by @mpvl in https://github.com/cuelang/cue/issues/14

kubectl allows patching using a "strategic merge". In short, it allows lists to be treated like maps so that the right elements can be merged.
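To make "lists treated like maps" concrete, here is a minimal Go sketch of a keyed list merge in the spirit of a strategic merge. It is not kubectl's implementation. The names Record and mergeByKey are hypothetical, elements are flat string maps, and the patch simply overrides conflicting fields. CUE unification would report a conflict instead.

```go
package main

import "fmt"

// Record is one list element, e.g. a container spec (flattened for brevity).
type Record map[string]string

// mergeByKey treats base and patch as maps keyed by the value of field key.
// Elements sharing a key are merged field by field (patch overrides on
// conflict); patch elements with a new key are appended. Output order is
// base order first, followed by keys that only appear in patch.
func mergeByKey(base, patch []Record, key string) []Record {
	out := make([]Record, 0, len(base)+len(patch))
	index := map[string]int{} // key value -> position in out
	add := func(r Record) {
		i, ok := index[r[key]]
		if !ok {
			i = len(out)
			index[r[key]] = i
			out = append(out, Record{})
		}
		for f, v := range r {
			out[i][f] = v
		}
	}
	for _, r := range base {
		add(r)
	}
	for _, r := range patch {
		add(r)
	}
	return out
}

func main() {
	base := []Record{{"name": "app", "image": "app:1"}, {"name": "sidecar", "image": "proxy:1"}}
	patch := []Record{{"name": "app", "image": "app:2"}, {"name": "debug", "image": "busybox"}}
	for _, r := range mergeByKey(base, patch, "name") {
		fmt.Println(r["name"], r["image"])
	}
}
```

Elements are matched by their name value rather than by position. Plain list unification cannot do this.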

In general, unifying lists is very tedious for automated configuration, because CUE unifies lists element by element, by position. One possible solution is for users to define the mappings manually. Although CUE can handle reconstructing and merging unknown APIs, this is still tedious. Ideally, users would just use the native API of whatever system they work on.

Also, ideally, it should be possible to apply such mappings with the same separation of concerns as with normal objects.

One possible approach would be to extend the emit mechanisms for inner objects:

myList: [ ...{ name: string } ]

// before evaluation, the list is converted to a map like this
myList: {
    <- "\(x.name)" : x for x in $ // $ means self or current object
    // after evaluation, it is converted back to this format.
    -> [ x for x in $ ]
}

The top-level emit value is then -> Expr, where the -> can be elided because the top-level value is always a map.
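Below is a rough Go sketch of the two conversion steps described above, assuming elements are keyed by a name field. Here toMap plays the role of the <- step and fromMap the role of the -> step. Both functions are hypothetical illustrations, not CUE internals. The sketch rejects duplicate keys, since a key must identify a single element, and it records the original order so the list can be rebuilt.

```go
package main

import "fmt"

// Elem is one list element (flattened to string fields for brevity).
type Elem map[string]string

// toMap mirrors the "<-" step: index list elements by their "name" field.
// It rejects duplicate keys and also returns the key order for rebuilding.
func toMap(list []Elem) (map[string]Elem, []string, error) {
	m := make(map[string]Elem, len(list))
	order := make([]string, 0, len(list))
	for _, e := range list {
		k := e["name"]
		if _, dup := m[k]; dup {
			return nil, nil, fmt.Errorf("duplicate key %q", k)
		}
		m[k] = e
		order = append(order, k)
	}
	return m, order, nil
}

// fromMap mirrors the "->" step: emit the keyed elements as a list again.
func fromMap(m map[string]Elem, order []string) []Elem {
	out := make([]Elem, 0, len(order))
	for _, k := range order {
		out = append(out, m[k])
	}
	return out
}

func main() {
	m, order, err := toMap([]Elem{{"name": "web"}, {"name": "db"}})
	if err != nil {
		panic(err)
	}
	m["db"]["port"] = "5432" // while in map form, elements are addressed by key
	fmt.Println(fromMap(m, order))
}
```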

This needs a lot more thought, but having some kind of mechanism like this can be important.

There may also be overlap with a design for field attributes.

cueckoo commented 3 years ago

Original reply by @enisoc in https://github.com/cuelang/cue/issues/14#issuecomment-463315069

Thanks for starting to brainstorm on this! Overall, this first idea for an approach seems powerful enough at first glance that I can imagine using it to implement k8s strategic merge semantics for lists.

To give you some flavor of the "rabbit hole" of additional complexity I mentioned in the email thread, the next most interesting case after x.name is where the "primary key" (in the relational sense) consists of multiple "columns". For example, we have a list of "port specs" where the primary key is a tuple of (protocol, portnum) where protocol can be "tcp" or "udp".

It seems like your first proposed approach would work for this too if we do something like:

portList: {
    <- "\(x.protocol):\(x.portnum)" : x for x in $ // key such as "tcp:443"
    -> [ x for x in $ ] // same as for single-column PK
}

It's not the most obvious thing, but for most users it would be hidden inside the generated CUE library for k8s API objects and they wouldn't have to think about it.
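As an illustration of the multi-column key, here is a small Go sketch. The types PortKey and PortSpec and the helpers are hypothetical. Go accepts comparable structs as map keys, so the (protocol, port) tuple can be used directly. The stringKey helper shows the string-encoded alternative, equivalent to the "tcp:443" style key above.

```go
package main

import "fmt"

// PortKey is a multi-column primary key: (protocol, port number).
type PortKey struct {
	Protocol string
	Port     int
}

// PortSpec is a simplified port entry of a container.
type PortSpec struct {
	Protocol      string
	ContainerPort int
	Name          string
}

// stringKey is the string-encoded alternative, e.g. "tcp:443".
func stringKey(p PortSpec) string {
	return fmt.Sprintf("%s:%d", p.Protocol, p.ContainerPort)
}

// indexPorts indexes ports by their composite (protocol, port) key.
func indexPorts(ports []PortSpec) map[PortKey]PortSpec {
	m := make(map[PortKey]PortSpec, len(ports))
	for _, p := range ports {
		m[PortKey{p.Protocol, p.ContainerPort}] = p
	}
	return m
}

func main() {
	ports := []PortSpec{{"tcp", 443, "https"}, {"udp", 443, "quic"}}
	m := indexPorts(ports)
	fmt.Println(len(m), m[PortKey{"udp", 443}].Name, stringKey(ports[0]))
}
```

Keying on the port number alone would make tcp/443 and udp/443 collide. The composite key keeps them distinct.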

The next crazy thing down the rabbit hole is probably the idea that sometimes it's important to preserve the relative order of items in lists that get merged as if they were maps. I'm not sure what CUE currently guarantees, if anything, about the emitted order of fields.

cueckoo commented 3 years ago

Original reply by @mpvl in https://github.com/cuelang/cue/issues/14#issuecomment-485084942

@enisoc: thanks for these insights.

Regarding ordering guarantees: currently, the ordering is based on the order in which a field appears in the source. This is simple and generally gives nice results, but it sometimes results in unfortunate reordering. For maps, I was thinking of supporting a topological sort so that the relative order of the elements, as they appear in each map, is preserved. The main goal is nicer output; I hadn't thought of it in terms of guarantees and what those would mean for the semantics. It may be okay to guarantee a certain ordering for the merge of two lists, as long as no cycles are introduced, without introducing this concept into the value lattice. That seems fishy theoretically, but may be a practical conclusion.

In the meantime, I've tried many different approaches for annotating strategic merges. Most are quite unsatisfactory. One notation I'm investigating that shows some promise is to give lists additional constraints. For instance:

[...string]{3}     // a list of strings of length 3
[...string]{<=10}  // a list of strings of at most length 10.

Similarly, we could introduce additional constraints in terms of a strategic merge interpretation, something like

a: [...v1.Object]::{"\(strings.ToCamel(<-kind))" "\(<-metadata.name)": _}
or
a: [...v1.Object]{[strings.ToCamel(<-kind)] [<-metadata.name]}

or whatever notation. This would tell CUE that a list encountered at field a should be interpreted as a strategic merge. The <- operator would access the element, allowing one to refer to the element's values when constructing the key.
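For the two-level key in the notation above (kind, then metadata.name), a Go sketch might look like this. Object, byKindAndName, and lowerFirst are hypothetical. The lowerFirst helper merely stands in for the strings.ToCamel(<-kind) step and only lowercases the first letter, so Service becomes service.

```go
package main

import (
	"fmt"
	"strings"
)

// Object is a minimal stand-in for a Kubernetes object.
type Object struct {
	Kind string
	Name string // metadata.name
}

// lowerFirst is a hypothetical stand-in for the key transform on kind.
// It lowercases only the first byte, which is fine for ASCII kind names.
func lowerFirst(s string) string {
	if s == "" {
		return s
	}
	return strings.ToLower(s[:1]) + s[1:]
}

// byKindAndName interprets a list of objects as a two-level map, keyed
// first by the transformed kind and then by metadata.name.
func byKindAndName(objs []Object) map[string]map[string]Object {
	m := map[string]map[string]Object{}
	for _, o := range objs {
		k := lowerFirst(o.Kind)
		if m[k] == nil {
			m[k] = map[string]Object{}
		}
		m[k][o.Name] = o
	}
	return m
}

func main() {
	m := byKindAndName([]Object{{"Service", "frontend"}, {"Deployment", "frontend"}})
	fmt.Println(m["service"]["frontend"].Kind, m["deployment"]["frontend"].Kind)
}
```

Because kind is part of the key, a Service and a Deployment that share a name do not collide.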

If only Kubernetes objects were specified at the top level, mixing in additional constraints would be easy. For instance:

service <Name>: v1.Service

would then further restrict object kinds of type Service accordingly.

If this is not the case, and we want CUE to mimic a JSON object stream natively, one could perhaps write:

[...v1.Object]{[strings.ToCamel(<-kind)] [<-metadata.name]: (v1&v1beta1)[<-kind]}

where additional constraints for the elements are selected from the respective packages. See the recently added cue get go command to understand how CUE templates can be generated from Kubernetes Go code.

This needs a lot more thought. It means a value could be represented either as a map or as a list: in raw mode, one may want to represent it as a map; for evaluated output, as a list. The topological sort approach needs working out. Also, this must not break associativity, commutativity, or idempotence. This means we need to introduce something in lists similar to how integer literals work, where the exact type can't be evaluated until all information for a field is available. This could be fine, though.

So there are a lot of potential issues, but it is worth it. Strategic merges are common in Kubernetes, but graph unification in general is not great at handling lists, and I'm sure this issue is not limited to Kubernetes.

cueckoo commented 3 years ago

Original reply by @enisoc in https://github.com/cuelang/cue/issues/14#issuecomment-556980772

The new "bulk optional fields" syntax got me thinking about this associative list problem again.

If I understand the new syntax correctly, it seems like, inside a struct context, [exprA]: exprB means, "Unify exprB with the value of any field whose field name when unified with exprA is not _|_."

I was thinking that a generalization of this inside a list context might allow me to specify the kinds of constraints I want to apply onto associative lists (specifically thinking of those in many k8s APIs).

The proposed rules would be something like this, where exprA and exprB are both structs:

  1. When encountered inside a list context, [exprA]: exprB means, "Unify exprA & exprB with any element in the list whose value when unified with exprA is not _|_."
  2. When encountered inside a list context, exprA: exprB means, "Unify exprA & exprB with any element in the list whose value when unified with exprA is not _|_ (same as rule 1 so far). In addition, if no existing elements can be unified with exprA, append exprA & exprB as a new element."

I see this as being sort of analogous to setting an element of a map, except the map happens to be structured as an associative list. Since the elements of an associative list are structs, the "key" (exprA) is also a struct in this case. That even lets you define multiple fields, which creates a multi-column primary key (something I mentioned earlier we'd need for k8s APIs).

Some examples:

// Define a container somewhere.
containers: [
  {name: "mycontainer"}: {
    image: "us.gcr.io/my-registry/my-image"
    command: "foo"
  }
]
// Somewhere else, apply constraints on the container by name.
containers: [
  {name: "mycontainer"}: {
    // Apply a constraint on the image for a container.
    image: =~#"^us\.gcr\.io/"# // raw string: \. is a regexp escape, not a string escape
    // Apply a constraint (e.g. give it a consistent name) on a particular
    // port, if it exists. There could be a udp port 443 as well, but we
    // won't touch that because we use a multi-column primary key.
    ports: [
      [{protocol:"tcp",containerPort:443}]: {
        name: "https"
      }
    ]
  }
]
// Add another container to the associative list,
// while keeping existing elements (mycontainer).
containers: [
  {name: "othercontainer"}: {
    image: "otherimage"
  }
]
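The two proposed list-context rules can be sketched in Go as below, replaying a simplified version of the containers example. applyKeyed and the other names are hypothetical, and structs are flat string maps. With appendIfMissing set to false the sketch follows rule 1; with true it follows rule 2.

```go
package main

import "fmt"

// Struct is a flat stand-in for a CUE struct.
type Struct map[string]string

// unifiable reports whether a and b agree on every field they share (no _|_).
func unifiable(a, b Struct) bool {
	for k, v := range a {
		if w, ok := b[k]; ok && w != v {
			return false
		}
	}
	return true
}

// merge returns the union of two structs already known to be unifiable.
func merge(a, b Struct) Struct {
	out := Struct{}
	for k, v := range a {
		out[k] = v
	}
	for k, v := range b {
		out[k] = v
	}
	return out
}

// applyKeyed models the proposed rules for key A and value B:
// rule 1 ([A]: B): unify A & B into every element that unifies with A;
// rule 2 (A: B):   same, but if no element matched, append A & B.
func applyKeyed(list []Struct, a, b Struct, appendIfMissing bool) ([]Struct, error) {
	if !unifiable(a, b) {
		return nil, fmt.Errorf("key %v conflicts with value %v", a, b)
	}
	ab := merge(a, b)
	out := make([]Struct, len(list))
	matched := false
	for i, e := range list {
		out[i] = e
		if !unifiable(e, a) {
			continue // element does not match the key; leave it untouched
		}
		if !unifiable(e, ab) {
			return nil, fmt.Errorf("element %d conflicts with %v", i, b)
		}
		out[i] = merge(e, ab)
		matched = true
	}
	if !matched && appendIfMissing {
		out = append(out, ab)
	}
	return out, nil
}

func main() {
	var list []Struct
	var err error
	for _, s := range []struct {
		key, val Struct
		add      bool
	}{
		{Struct{"name": "mycontainer"}, Struct{"image": "us.gcr.io/my-registry/my-image"}, true},
		{Struct{"name": "othercontainer"}, Struct{"image": "otherimage"}, true},
		{Struct{"name": "mycontainer"}, Struct{"command": "foo"}, false},
	} {
		if list, err = applyKeyed(list, s.key, s.val, s.add); err != nil {
			panic(err)
		}
	}
	fmt.Println(len(list), list[0]["command"], list[1]["image"])
}
```

Because structs are open, an element that lacks the key fields entirely also unifies with exprA. The sketch therefore updates such an element too.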
cueckoo commented 3 years ago

Original reply by @extemporalgenome in https://github.com/cuelang/cue/issues/14#issuecomment-774164540

> Ideally users would just use the native API of whatever system they work on.

One of the implications of having a better language/tool, like CUE, to manage configuration is that you can of course reduce data duplication. You might use the same CUE data in Kubernetes, Terraform, and local JSON config outputs. In these cases, there often isn't a single native API or data format to target, or even when there is just one format, it can be overly complex and obscure the meaning of the data.

If CUE had generic data mapping/transformation capabilities, perhaps the ideal would then be to shift technology concerns (like Kubernetes) out of the core CUE code and into isolated, output-oriented packages?

cueckoo commented 3 years ago

Original reply by @jlongtine in https://github.com/cuelang/cue/issues/14#issuecomment-774404125

Kevin, I generally agree with this sentiment, but I also think that the ability to create associative arrays would be super helpful in a number of places. I have some use cases with CloudFormation (Tags are currently a huge pain, because you can't merge closed arrays) and with Kubernetes as well. As a result, I think this particular feature probably does need direct language support.
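For the CloudFormation Tags case, one workaround that needs no new language features is to author tags as a map, which merges naturally, and emit the list only at the end. Below is a hypothetical Go sketch of that map-then-list approach. Tag, mergeTags, and toTagList are illustrative names; the {Key, Value} shape follows CloudFormation's tag entries.

```go
package main

import (
	"fmt"
	"sort"
)

// Tag mirrors the shape of a CloudFormation tag entry.
type Tag struct {
	Key   string
	Value string
}

// mergeTags unifies tag maps; the same key with different values is a conflict.
func mergeTags(sets ...map[string]string) (map[string]string, error) {
	out := map[string]string{}
	for _, s := range sets {
		for k, v := range s {
			if old, ok := out[k]; ok && old != v {
				return nil, fmt.Errorf("tag %q: %q conflicts with %q", k, old, v)
			}
			out[k] = v
		}
	}
	return out, nil
}

// toTagList emits the merged map as a list, sorted by key for stable output.
func toTagList(m map[string]string) []Tag {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	out := make([]Tag, 0, len(keys))
	for _, k := range keys {
		out = append(out, Tag{Key: k, Value: m[k]})
	}
	return out
}

func main() {
	m, err := mergeTags(map[string]string{"team": "infra"}, map[string]string{"env": "prod", "team": "infra"})
	if err != nil {
		panic(err)
	}
	fmt.Println(toTagList(m))
}
```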


myitcv commented 3 years ago

For anyone following this who hasn't yet seen the syntax proposed here, please see https://github.com/cue-lang/cue/issues/165#associative-lists