cue-lang / cue

The home of the CUE language! Validate and define text-based and dynamic configuration
https://cuelang.org
Apache License 2.0
5.02k stars 287 forks source link

cue/cmd: `import crd` for importing Kubernetes CustomResourceDefinitions #2691

Open justenstall opened 10 months ago

justenstall commented 10 months ago

Is your feature request related to a problem? Please describe.

CUE cannot import Kubernetes CustomResourceDefinitions (CRDs). Importing CRDs into CUE would be a huge value add for Kubernetes users.

Describe the solution you'd like

I would like a cue import crd command that is able to import a YAML or JSON encoded CustomResourceDefinition to CUE.

Describe alternatives you've considered

cue import openapi

Since CRDs use a variation of OpenAPIv3 to define their schema, cue import openapi can be used to import the schema with some extra help from a query tool like yq. Here is an example script to extract a valid OpenAPI definition from the CRD with yq and then import it as CUE:

CRD="<file>.yaml"

yq '{"openapi": "3.0.0"} *
    {"info": {"title": .spec.group}} *
    {"info": {"version": .spec.versions[0].name}} *
    {"components": { "schemas": { .spec.names.kind: .spec.versions[0].schema.openAPIV3Schema}}}' \
    "$CRD_FILE" > crd-openapi.yaml 

cue import openapi crd-openapi.yaml

This is not a viable solution because of Kubernetes' OpenAPIv3 extensions explained in the Additional Context section below.

cue get go

Many CRDs are created with a code generation tool such as controller-gen (from the kubebuilder SDK), which generates the CRD from annotated Go code. I have used cue get go to import the CRD from the Go types, but it is not a 100% match for the controller-gen-created schema. Adding direct support for CRD --> CUE is ideal.

See also: #2679

Additional context

OpenAPIv3 vs Structural Schema

Kubernetes calls its flavor of OpenAPIv3 "structural schema", which is defined as follows:

A structural schema is an OpenAPI v3.0 validation schema which:

  1. specifies a non-empty type (via type in OpenAPI) for the root, for each specified field of an object node (via properties or additionalProperties in OpenAPI) and for each item in an array node (via items in OpenAPI), with the exception of:
    • a node with x-kubernetes-int-or-string: true
    • a node with x-kubernetes-preserve-unknown-fields: true
  2. for each field in an object and each item in an array which is specified within any of allOf, anyOf, oneOf or not, the schema also specifies the field/item outside of those logical junctors (compare example 1 and 2).
  3. does not set description, type, default, additionalProperties, nullable within an allOf, anyOf, oneOf or not, with the exception of the two pattern for x-kubernetes-int-or-string: true (see below).
  4. if metadata is specified, then only restrictions on metadata.name and metadata.generateName are allowed.

Source: Specifying a structural schema

CEL Validation

As seen in the snippet above, Kubernetes extends the OpenAPIv3 syntax with fields prefixed by x-kubernetes-. The most challenging to incorporate will be the newer x-kubernetes-validations field, used to define validations rules with the Common Expression Language (CEL). Translating CEL to CUE will be a big lift, but it is being cemented as the desired way for k8s developers to define validation rules in CRDs, k8s admission controller webhooks, and eventually even the native Kubernetes APIs. CEL validation will be incompatible with CUE in some cases, because it allows validating an updated Kubernetes object by comparing it to its existing values. Ideally, validation rules that have no CUE equivalent can be preserved so the CRD can round-trip CRD --> CUE --> CRD without losing information.

References:

Possible existing implementation?

The file encoding/openapi/crd.go contains a seemingly unused implementation of CRD decoding. If anyone is familiar with this code and knows what state it is in, I would appreciate the help as a new contributor.

myitcv commented 10 months ago

Thanks for raising this @justenstall

This caused me to think about https://twitter.com/stefanprodan/status/1710632837362147563

On that basis I'll copy in @stefanprodan and @sdboyer

stefanprodan commented 10 months ago

Some of the features described here are currently implemented in timoni mod vendor crd. We currently can generate CUE schemas from CRD manifests made with controller-gen and kubebuilder markers, which cover most CNCF projects. We handle x-kubernetes-preserve-unknown-fields which was quite challenging, the cue openapi package makes everything opened by default, instead only fields with this prop must be so, Sam found a way to deal with using kin-openapi. For CRDs that have anyOf and oneOf (luckily these are not supported by controller-gen so not many CRDs have them), the schema generated by cue is invalid (tracked in #2686). Next we'll have to deal with x-kubernetes-int-or-string.

As for CEL, I agree that is a major undertake but I suspect most CRDs will switch to CEL in 2024. Kubernetes Gateway API already uses CEL, and we're also considering using CEL for FluxCD, guess many more project will follow. Given this, without a CEL to CUE translator, I think the value prop of using CUE for Kubernetes objects validation will be less attractive. To overcome this, I'm inclined into running CEL inside Timoni after CUE generates the final values, in the same way the Kubernetes API does, but it would be ideal for CUE to provide a CEL translator, at least for the expressions that are compatible.

justenstall commented 10 months ago

Linking the following in case they could be of use:

justenstall commented 10 months ago

I put together a rough demo for this functionality using the Timoni CRD importer as a starting point. By switching from kin-openapi to Kubernetes' internal representation of a CRD it should hopefully be easier to work with their API extensions. The only additional functionality in the demo is that all Kubernetes extensions to OpenAPI (x-kubernetes-* fields) are preserved in the resulting CUE as attributes.

I think this sort of implementation would be the best way to make CRD imports lossless and would allow the CRD importing functionality to be made available independent of a CEL to CUE translator, which could theoretically be added at a later date.

https://github.com/justenstall/cue/blob/get-crd/encoding/crd/decode.go

Example

For extensions.istio.io/v1alpha1.WasmPlugin, the field spec.url uses x-kubernetes-validations:

url:
  description: URL of a Wasm module or OCI container.
  minLength: 1
  type: string
  x-kubernetes-validations:
  - message: url must have schema one of [http, https, file, oci]
    rule: 'isURL(self) ? (url(self).getScheme() in ['''', ''http'',
      ''https'', ''oci'', ''file'']) : (isURL(''http://'' + self) &&
      url(''http://'' +self).getScheme() in ['''', ''http'', ''https'',
      ''oci'', ''file''])'
```yaml apiVersion: apiextensions.k8s.io/v1 kind: CustomResourceDefinition metadata: ... spec: group: extensions.istio.io names: kind: WasmPlugin ... versions: - name: v1alpha1 ... schema: openAPIV3Schema: properties: spec: properties: ... url: description: URL of a Wasm module or OCI container. minLength: 1 type: string x-kubernetes-validations: - message: url must have schema one of [http, https, file, oci] rule: 'isURL(self) ? (url(self).getScheme() in ['''', ''http'', ''https'', ''oci'', ''file'']) : (isURL(''http://'' + self) && url(''http://'' +self).getScheme() in ['''', ''http'', ''https'', ''oci'', ''file''])' ... ```

This is the resulting CUE in cue.mod/gen/extensions.istio.io/wasmplugin/v1alpha1/types_gen.cue:

url: strings.MinRunes(1) @crd(validations="""
                              [{"rule":"isURL(self) ? (url(self).getScheme() in ['', 'http', 'https', 'oci', 'file']) : (isURL('http://' + self) \u0026\u0026 url('http://' +self).getScheme() in ['', 'http', 'https', 'oci', 'file'])","message":"url must have schema one of [http, https, file, oci]"}]
                              """)
```cue package v1alpha1 import ( "strings" "list" ) #WasmPlugin: { // Extend the functionality provided by the Istio proxy through // WebAssembly filters. See more details at: // https://istio.io/docs/reference/config/proxy_extensions/wasm-plugin.html spec!: #WasmPluginSpec apiVersion: "extensions.istio.io/v1alpha1" kind: "WasmPlugin" metadata!: { ... } } // Extend the functionality provided by the Istio proxy through // WebAssembly filters. See more details at: // https://istio.io/docs/reference/config/proxy_extensions/wasm-plugin.html #WasmPluginSpec: { ... // URL of a Wasm module or OCI container. url: strings.MinRunes(1) @crd(validations=""" [{"rule":"isURL(self) ? (url(self).getScheme() in ['', 'http', 'https', 'oci', 'file']) : (isURL('http://' + self) \u0026\u0026 url('http://' +self).getScheme() in ['', 'http', 'https', 'oci', 'file'])","message":"url must have schema one of [http, https, file, oci]"}] """) ... } ```
stefanprodan commented 10 months ago

@justenstall this looks promising, I would like to give this a try in Timoni and run the CEL rules using k8s.io/apiserver/pkg/apis/cel.

PS. It would nice to preserve the original license headers when you copy/paste code from other projects.

justenstall commented 10 months ago

@stefanprodan

@justenstall this looks promising, I would like to give this a try in Timoni and run the CEL rules using k8s.io/apiserver/pkg/apis/cel.

Is the thought that Timoni would run the CEL rules during timoni mod vet/timoni bundle vet?

PS. It would nice to preserve the original license headers when you copy/paste code from other projects.

My bad on the license headers, I'll fix that.

stefanprodan commented 10 months ago

Is the thought that Timoni would run the CEL rules during timoni mod vet/timoni bundle vet?

I think is the only viable alternative to a CEL to CUE translator.

By switching from kin-openapi to Kubernetes' internal representation of a CRD it should hopefully be easier to work with their API extensions. T

I can't speak for the CUE team, personally I would not want Kubernetes as a dependency in CUE lang. This means the Kubernetes project can't use CUE as it would create a cyclic dependency, it also means that all the Kubernetes tools like Timoni would need to wait for CUE to bump the Kubernetes packages before they can upgrade. For example, Kubernetes 1.18 comes with a breaking change to their OpenAPI package, so all tools require a step upgrade. If your tool imports some package that depends on Kubernetes 1.17, you are stuck on 1.17 until all deps move to 1.18. This is the case with Flux, we are stuck on 1.17 for months now.

justenstall commented 10 months ago

By switching from kin-openapi to Kubernetes' internal representation of a CRD it should hopefully be easier to work with their API extensions. T

I can't speak for the CUE team, personally I would not want Kubernetes as a dependency in CUE lang. This means the Kubernetes project can't use CUE as it would create a cyclic dependency, it also means that all the Kubernetes tools like Timoni would need to wait for CUE to bump the Kubernetes packages before they can upgrade. For example, Kubernetes 1.18 comes with a breaking to their openAPI package, so all tools require a step upgrade, if your tool imports some package that depends on Kubernetes 1.17, you are stuck on 1.17 until all deps move to 1.18. This is the case with Flux, we stuck on 1.17 for months now.

Dropping links for reference:

uhthomas commented 10 months ago

This is great and something I've wanted myself for a long time. A lot of these CRDs are generated from Go structs, so it would be great if it would be possible to convert CUE (from cue get go) to a Kubernetes CRD.

myitcv commented 10 months ago

@justenstall thanks for creating #2701 as a strawman for discussion.

I can't speak for the CUE team, personally I would not want Kubernetes as a dependency in CUE lang.

As a guiding rule, we want to keep the dependencies for cuelang.org/go (and by extension cmd/cue based on the current structure) small. Hence we would almost certainly not add Kubernetes as a dependency.

However, I think #2701 is a good strawman to act as a forcing function to ask the question: so where should this kind of adapter live? How could/should cmd/cue (and more generally cue/load and the various encoding/X adapters) work with such a setup?

justenstall commented 10 months ago

@myitcv

I agree with you and @stefanprodan that CUE shouldn't have a dependency on Kubernetes. I can vendor the data types from Kubernetes for the PoC, if there are any other ideas for how that should be done, let me know.

I do think there's a larger conversation to be had around the setup of where adapters/converters should live. If the dependencies for the CRD --> CUE conversion were isolated to the converter, I think it's logical that the converter would use Kubernetes' internal parser and data type for CRDs.

If converters could be isolated, they could rely on tailor-made parsers for JSON Schema/OpenAPI/CRDs/etc, which could have a lot of benefits:

The CRD example is somewhat unique in comparison to other importable formats (for instance, a dependency on an OpenAPI library would be as controversial), but in my opinion it's still a worthwhile conversation.

sdboyer commented 9 months ago

IMO, it would be amazing to see a more consistent set of encoders/adapters for other representations to/from CUE. i've spent considerable time writing such things over the past couple years, and can attest that:

Kubernetes Gateway API already uses CEL, and we're also considering using CEL for FluxCD, guess many more project will follow. Given this, without a CEL to CUE translator, I think the value prop of using CUE for Kubernetes objects validation will be less attractive.

I didn't know this was where things were headed (:hat-tip:), but to me, this is all the more reason that a CUE->CEL translator would be valuable. Otherwise, CUE's probably just out of the game here, regardless of its other potential benefits.

I'd add to this that CUE is a non-starter as the engine for executing validations within an apiserver as long as a single CUE runtime can't be safely used for validation from multiple goroutines.

justenstall commented 9 months ago

@sdboyer I started a Slack thread for encoding discussion, would love to get your input: https://cuelang.slack.com/archives/CMY132JKY/p1701367946843919

kharf commented 6 months ago

I wonder if with the upcoming feature of cue modules and oci registries, a central cue registry like nixpkgs would make sense, which would host all the packages, like crds. Wouldn't that kind of eliminate commands, which are specifically tailored for certain types of files? Over the last year I also had a lot of troubles with importing crds to cue and always went with cue get go, but I think having crds in a registry and just use the dependency management makes it more "platform agnostic" than having commands for special cases.

justenstall commented 6 months ago

I wonder if with the upcoming feature of cue modules and oci registries, a central cue registry like nixpkgs would make sense, which would host all the packages, like crds. Wouldn't that kind of eliminate commands, which are specifically tailored for certain types of files? Over the last year I also had a lot of troubles with importing crds to cue and always went with cue get go, but I think having crds in a registry and just use the dependency management makes it more "platform agnostic" than having commands for special cases.

@kharf

It could help the case but someone would still have to manually produce a CUE module for the CRD, which won't happen for every CRD, and won't be of consistent quality. If CUE could comprehend a CRD's schemas then every CRD becomes usable in CUE at a guaranteed level of quality. Anyone wanting an even better representation of the CRD can still publish a CUE module either starting from scratch or starting from generated CUE.