cue-lang / cue

The home of the CUE language! Validate and define text-based and dynamic configuration
https://cuelang.org
Apache License 2.0
5k stars 283 forks source link

cmd/cue: ambiguous disjunction leads to incomplete value on export #1487

Open xinau opened 2 years ago

xinau commented 2 years ago

What version of CUE are you using (cue version)?

I've build CUE from current latest commit f29b460

Does this issue reproduce with the latest release?

Yes.

$ cue version
cue version v0.4.1 linux/amd64

What did you do?

At my company we are using GitLab CI/CD. To make it easier to maintain it's configuration file I wanted to use CUE to generate the .gitlab-ci.yml and validate it using GitLab's JSON schema that's provided as part of the builtin editor. I've imported the JSON schema with cue import as follows

$ curl -sO https://gitlab.com/gitlab-org/gitlab/-/raw/4335649b/app/assets/javascripts/editor/schema/ci.json
$ cue import -l '#CI:' -p gitlabci jsonschema: ci.json

Which results in the following CUE definition (simplified).

package gitlabci

#CI: {
    // concrete definition from original import omitted for example
    #image: string

    #job: #job_template & ((null | bool | number | string | [...] | {
        script: _
        ...
    } | (null | bool | number | string | [...] | {
        extends: _
        ...
    }) | (null | bool | number | string | [...] | {
        trigger: _
        ...
    })) & _)

    #job_template: ({
        when?: _ // concrete definition from original import omitted for example
        ...
    }) & {
        image?: #image
        // ... fields from original import omitted for example
    }

    // concrete definition from original import omitted for example
    {[string]: #job}
}

But I'm unable to use the definition to validate the following configuration.

package gitlabci

out: #CI & {
    build: script: ["echo 'Do your build here.'"]
}

What did you expect to see?

Assuming a correct import of the JSON schema. The above usage of the imported schema should evaluate to the following for out.

$ cue eval -e out -c
build: {
    script: ["echo 'Do your build here.'"]
}

What did you see instead?

When evaluating (concrete) or exporting the configuration I'm getting the following errors

$ cue eval -e out -c  # with original import
out.build: incomplete value {script:["echo 'Do your build here.'"],when:"delayed",start_in:strings.MinRunes(1)} | {script:["echo 'Do your build here.'"],when:"delayed",start_in:strings.MinRunes(1),extends:string | [string]} | {script:["echo 'Do your build here.'"],when:"delayed",start_in:strings.MinRunes(1),trigger:{project:=~"\\S/\\S"} | {} | =~"\\S/\\S"} | {script:["echo 'Do your build here.'"]} | {script:["echo 'Do your build here.'"],extends:string | [string]} | {script:["echo 'Do your build here.'"],trigger:{project:=~"\\S/\\S"} | {} | =~"\\S/\\S"}

$ cue eval -e out -c  # with simplified import from above
out.build: incomplete value {script:["echo 'Do your build here.'"]} | {script:["echo 'Do your build here.'"],extends:_} | {script:["echo 'Do your build here.'"],trigger:_}

I didn't expect this behavior as it's a valid configuration when validated with GitLab's JSON schema

Other

I'm currently lacking sufficient understanding of the problem to give the issue a good title.

myitcv commented 2 years ago

The ultimate cause here is the disjunctions for #job and #job_template, which contain open structs. Hence, according to the current disjunction elimination algorithm, there are many potential "correct" answers, CUE doesn't arbitrarily choose one, and hence complains "this is incomplete" because it can't discriminate which one you want/intended. This case is not helped by the fact the CUE we are working with is entirely a function of how the JSON Schema has been declared. That said, a core CUE problem remains nonetheless.

Smaller repro to make it clear what's going "wrong":

exec cue export x.cue

-- x.cue --
package x

#x: {
    name: string
} | {
    name:    string
    address: string
}

x: #x & {
    name: "hello"
}

gives:

x: incomplete value {name:"hello"} | {name:"hello",address:string}

cc @mpvl because this feeds directly into the upcoming disjunction work planned after the comprehension changes.

xinau commented 2 years ago

@myitcv Thanks for the clarification. I'm not sure if your repro is "correct". From a user-perspective it's clear that x is incomplete as either (differentiating) field age or address is not specified.

I think the repro should be the following where either a name field can be specified or name and age field.

exec cue export x.cue

-- x.cue --
package x

#x: {
  name: string
} | {
  name: string
  age: int
}

x: #x & {
  name: "hello"
}

gives:

x: incomplete value {name:"hello"} | {name:"hello",age:int}
myitcv commented 2 years ago

@xinau yep, that's absolutely what I intended. Thanks for spotting the mistake. I've updated my example accordingly just in case.

mpvl commented 2 years ago

This issue is marked as "NeedsFix". I'm not certain what fix is expected here?

The typical workaround for this is to mark the least-specific entry as default, so that things can get disambiguated.

A nicer way, btw, to write the simplified example as

#x: {
  name: string
  age?: int
}

But that doesn't work in general if there are more fields without one of the newly proposed builtins.

Note, btw, that if we adopt the semantics of a variant of the required field proposal, this problems goes away: non-concrete fields would be treated as optional (unless explicitly marked as required), allowing disjunction elimination to disambiguate the above example upon manifesting values to a concrete value.

It may be an idea to provide a refactoring method to automatically rewrite disjunctions of the above form. The algorithm would be: 1) anti-unify disjuncts (create least upper bound) 2) diff the lub from the disjuncts to extract diffs. 3) if there is a simple pattern, like each adds a single distinct field, convert it to a normal form and use one of the proposed builtins like numconcrete. #943

A simpler form of the above is to have a vet rule to detect if one of the disjuncts is equal to the least upper bound and suggest it be marked as default.