cue-lang / cue

The home of the CUE language! Validate and define text-based and dynamic configuration
https://cuelang.org
Apache License 2.0
5.01k stars 283 forks source link

evaluator: incorrect behaviour when choosing defaults #3296

Open myitcv opened 1 month ago

myitcv commented 1 month ago

What version of CUE are you using (cue version)?

$ cue version
cue version v0.0.0-20240712164527-719893f23850

go version go1.22.3
      -buildmode exe
       -compiler gc
  DefaultGODEBUG httplaxcontentlength=1,httpmuxgo121=1,tls10server=1,tlsrsakex=1,tlsunsafeekm=1
     CGO_ENABLED 1
          GOARCH arm64
            GOOS linux
             vcs git
    vcs.revision 719893f23850172d224720e6d1257586179ac895
        vcs.time 2024-07-12T16:45:27Z
    vcs.modified false
cue.lang.version v0.10.0

Does this issue reproduce with the latest release?

Yes

What did you do?

! exec cue export x.cue
stderr 'b: 2 errors in empty disjunction'
stderr 'b: conflicting values 4 and 1'
stderr 'b: conflicting values 5 and 1'

-- x.cue --
a: *5 | int
if a == 5 {
    b: 1
}
b: a | *4

What did you expect to see?

A passing test (for some variation perhaps of the error messages).

What did you see instead?

> ! exec cue export x.cue
[stdout]
{
    "a": 5,
    "b": 1
}
FAIL: /tmp/testscript4268956037/repro.txtar/script.txtar:1: unexpected command success

The output of b: 1 cannot be correct given that a: 5, and b is constrained such that b: a | *4.

To confirm: the exact same behaviour is seen with CUE_EXPERIMENT=evalv3, i.e. the bug exists in both old and new evaluator implementations.

Another observation: there is nowhere in the spec where we can refer to try and reason about what the behaviour should be here.

cuematthew commented 1 month ago

So, I think we have this situation, following the details of https://cuelang.org/docs/reference/spec/#default-values

a: int | *5    <int | 5,  5>
b: a | *4      <int | 4 | 5, 4 | 5>

a resolves to 5. The problem seems to be what then happens: presumably at this point we should actually do:

b 🡒 <5> & <int | 4 | 5, 4 | 5> 🡒 <5 & (int | 4 | 5), 5 & (4 | 5)> 🡒 <5, 5> (rule U1)

but I think this step isn't happening. I.e. we don't seem to be tracking that b depends on a, at least not sufficiently.

The result is that we pass into the if a == 5 block and add a <1> unification without having eliminated the int. Consequently we get:

b 🡒 <1> & <int | 4 | 5, 4 | 5> 🡒 <1 & (int | 4 | 5), 1 & (4 | 5)> 🡒 <1, ⊥> (rule U1)

whereas we should be attempting <1> & <5, 5> which should lead to ⊥.

myitcv commented 1 month ago

My hunch is that the following happens.

Some time after evaluation starts, we realise can't make any more "progress". At that point, we have the following facts (I leave out the if comprehension on purpose to focus more on what we know about a and b):

a: *5 | int
b: a | *4

At this point, my intuition is that what we should do is choose a default for a but not b, and leave the "reference" from b to a intact. Reason being, a is "not done yet" and the existence of the a to b dependency means that after choosing defaults we might know more. I can't see anything in the spec however which would confirm this rule of ordering however.

Instead what I think currently happens is that we "erase" the reference to a too early (much like you suggest). Indeed that must happen before we have chosen the default for a because otherwise we would know that a: 5. So we end up with, in effect:

a: *5 | int
b:  *5 | int | *4

Then the default for a is being chosen leaving us at:

a: 5
b: *5 | int | *4

At which point we can evaluate the original if comprehension which leaves us at:

a: 5
b: *5 | int | *4
b: 1

This reduces down to:

a: 5
b: 1

without us even needing to revert to choosing a default for b.

Even if something else is happening here, I think the lack of spec clarity with respect to when defaults are chosen and when references are "erased" is a problem.

Here is an interesting follow-up example on the sequencing of default selection:

c: *5 | d
d: c | *4
if c == 5 {
    d: 1
}

In this case we should again get an error as the result, but we would choose defaults for a and b at the same time. i.e. we would get to a point in evaluation where we know:

c: *5 | d
d: c | *4

and not be able to make any further progress. Because of the mutual references, neither default selection orders before the other, so we choose defaults for both simultaneously. At which point we know:

c: 5
d: 4

Now we can evaluate the if comprehension and end up with:

c: 5
d: 4
d: 1

and hence error.

rogpeppe commented 1 month ago

Here's another example, maybe related, which had a similar effect, but only with the old evaluator:

import "strconv"
x: *strconv.Atoi(y) | "1"
y: x

With the old evaluator cue export of the above produces:

{
    "x": 1,
    "y": "1"
}

which similarly fails to validate if vetted along with the original.

With CUE_EXPERIMENT=evalv3, it produces a cycle error, which seems about right.

Maybe the original example in this issue should also be a cycle error?

cuematthew commented 1 month ago

Even if something else is happening here, I think the lack of spec clarity with respect to when defaults are chosen and when references are "erased" is a problem

For me, this is the key problem: it is not clear (to me?) what b: a | *4 means: it could mean that b gets a deep copy of a before a has been evaluated at all. I think this is similar to referencing a definition - you're copying it and each copy is unrelated and so can be unified/evaluated in different ways. I believe that if this is the case, you can get to the a: 5, b: 1 result. Alternatively, b: a | *4 is just a reference to a's disjunction and so a and b share part of the graph, and so any evaluation of that graph should affect them both. In which case the only possible result (I think) is a: 5, b: 5.

If the latter option is chosen, note that in general, if b references a in this way (potentially transitively) and a has a default, then b's default (if declared) will never be chosen, and as such we might want to produce an error or warning of this.

rogpeppe commented 1 month ago

In which case the only possible result (I think) is a: 5, b: 5.

... but clearly, if a is 5, then b must be 1 (because of the if) and hence the above is not a valid result.

Alternatively, b: a | *4 is just a reference to a's disjunction and so a and b share part of the graph, and so any evaluation of that graph should affect them both.

FWIW my understanding is that this is (or should be) the case. In support of this, consider:

#x: {
    a: *5 | int
    b: a
}

x: #x
x: a: 1

If b just made a deep copy of a, then I think that it would end up as 5, because it wouldn't be influenced by the eventual value of a. But it ends up as 1, which seems right to me.

In general if I see a reference, ISTM that it should always refer to the final value of the thing referred to. Anything else seems like it's going to be less intuitive and useful.

Thus I believe that the original example should be an error of some kind (whether conflict or cycle I'm not quite sure).

cuematthew commented 1 month ago

In which case the only possible result (I think) is a: 5, b: 5.

... but clearly, if a is 5, then b must be 1 (because of the if) and hence the above is not a valid result.

Ah yes, sorry - too sleepy this morning and I'm leaving out the if ... when it suits me.

In general if I see a reference, ISTM that it should always refer to the final value of the thing referred to.

With the exception of references to definitions, yes?

myitcv commented 1 month ago

With the exception of references to definitions, yes?

No. Definitions are special only in that they recursively close their value on use of the definition, i.e. a reference to it. Otherwise, all fields (regardless of definition or not, hidden or not) unify in the same way, and a reference to a field is to the resulting/ultimate constraints "gathered" by that field.

I think one of the confusing points with this issue as I mentioned in https://github.com/cue-lang/cue/issues/3296#issuecomment-2246898444 is that it appears that a "copy" of a is happening in the reference from b before we get to the ultimate value of a. But that's exactly the bug: we shouldn't be doing that.

cuematthew commented 1 month ago

Yep, after much discussion and thought, I'm happy I'm in a less confused state wrt definitions etc.

mpvl commented 1 month ago

Defaults are not picked by taking a reference. Moreover, terms within a marked default that themselves are disjunctions have their defaults erased. The spec is quite explicit about this, although it could use more examples.

Binary operators resolve a references to a scalar, so in this case the reference to a is resolved to 5.

So b is equal to 5 | int | *4, which after unifying with the result from the comprehension results in 1.

mpvl commented 1 month ago

@rogpeppe

In general if I see a reference, ISTM that it should always refer to the final value of the thing referred to. Anything else seems like it's going to be less intuitive and useful.

This is definitely not the case. The & and | operations would not be very useful if this were the case. For instance, you would not be able to override a default value. In general, a reference results in a local copy of what it refers to. Logically at least. There are optimizations such as structure sharing to prevent actual copying.

myitcv commented 1 month ago

Not to properly respond to any of the points above, rather just note down an observation based on my reading of the thread: namely that if I see:

b: a

there are at least two possible "interpretations" for the reader:

  1. We are templating b from the constraints gathered against a
  2. We are constraining b to be the value of a, where the phrase "the value of a" to mean the observed value of a, i.e. the result of cue export.

Per Marcel, the spec and implementation is currently adhering to 1.

My observation is that because of these two "interpretations" for the reader, then we at least have a visual ambiguity as to what it means to write b: a. An ambiguity that is compounded/made more confusing by b: a + 0 doing something else.

As such, I'm wondering if we need to tease apart these two situations with different syntax/other.

I leave this very much as an incomplete thought/comment... mainly to not forget where I got to so that I can come back to it later.