cue-lang / docs-and-content

A place to discuss, plan, and track documentation on cuelang.org
6 stars 1 forks source link

docs/howto: Ensuring structs in a list contain a field with a unique value #108

Open jpluscplusm opened 8 months ago

jpluscplusm commented 8 months ago

From a slack thread: https://cuelang.slack.com/archives/CLT3ULF6C/p1708094593391999?thread_ts=1708093684.648429&cid=CLT3ULF6C (not persisted here as the thread is wider than just this doc)

Original Slack thread
Richard Wilson 16 Feb at 14:29
I know that to operate as an order independent language we cannot overwrite values, but l'd love to know if something is possible such that if you're in the context of a loop, with the goal of building up an array iteratively for a final result, if there was some mechanism for only performing that check at the end of the loop, instead of during it, so I only have to loop once.
mvdan R 16 Feb at 14:35
For your first point about performance, I see you noticed Marcel's reply in https://cuelang.slack.com/archives/CLT3ULF6C/p1708034735112169?
thread_ts=1707933544.070259&cid=CLT3ULF6C :)
Richard Wilson 16 Feb at 14:36
yes! Though I posted first and scrolled up later.
Richard Wilson 16 Feb at 14:36
I just wanted to get a real-world case out there if it's useful to y'all
31
mvdan
R
16 Feb at 14:37
1 - could you raise a github issue with more details, ideally a reproducer, for your issue with cue eval -p? at least from my understanding of the flag, it should work, so it might just be a bug.
Richard Wilson 16 Feb at 14:37
Yep, I'll set a reminder EOD.
V1.
mvdan
R
16 Feb at 14:38
2 - I think this would be covered by our plans for more builtins, in your case validators like must . See https://github.com/cue-lang/cue/issues/943 and follow the thread for updates. I can't promise we will make progress very soon, as right now we are focused on performance, but it's very much planned
mvdan R
16 Feb at 14:43
back to your perf issue - I think you should be able to enforce unique names via a struct instead, which would save the steps of building the list and enforcing its uniqueness. I would imagine this way would be cheaper, but I haven't verified it.
Here's one example: https://alpha.cuelang.org/play/?id=KVpVx6WD2E#w=function&i=cue&f=eval&0=cue
@Jonathan Matthews do we have a how-to for "how to enforce the uniqueness of a struct field within a list"? this hidden struct pattern is relatively common and we have recommended it before, but I couldn't find it on https://alpha.cuelang.org/docs/howto/find-a-guide/. (edited)
mvdan R 16 Feb at 14:49
you can also use a definition instead of a hidden field if you prefer, basically the same result:
https://alpha.cuelang.org/play/?id=RrtN3|qU_06#w=function&i=cue&f=eval&0=cue
Richard Wilson 16 Feb at 14:51
I don't fully understand the struct example above I'll noodle on it some more. Re definition vs hidden field vs hidden definition, I'm never really sure what the dogma is for those. I've been defaulting to hidden fields when I don't intend the definition to be accessible outside of the file as a form of information hiding
Richard Wilson 16 Feb at 14:53
Isn't the struct example equivilant to https://github.com/Shopify/product-taxonomy/pull/65/files#diff-ecf21091e07e58eefd8001879518bca628068bce331e423d2da6da86e0379124R48 without the
interpolation?
Richard Wilson 16 Feb at 14:54
I guess its because you're setting the index of the array as the value instead of the name?

mvdan R 16 Feb at 14:54
(also cc @Jonathan Matthews, perhaps https://alpha.cuelang.org/docs/concept/find-a-guide/ should contain
"when to use hidden fields and definitions", as there is some nuance that would be good to write down)
but the TL;DR for you, Richard, is that a definition is closed when referenced, whereas a hidden field is not. in the case of my example above, this difference does not matter, and I think both options are fine.
mvdan
16 Feb at 14:55
(both hidden fields and definitions are not exported, which you want for a validation like this)
Richard Wilson 16 Feb at 14:55
Okay, more TIL
mvdan R
16 Feb at 14:55
I guess its because you're setting the index of the array as the value instead of the name?
correct; (foo.name): i enforces that each name has exactly one index within the list.
mvdan R 16 Feb at 14:56
it also gives you a somewhat nicer error message, e.g. conflicting values 1 and 3 (the indices)
Richard Wilson 16 Feb at 14:56
Right, so the difference here comes down to the errors when a conflict occurs, I avoided that because in this case if two of the same GID's try to set different names you get conflicting values namel and name which is a lot more helpful
Richard Wilson 16 Feb at 14:56
If you look at the data in dist/, it's quite large (large for a human to grok) so indicies aren't as helpful IMO mvdan
16 Feb at 14:57
I see. See whether the struct option is faster for you, in which case perhaps it's worthwhile to stick with it with a TODO until the performance work is finished.
Richard Wilson 16 Feb at 14:59
The perf is manageable for this case, for a lot of what's in validations. cue I expect we'd probably move into ruby tests if it becomes cumbersome or is something cue actually cannot do (e.g. the ancestor integrity stuff based on categories.json). I wanted to do what I could there in case it is helpful. In this case the error message from a UX perspective is more helpful than the perf.
Richard Wilson 16 Feb at 15:00
E.g. ensure the ancestors here all have a gid that matches with the gid of this category, since they're flatmapped in the taxonomy.
This file is hidden because it was uploaded more than 90 days ago. Upgrade to a paid subscription to view.
Richard Wilson 16 Feb at 15:00
Any time we get into something that might be a bit recursive I still don't feel I have a good grasp on how to nicely handle it in cue.
Richard Wilson 16 Feb at 15:01
If this category has another as an ancestor, that ancestor should also have this category is a child. etc. At least for the nearest ancestor on the tree. (edited)
Richard Wilson 16 Feb at 15:03
(and yes, I think this stuff can be tested upstream, and it is, I just think there's value in the public repository of having a process that guarentees the output of those upstream processes is correct when people consume them)

mvdan R
16 Feb at 15:04
I'll leave it here for now as I can't provide more input without properly understanding your use case and config :)
Richard Wilson 16 Feb at 15:05
No sweat, I appreciate the discussion! if the call gets scheduled on the 19th I'll make a point to attend
Erik Mogensen 16 Feb at 16:46
Our take on error messages was mentioned in #language the other day. Adapting it to your use case would be something like:
_error_message:
if !list.UniqueItems([for v in k. values {v.name}]) {
_error_message: "The attribute_value_integrity values aren't unique. "
}
This at least lets you provide a user-defined error message, possibly helping users solve the problem rather than focusing on what's wrong. You can interpolate \(references) to other in-scope variables to make even better error messages.
Richard Wilson 16 Feb at 17:23
so this would show up as conflicting values "'' and "The attribute_value_integrity values aren't unique." ?
Richard Wilson 16 Feb at 17:23
instead of true and false
myitcv 21 Feb at 16:43
@Richard Wilson thank you for sharing! I look forward to following up on this thread early next week (apologies for the delay in responding). Right now I am focussed on working out ways in which we can improve our communication regarding performance in CUE, as part of
https://cuelang.slack.com/archives/CLT3ULF6C/p1707933544070259 (I see you are also in that thread).
But look forward to getting back to you on some of the points you raise!
01
Richard Wilson 22 Feb at 16:10
No sweat thank you sir
Jonathan Matthews 25 Mar at 12:23
I've tracked:
• "uniqueness of a struct field in a list" as https://github.com/cue-lang/docs-and-content/issues/108
• "hidden fields and definitions" as https://github.com/cue-lang/docs-and-content/issues/109
! exec cue vet file.cue
-- file.cue --
people: [
    {name: "alice"},
    {name: "bob"},
    {name: "bob"},
]

_uniqueNames: {
    for i, person in people { (person.name): i }
}

... or maybe this, which allows a set of specific fields to be checked whilst ignoring others:

! exec cue vet file.cue
-- file.cue --
import "list"

people: [
    {name: "alice", someOtherField: 1},
    {name: "bob", someOtherField: 2},
    {name: "bob", someOtherOtherField: 3},
]

_uniqueNames: list.UniqueItems & [
    for e in people {name:e.name}
]

The thread this came from was perf-related, so perhaps some tests at scale on the above 2 options might be useful to help decide the approach to be documented.

cuematthew commented 1 month ago

Having played around with the code a fair bit, I actually prefer this:

import "list"

people: [
    {name: "bob"},
    {name: true},
    {name: 1}
] & list.MaxItems(len(_uniqueNames))

_uniqueNames: {
    for person in people { "\(person.name)": _ }
}

I prefer it because of the fact that people directly references _uniqueNames thus making it easy to discover, from people where other constraints exist.

Also, using string interpolation means that it works with more than just strings. But, string interpolation throws away type information, which is bad, so eg this will fail:

people: [
    {name: "1"},
    {name: 1}
]

which is saddening.

cuematthew commented 1 month ago

incorporating some feedback from @mvdan , perhaps this is nicer:

import "list"

people: [
    {name: "bob"},
    {name: true},
    {name: 1}
]
people: [...{name!: _}] & list.MaxItems(len(_uniqueNames))

_uniqueNames: {
    for person in people { "\(person.name)": _ }
}

(and of course that _ can be changed to string or whatever if required).

cuematthew commented 1 month ago

(relates to https://github.com/cue-lang/cue/issues/3538)