cuelang / cue

CUE has moved to https://github.com/cue-lang/cue
https://cuelang.org
Apache License 2.0
Proposal: core builtin extensions #943

Closed mpvl closed 3 years ago

mpvl commented 3 years ago

Definitions

Before we introduce the proposed builtins, we formally introduce some as-yet undocumented language features.

Functions

We propose that CUE support named-argument functions and calls to “structs” as a shorthand for the common macro pattern (e.g. (s & {_, a: x}).out).

A function argument is now defined as:

    Argument       = [ identifier ":" ] Expression .

Any named argument must be followed by other named arguments.

The expression s(a: x, b: y), where s is a struct, is now a shorthand for s & {_, a: x, b: y}.
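As a minimal sketch of the macro pattern this shorthand targets, the following uses only current CUE. The names #Add, a, b, out, and sum are illustrative and not part of the proposal.

```cue
// A "macro": a struct whose fields a and b act as arguments
// and whose field out holds the result.
#Add: {
	a:   int
	b:   int
	out: a + b
}

// Today's spelling of a call: unify the arguments in, then select the result.
sum: (#Add & {a: 1, b: 2}).out // 3

// Under the proposal, the unification step would presumably be spelled
// #Add(a: 1, b: 2). That form is hypothetical and does not evaluate today.
```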

Validator

A validator is a special builtin that is evaluated by unifying it with other values whereby the result is one of a few outcomes:

A validator must be run at the last stage of evaluating a node, after a fixed point is reached evaluating all non-validator values, in which case any error is considered a fatal error. A validator may be run at earlier stages of the evaluation of a node, in which case an incomplete error signifies that the decision on validity must be postponed.

An example of a language-level validator is <10. struct.MinFields and struct.MaxFields are examples of validators of builtin packages.
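To make the two kinds of validator concrete, here is a small sketch in current CUE. The field names n and s are illustrative only.

```cue
import "struct"

// <10 is a language-level validator: it constrains n without producing a value.
n: <10
n: 5

// struct.MinFields is a validator from a builtin package:
// s must end up with at least one field.
s: struct.MinFields(1)
s: {a: 1}
```

Changing n to 12, or s to {}, should make evaluation fail once all other values for the node are known.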

A validator can be thought of as a Go function whose signature returns an error.

Inferred validators

Optional: Builtin functions that have the signature foo(x1, x2, …, xn) bool may be implicitly interpreted as validators of the signature foo(x2, …, xn) error.

The CUE function notation

We define the following signature format for CUE functions:

FunctionDecl   = identifier Arguments "::" Expression .   
Arguments      = "(" [ Argument { "," Argument } [ "," ] ] ")" .
Argument       = [ identifier ":" ] Expression .

Either all or none of the arguments should be named.

The following rules apply for calling functions with this signature:

These rules could be relaxed later.

Proposed builtins

builtins to replace _|_ (bottom)

Although _|_ is part of the standard CUE idiom, it has several issues:

We intend to deprecate the bottom symbol (keeping it around for backwards compatibility) and replace it with builtins that more clearly convey the intent of its usage.

Comparison against bottom is not supported by the spec (arguably), but it is a crucial piece of functionality for many CUE configurations. Its meaning is unclear, however. In many cases, it is used to check whether a reference exists. In some cases, however, the intended meaning is to check that a value is valid. In reality, CUE implements a semantic that is somewhere in between the two cases: it checks the validity of a value, but not recursively.

Note that if any of these builtins return false, they may still be satisfied at a later point in time. Evaluation should take this into account, as usual.

_|_ replacement: error(msg: string | *null) :: _|_

The use of error(msg) replaces the common use of _|_ with the added ability to associate a user message with an error. When used within a disjunction, the error will get eliminated as usual, but upon failure of the disjunction, the user-supplied error is used as an alternative error message.

Comparison to bottom

Uses of comparison against bottom will need to be replaced with one of the following builtins.

isconcrete(expr) :: bool

isconcrete reports whether expr resolves to a concrete value, returning true if it does and false otherwise. It is a fatal error if an expression can never evaluate to true.

Example:

a: {}
b: int

c: isconcrete(a)   // true
d: isconcrete(b)   // false
e: isconcrete(a.b) // false (b could still be defined)
f: isconcrete(b.c) // fatal error (b.c can never be satisfied)

Purpose: replaces if a.foo != _|_ {, where it is checked whether a.foo exists with the purpose of determining whether it is a concrete value.
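For reference, the idiom being replaced looks like this in CUE as it stood at the time of this proposal. The names a, foo, and hasFoo are illustrative.

```cue
a: {foo: 1}

// Existing idiom: compare against bottom to test whether a.foo is usable.
// The comparison itself does not say whether existence or concreteness is meant.
if a.foo != _|_ {
	hasFoo: true
}
```

With the proposed builtin, the condition would presumably read isconcrete(a.foo), stating the intent directly.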

exists(expr) :: bool (optional)

exists reports whether expr resolves to any value.

Example:

a: {}
b: int

c: exists(a)   // true
d: exists(b)   // true
e: exists(a.b) // false (b could still be defined)
f: exists(b.c) // fatal error (b.c can never be satisfied)

opt?: int
ref:  exists(opt)  // false (an unset optional field is considered non-existing)

req!: int
ref:  exists(req)  // false

Purpose: replaces if a.foo != _|_ {, where it is checked whether a.foo exists regardless of concreteness.

validator builtins

must(expr: _, msg: string | *null) :: _

must(expr) passes if expr evaluates to true and fails otherwise.

must can be used to turn arbitrary expressions into constraints. For instance, a: <10 can be written as a: must(a < 10). See issue #575 for details.

not(expr) :: _

not(expr) passes if unified with a value x for which expr & x fails, and fails otherwise. See #571 for details.

Examples:

a: not(string) // number | bytes | {...} | [...] | bool | null

numexist(count, ...expr) :: _

numexist(count, ...expr) passes if the number of expressions for which exists(x) evaluates to true unifies with count.

The main purpose of numexist is to indicate mutual exclusivity of fields.

#X: {
    // either foo or bar may be specified by the user
    numexist(<=1, foo, bar)
    foo?: int
    bar?: int
}

numconcrete(count, ...expr) :: _ (optional)

numconcrete(count, ...expr) passes if the number of expressions for which isconcrete(x) evaluates to true unifies with count.

numvalid(count, ...expr) :: _ (optional)

numvalid(count, ...expr) passes if the number of expressions for which isvalid(x) evaluates to true unifies with count.

Builtins related to concrete values

Purpose: combine schema of different instances of the same package that would otherwise fail because there are conflicting definitions.

manifest(x) :: _

manifest evaluates x, stripping it of any optional fields and definitions, and disambiguating disjunctions after their removal.

Use cases:

Defining ranges

Looking around at other languages, defining range numbers clearly is a hard problem, as it is often not clear from just looking at the syntax, or even wording, whether or not ranges are inclusive.

CUE’s unary comparators provide a possible solution to this issue.

range(from: int, to: int, by: int | *1) :: [...int]

The builtin range returns a stream of values, starting from from (which must be concrete) and adding by (which defaults to 1) for as long as unification with to succeeds. It is an error to define a range that never terminates.

Examples:

range(from: 1, to: <10)              // [1, ..., 9]
range(from: 1, to: >=0.5, by: -0.1)  // [1, 0.9, ..., 0.5]
range(from: 1, to: <1)               // []
range(from: 1, to: >=1)              // error("infinite range")
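For comparison, the existing list.Range builtin from the list package takes positional start, limit, and step arguments. The sketch below shows why inclusivity is hard to read off such a call. The field name r is illustrative.

```cue
import "list"

// start, limit, step: the limit is exclusive,
// but nothing in the call itself says so.
r: list.Range(1, 10, 1) // [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The proposed range(from: 1, to: <10) would state the same bound explicitly through the comparator.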

Switching

CUE’s if is not paired with an else. This is partly because if really is a comprehension. But another reason is that the use of else quickly leads to nested conditions. A switch statement is generally more conducive to readability in this case.

A switch statement can be simulated in CUE using lists:

choice: [
    if a { x },
    if b { y },
    z,
][0]

is equivalent to the hypothetical

choice: if a { x } else { if b { y } else { z } }

The issue is that the [0] hidden at the end of the switch impairs readability.
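A concrete instance of the list-based switch, as a sketch in current CUE. The names env and replicas are illustrative.

```cue
env: "staging"

replicas: [
	if env == "prod" {5},
	if env == "staging" {2},
	1, // default
][0]
```

Only the elements whose condition holds end up in the list, followed by the default, so index 0 selects the first match. Here replicas evaluates to 2.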

head

A head builtin could make the above more readable. It would do nothing more than select the first element of a list, but it would signal that intention clearly at the start of the list.

choice: head([
    if a { x },
    if b { y },
    z, // default
])

Package std

We’re considering making all core builtins available under the package std, so that they can be referenced unambiguously and more clearly than using the __ prefix.

import "std"

a: std.range(from: 3, by: -1, to: >0) // [3, 2, 1]

seh commented 3 years ago

This is so good to see.

One problem to consider with the "Switch" section: You write, more or less, if a {} else if b {} ..., but quite frequently b is !a or not a, which requires restating a. Could let help here to define the result of a once, and express it being both true for the consequent branch and its negation for the alternate branch?

seh commented 3 years ago

Also, while head is evocative, it does so little that it barely justifies its inclusion. I thought of coalesce as a good name for picking the first suitable item in a sequence that can accommodate "null" or disqualified values. Against that, though, in your "Switch" example, I suppose the list should never wind up with more than one value, as opposed to it being prefixed by any number of "null" values.

mpvl commented 3 years ago

@seh: yes, let could be used here that way, though outside the list. We could perhaps consider allowing let in lists. Also, one could mimic this behavior with: head([if a {}, {}]), where the second element is the "default", and thus !a.

Regarding head: I agree its utility is a bit meager. We did consider a select builtin, which I think is close to what you're proposing: it would pick the first valid entry. The main problem with this pattern seems to be that it makes it too easy to ignore potential errors, so it may be a less safe approach. Having said that, it reads quite nicely and we have seen configurations where this would have merit, so it is something to consider. It just seemed prudent to first see how far one would get with the seemingly safer approach.

I'm not sure I understand the point with the null values, but maybe this answers your question.

Do you think adding head is not warranted and using a [...][0] pattern is sufficient?

seh commented 3 years ago

I was not sure that CUE has the same notion of "null" values that SQL, HCL, Jsonnet, and other languages have, so the semantics of a hypothetical coalesce function might not apply.

I don't think head is warranted without tail (or rest), and perhaps nth. My Lisp is showing. I haven't yet reached for any functions like that, though. I'd rather spend those tokens on set manipulation functions for lists.

Would it be possible to write a CUE "function" that encapsulates your [if a {consequent}, {alternate}][0] technique? It would require at least two inputs; the alternate could be optional. It's not much compression, but might cut down on the "syntactic noise" with those brackets. Yes, I confess that I'm still looking for else.

mpvl commented 3 years ago

@seh: you can do else with the switch approach, and I’m not in favor of a dedicated if-else construct, as it encourages bad patterns.

But I see your points otherwise. I guess you could indeed express this neatly as CUE macros if we had the call shorthand. head would then be defined as:

head: { #0[0], #0: [...] }

One problem is that the first element cannot have a conflicting definition of #0.

But maybe this is enough for now to just point out the pattern and suggest that people comment the construct:

aSwitch: [ // select first match
   if a { … },
   if b { … },
   c // default
][0]

anIfElse: [ // if then else
   if a { … },
   c // else
][0]

This would not require any additions to the language, and we can gain some experience to see what works. The query addition may also provide useful patterns that obviate the need for this.

mpvl commented 3 years ago

@seh in CUE, bottom (incomplete errors, to be more specific) is a bit like null in those languages. null can mean various things, often not compatible with the notion of null here. So it seemed impossible to assign any specific meaning to it.

cueckoo commented 3 years ago

This issue has been migrated to https://github.com/cue-lang/cue/issues/943.

For more details about CUE's migration to a new home, please see https://github.com/cue-lang/cue/issues/1078.