
Proposal: unify `float` and `int` as simply `number` #253


cueckoo commented 3 years ago

Originally opened by @rudolph9 in https://github.com/cuelang/cue/issues/253

Background

JSON has no concept of float or int, only number. The current behavior of CUE treats float and int as completely disjoint branches of the value lattice. Further, any concrete number with a dot (.) is implicitly a float (e.g. 0.0), and any number without a dot is implicitly an int (e.g. 0).

This causes an error in the following example:

foo: 0
foo: float
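Evaluating this today fails with an error along these lines (a sketch; the exact wording varies by CUE version):

foo: conflicting values float and 0 (mismatched types float and int)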

A couple of tickets related to this behavior: https://github.com/cuelang/cue/issues/252 and https://github.com/cuelang/cue/issues/233

Objective

Unify float and int as simply number, which would make, for example, {foo: 0.0} & {foo: 0} valid CUE.

Transition

Phase 1:

Phase 2:

Phase 3:

cueckoo commented 3 years ago

Original reply by @rudolph9 in https://github.com/cuelang/cue/issues/253#issuecomment-571805831

Related Slack thread.

cueckoo commented 3 years ago

Original reply by @mpvl in https://github.com/cuelang/cue/issues/253#issuecomment-571981648

This does simplify things considerably, which is good.

For good measure, let me bring up some clarifications and concerns:

Change the behavior of int to declare a number with a constraint that does not allow decimal values.

Binary floating points (what CUE uses) allow integer values with decimals (e.g. 0.00). I presume you would allow such numbers to be assignable to int.

This also means that, say, 1 / Infinity or any other computation that results in an integer after some precision cutoff will be treated as an integer. In Go this works fine.

One concern is that this may make configurations somewhat unstable: for instance, a configuration may have a field a: int & (x / y) which is valid for some values of x and y and not for others. Keeping the types distinct forces the user to specify upfront how to handle the invalid cases. Note that currently int & (x / y) is not a valid expression in CUE.
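As a minimal sketch of that instability under the proposed semantics (the field names are illustrative, and, as noted, int & (x / y) is not valid CUE today):

x: 10
y: 5
a: int & (x / y) // 2: the division happens to be exact, so a validates
// with y: 4 instead, x / y is 2.5 and a would fail the int constraint,
// even though the expression itself is unchanged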

Another concern is that non-integer values may unify with int due to rounding errors. Experience with arbitrary-precision constants in Go teaches us this is not much of a concern in practice.

An alternative is to remove the ability to require a field to be of type float. The predeclared identifier float could then be equated to number. In other words, 0 unifies with either number or int, but 0.0 can only unify with number. This solves some of your concerns.
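Under that alternative, the outcomes would look roughly like this (a hedged sketch, not output of any implementation):

a: 0 & number   // 0
b: 0 & int      // 0
c: 0.0 & number // 0.0
d: 0.0 & int    // _|_ (0.0 can only unify with number)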

But overall, this is definitely worth considering. It simplifies things considerably in various places. NCL, a Google-internal configuration language, does something similar and this seems to work fairly well. The main drawback there is the inability to constrain a number to int, which this proposal solves by keeping int as a constraint on number.

cueckoo commented 3 years ago

Original reply by @mpvl in https://github.com/cuelang/cue/issues/253#issuecomment-571996095

I can see the following options:

1) Keep as is (int and float are both instances of number).
2) Keep as is, but remove float as a predeclared identifier, i.e. 0.0 and 0 would not unify. In other words, internally float and int exist, but fields can only be of type number or int.
3) Only have number and int as types, where int is an instance of number. 0.0 would be of type number while 0 would be of type int. 0.0 would still not be assignable to int, but 0 would be assignable to number, and 0.0 and 0 would unify to 0.0.
4) The proposal of this issue: reduce to one number type, and let int be a constraint that admits only integral values. 0.0 and 0 would be identical.
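To make the differences concrete, here is how 0.0 & 0 would behave under each option (a sketch derived from the descriptions above):

// option 1 (today): 0.0 & 0 is an error (mismatched types float and int)
// option 2: 0.0 & 0 is still an error; float simply can no longer be named
// option 3: 0.0 & 0 unifies to 0.0
// option 4: 0.0 & 0 unifies; the two literals are identical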

Option 3 would have most of the protections of the current CUE implementation while being simpler; Option 4 would be the simplest and offers slightly better JSON compatibility (allowing 0.0 to be assigned to integers). The latter could instead be addressed with JSON interpretation, but that may be confusing as CUE is a superset of JSON.

Allowing literals to be of multiple types (e.g. 0 is of type int|float) is not an option. It was considered in the past but was rejected due to implications in the lattice model.

cueckoo commented 3 years ago

Original reply by @rudolph9 in https://github.com/cuelang/cue/issues/253#issuecomment-572782118

One consideration (particularly for options 3 and 4): when importing CUE specs into other languages, a field attribute could be used to indicate a float type.

foo: 1.0 @golang('type', float) // I think this is the right syntax?

cueckoo commented 3 years ago

Original reply by @rudolph9 in https://github.com/cuelang/cue/issues/253#issuecomment-572787261

  3. only have number and int as types, where int is an instance of number. 0.0 would be of type number while 0 would be of type int. 0.0 would still not be assignable to int, but 0 would be assignable to number, and 0.0 and 0 would unify to 0.0.

@mpvl is there an example analogous to "0.0 would still not be assignable to int, but 0 would be assignable to number and 0.0 and 0 would unify to 0.0"?

If I'm reading this correctly (please correct me if I'm mistaken) the following would be true:

a: int          // => int
b: number       // => number
c: number & int // => number
d: 0 & int      // => 0
e: 0 & number   // => 0.0
f: 0 & 0.0      // => 0.0
g: int & 0.0    // => _|_
h: number & 0.0 // => 0.0

If my understanding is indeed correct, the following three seem out of place, since int is an instance of number:

c: number & int // => number
f: 0 & 0.0      // => 0.0
g: int & 0.0    // => _|_

int by definition is more specific, so I would expect f: 0 & 0.0 => 0, g: int & 0.0 => 0, and c: number & int => int.

cueckoo commented 3 years ago

Original reply by @mpvl in https://github.com/cuelang/cue/issues/253#issuecomment-677736494

@rudolph9 regarding c: number & int // => number: this would currently result in int.

There are various strong reasons to adopt this proposal in some form:

There are some design issues still to be solved; for example, having both number and int seems inconsistent. Aside from naming, though, it seems good to adopt this.

cueckoo commented 3 years ago

Original reply by @mpvl in https://github.com/cuelang/cue/issues/253#issuecomment-739462008

Another data point: the Go encoding/json unmarshaler doesn't allow decoding 1.0 into an int.

Atm, I'm leaning towards option 3.

Open to evidence that it should be 4, though, especially if this is required for JSON Schema compatibility, for instance.

cueckoo commented 3 years ago

Original reply by @mpvl in https://github.com/cuelang/cue/issues/253#issuecomment-739462513

There seems to be no consistency in the JSON Schema world on this matter:

The precise treatment of the “integer” type may depend on the implementation of your JSON Schema validator. JavaScript (and thus also JSON) does not have distinct types for integers and floating-point values. Therefore, JSON Schema can not use type alone to distinguish between integers and non-integers. The JSON Schema specification recommends, but does not require, that validators use the mathematical value to determine whether a number is an integer, and not the type alone. Therefore, there is some disagreement between validators on this point. For example, a JavaScript-based validator may accept 1.0 as an integer, whereas the Python-based jsonschema does not.

http://json-schema.org/understanding-json-schema/reference/numeric.html

But based on this, I see further evidence to choose option 3.

cueckoo commented 3 years ago

Original reply by @rudolph9 in https://github.com/cuelang/cue/issues/253#issuecomment-740077155

0.0 would still not be assignable to int, but 0 would be assignable to number and 0.0 and 0 would unify to 0.0.

0.0 ultimately constrains the same number as 0, so it seems odd that 0.0 & int // => _|_. It almost feels like we'd be better off getting rid of both int and float, since they both carry common system notions of types and of how values are stored on the underlying system (which JSON has no notion of). Perhaps we could decommission both float and int and introduce something more mathematically pure to truly represent a constraint on number for valid integers.

cueckoo commented 3 years ago

Original reply by @mpvl in https://github.com/cuelang/cue/issues/253#issuecomment-753592585

0.0 ultimately constrains the same number as 0

That is not necessarily true.

A big benefit of keeping number separate from int is that one can represent "imprecise" calculations syntactically. For instance, a division may result in an inexact number that is 0 after rounding. Such a number should not be considered an integer, as that can result in unpredictable and inconsistent outcomes. One could argue that this problem goes away when there is only one numeric type to start with, but the reality is that CUE is used in a world where it needs to round-trip with systems that do make these distinctions, and with a single numeric type CUE would lose the ability to use typing to ensure that a result is indeed an integer.

In general, it is a good idea that if CUE interprets a value in a certain way, that interpretation can also be represented in CUE itself. Right now, that can be done as 0., 0.0, etc., where the number of decimals indicates the precision.
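For illustration, a hedged sketch of how that precision is spelled today:

a: 0.0 // numerically zero, one fractional digit of precision
b: 0.  // also zero, written with a trailing dot; still a float
c: 0   // the integer zero; today a & c is an error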

If we collapse the two numeric types, there is no way to represent this in CUE itself.

so it seems odd that 0.0 & int // => _|_

That really depends on where you are coming from, and I don't agree it is generally true. The decision on this should be guided by usability, practice, and how things fit in the value lattice. I think there are good arguments for allowing 0 to be interpreted as a float, and even for getting rid of the float type. But I haven't really seen good arguments for adopting only a single numeric type, while I do see good arguments for keeping number and int.

cueckoo commented 3 years ago

Original reply by @extemporalgenome in https://github.com/cuelang/cue/issues/253#issuecomment-767944418

Allowing literals to be of multiple types (e.g. 0 is of type int|float) is not an option. It was considered in the past but was rejected due to implications in the lattice model.

But wouldn't options 3 and 4 change this arrangement, and thus reopen the discussion?

0 could be of type number|int (which, if I understand correctly, unifies to just type number). In option 3, since 0 is an instance of int, which is in turn an instance of number, there should no longer be any conflict? 0 & int & number works today. Thus 0 and 0.0 would both be valid literals for any field constrained to number, but as soon as that same field is also constrained to int, 0.0 would produce an error.
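A small illustration (valid CUE today; option 3 as described above would keep these outcomes):

a: 0 & int & number // 0
b: number & 0.0     // 0.0: fine while the field is only constrained to number
c: int & 0.0        // _|_: the int constraint rejects 0.0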

Would it be a problem to defer inferring the type of an integer literal until it is determined that no other constraints prohibit unification with int (or, alternatively, that some constraint explicitly constrains it to an int)?

  4. the proposal of this issue: reduce to one number type, and let int be a constraint to specify only integral values. 0.0 and 0 would be identical.

This seems much cleaner and more flexible to me, particularly if arbitrary-precision constraints can be encoded somehow, perhaps with a new operator (or clever mathematical use of existing syntax), for example: "#Currency: " or "#Even: <this number has -1 fractional digits of precision>".
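As a rough approximation in today's CUE, the builtin validator math.MultipleOf can express some of this (a sketch: #Currency and #Even are illustrative names, MultipleOf expresses divisibility rather than digits of precision, and whether it accepts a fractional argument may depend on the CUE version):

import "math"

#Currency: number & math.MultipleOf(0.01) // multiples of 0.01: at most two fractional digits
#Even:     int & math.MultipleOf(2)       // divisible by 2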

This would also work well with the internal number representation switching from arbitrary-precision base-2 floats to arbitrary-precision rationals, since JSON does not mandate any particular interpretation of range or precision (and before 2014, none of the JSON RFCs I've seen even provide guidance on interoperability). I've seen applications that treat some JSON numbers as having arbitrary decimal precision (sometimes encoded in strings, sometimes not), where by contract or agreement client and server agree on the arbitrary-precision interpretation. It would be quite powerful if CUE could drop the binary number assumption in the core type, leaving the role of precision and base to constraints alone. I am not a mathematician, so I do not know the practical implications of this: would we limit arithmetic to having no more precision, even temporarily, than the constraint allows, or would the result merely need to be no more precise (no rounding) than the constraint dictates?

I believe some hard, but solvable, problems with making int a precision constraint are:

  1. Error messages could be bad unless CUE implementations can recognize and special-case "zero digits of precision" during error reporting (whether arrived at via the builtin int constraint or via a free-form precision constraint): we should continue to say "3.1 is not an int" rather than "3.1 has more than 0 fractional digits of precision" (the precision-style error could be used for non-integer precision constraints).
  2. encoding/* and third-party tooling should also be able to easily recognize an integer constraint, so that exporting to formats like JSON Schema can continue to express the type as "integer".

If the above can be solved and are worth solving, I believe option 4 would lead to a better language than option 3, though option 3 also sounds like it would result in a better language than we have today.

myitcv commented 2 years ago
  • declare float deprecated

Raising a question in the context of #1883. In that issue, I have proposed a mode in which any dropping down to float precision would result in an error (unless explicitly cast to float in the arguments to that function/expression). There is a lot of detail still missing from that sketch, but I'm flagging the point in the context of this proposal.

jbcpollak commented 7 months ago

Just found this because it was extremely confusing that an integer was not validating against the following CUE:

given:

scaling_factor: float | *0.5

this was not valid:

{
   "scaling_factor": 3
}

which seems wrong to me.

I appreciate the complexity of this issue and of typing, but it's a bit frustrating to have to specify scaling_factor: int | float | *0.5, since all ints are implicitly floats.
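A slightly shorter workaround along the same lines, since both 3 and 0.5 satisfy number (assuming the 0.5 default should stay):

scaling_factor: number | *0.5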

beyondan commented 1 month ago

Curious if any work is being done regarding this? We are currently having to educate everyone to use number instead of float everywhere, but it's obviously not a clean solution.

Or is there a way to parse all float types as number at compile time using the Go SDK? Something like...

import (
    "cuelang.org/go/cue"
    "cuelang.org/go/cue/cuecontext"
    "cuelang.org/go/cue/load"
)

ctx := cuecontext.New()
instances := load.Instances([]string{"foo.cue"}, nil)
v := ctx.BuildInstance(instances[0])

// Everything below is from imagination, looking for something like this:
v.Walk(func(x cue.Value) {
    if x.Kind() == cue.FloatKind {
        x.Kind = cue.NumberKind
    }
})

I've tried digging through the source code for a bit, and it seems like

a: number

compiles to an Ident node with Name = "number", and I assume this node can be unified with a BasicLit node with either Kind = FloatKind or Kind = IntKind (total guess). In any case, I'm having trouble coming up with a simple workaround for this issue and would greatly appreciate any help.