KittyCAD / kcl-experiments

KittyCAD Language

Numbers #14

Open nrc opened 1 month ago

nrc commented 1 month ago

Numbers are currently just numbers as far as the user can tell (sort of, see below), which is nice. Internally, they are either integers or floats, and this leaks out a little bit. I think shielding users from this distinction is the right thing to do, and for most uses (i.e., in geometry) decimals are the right thing to have. There are some rough edges or things we should properly define though.

Here are questions I think we need to solve sooner rather than later (before 1.0 IMO):

And more long-term questions:

nrc commented 1 month ago

Related issues:

paultag commented 1 month ago

should we use a delta in equality operations (and similar), e.g., is 0.00000001 == 0 true

My one bikeshed hill I'll always die on is that comparisons with fixed epsilons really need to be done carefully and with a /lot/ more thought than I think we want to give it. We need to consider the number's size, so that we can understand the precision of the representation -- and even then, if the number was multiplied and divided by large numbers, it may still have lost precision just from that round trip, even if it ends up in some fixed range.

For example, the following numbers have the same bitwise representation (sign, mantissa, and exponent):

Python 3.12.7 (main, Oct  3 2024, 15:15:22) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 9007199254740993.0 == 9007199254740992.0
True
>>> 

(or in rust)

And it shows up even when you need to turn it back into an int:

>>> x = 9007199254740993
>>> int(float(x)) == x
False

I'd much prefer (personally, due to scar tissue and heartache) to rely on some sort of floating point equality function that accepts an epsilon the user has to provide with knowledge of the number's history/possible values. For the example above the correct eps is 1.0, but if your numbers never get outside the range of -1 to 1, then 1 is unacceptably high.
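
A quick Python sketch (illustration only, not KCL or engine code) of why a single fixed epsilon can't work across magnitudes, and how relative vs. absolute tolerances behave, using the stdlib's `math.isclose`:

```python
import math

# Near 2^53, adjacent f64 values are 2.0 apart, so a tiny fixed epsilon
# can never consider two distinct neighbours "equal":
big = 9007199254740992.0  # 2^53
print(abs((big + 2.0) - big) <= 1e-9)  # False: the gap between neighbours is 2.0

# A relative tolerance scales with magnitude, so it still works up here:
print(math.isclose(big + 2.0, big, rel_tol=1e-9))  # True

# But near zero a relative tolerance collapses, and an absolute one is needed:
print(math.isclose(1e-12, 0.0, rel_tol=1e-9))  # False
print(math.isclose(1e-12, 0.0, abs_tol=1e-9))  # True
```

This is exactly the "eps of 1.0 vs. numbers in [-1, 1]" point: the right tolerance depends on where the values live and how they got there.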

jtran commented 1 month ago

I'm generally with you, @nrc 👍

I agree that for KCL's primary uses, decimals are the right choice. Obviously, there will always be exceptions. I think it would be okay if that were more verbose since it's a lot less common.

Integer Coercion

If I don't bring it up, someone will: Would we want to do automatic coercion to int for things that require it, like array indexing? A rough edge now is that it's unclear to users when exactly they need to use int(). I do it just by trying it, seeing an error, and then adding int() calls wherever it complains.

Epsilons

The problem with epsilons is that even as a user who's aware of them, it's not always clear what a good epsilon value should be. If I'm writing a library that accepts input, a good epsilon might need to be computed from the input or even a parameter. To do it right, you can't just have a constant or several constants.

Ideally, the tool/language would guide people to do the right thing. Could int() and comparison accept optional tolerance parameters? I suppose that's one way that helps.

But it seems unsatisfactory because it's error-prone and makes everything more verbose and annoying. I'm not sure what's better though. I imagine that the "right" way is similar to tracking significant figures or propagation of uncertainty. I'm not aware of a programming language that does this, but it sounds like yet another difficult problem that affects everything that we can't tackle in the short-term.
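
One hedged sketch of "computing the epsilon from the input" rather than hard-coding a constant: scale a tolerance by the ULP (unit in the last place) of the values involved. `tol_for` is a hypothetical helper for illustration, not an existing or proposed API:

```python
import math

def tol_for(*values, ulps=4):
    """A tolerance of a few ULPs at the magnitude of the inputs."""
    return ulps * math.ulp(max(abs(v) for v in values))

a = 0.1 + 0.2  # accumulates one ULP of error relative to 0.3
b = 0.3
print(a == b)                       # False with exact comparison
print(abs(a - b) <= tol_for(a, b))  # True: within a few ULPs of 0.3
```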

Long Term

Precision

You might want to check out @lf94's trick question in Slack:

What's the highest number we can use in KCL?

I'm going to spoil this for you.

The highest numeric you can represent and assign to a variable in KCL is the highest finite Rust f64, which is 1.7976931348623157E+308f64. We don't currently support infinity due to https://github.com/KittyCAD/modeling-app/issues/1130. But when you actually go to use that with anything that interacts with the engine, its max is somewhere between f32's and f64's max. It actually depends on which operations you're doing. Sketch has more precision than extrusion.

@lf94's point was that KCL doesn't exist in a vacuum. If the engine has limits, maybe those limits should intentionally be part of KCL.

Then again, it might be nice if you could do perfect arbitrary precision math all in KCL, and it only loses precision when you cross the boundary to the engine and get its output. But this difference will inevitably be exposed to users, which seems like a rough edge that will be a continual cause of questions. In the extreme, I could imagine someone wanting to re-implement complex math in KCL so that it doesn't lose precision.

nrc commented 1 month ago

Lol, I kept using delta instead of epsilon :-)

Anyway, one thing I forgot to mention is that there is an absolute ton of prior art here. It's an area I'm not super familiar with, though. I think there are probably some good solutions we can hopefully pick up without much modification rather than working things out from scratch.

@jon:

re integer coercion: yep, this is a big component of my first point. I want int to disappear. I'm not exactly sure how, but it sucks.

re epsilons: (and @paultag) I'd love to find out more about how this is handled in other systems and PLs. The only options I'm aware of are using a fixed number of significant figures (sub-optimal), user-specified tolerances (far from ideal in terms of UX, but maybe more acceptable in a CAD lang than a general-purpose PL), and using rational arithmetic so tolerances aren't needed (imperfect and hard to implement well). Possibly some combination will work, or there are more sophisticated solutions out there.
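
For the rational-arithmetic option, a minimal sketch using Python's `fractions` module shows both the appeal (exact field operations, so no comparison epsilon) and the limitation mentioned above (irrational values like pi, sin, cos have no exact rational form):

```python
from fractions import Fraction

# Exact: no epsilon needed for +, -, *, / on rationals
third = Fraction(1, 3)
print(third * 3 == 1)  # True, unlike binary floats

# The float failure case works exactly with rationals:
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))  # True

# But pi, sqrt(2), sin(x), etc. would have to be approximated as rationals,
# at which point the exactness guarantee is gone.
```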

re precision: I think this is not all long-term; we should actually think about the interaction with the engine up front (even if we don't implement anything, I think we must plan this a bit). My thoughts are that there is arithmetic which is purely in KCL, and we need good answers there independent of the engine limitations, but we should make sure that it aligns with what happens in the engine. I'm still getting my head around exactly what we should be doing locally vs in the engine...

yeroca commented 3 weeks ago

Reading the above comments, it's not clear to me when you are referring to "decimal" vs. "float". Just in case those are referring to the same thing, I would like to point out that some languages support "decimal float", which is represented by a base-10 mantissa instead of base-2. GNU C, for example, has some support for these types: _Decimal32, _Decimal64, and _Decimal128.

These decimal types seem well suited for CAD use, but I don't know if Rust supports them. Some CPUs and GPUs do have support for these types.

The downside is that ISO/IEC decimal types seem to have been in the draft stage for at least 10 years.

For reference: https://gcc.gnu.org/onlinedocs/gcc/Decimal-Float.html

nrc commented 3 weeks ago

I tried to be precise: I used decimal to mean a number including a part which is smaller than one (in contrast to an integer, but also in contrast to a rational, where the 'smaller than one' part is implicit in the fraction representation), and float to mean the implementation mechanism of floating point numbers.

I don't think it's necessary for Rust to support any approach we choose, we can always implement it in user space (though if we're using WASM, that limits what we can do efficiently).

I believe that some 'big decimal'/arbitrary-precision arithmetic libs use decimal floating point, and it's definitely an approach we should consider.
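
For illustration, Python's `decimal` module is one such arbitrary-precision decimal floating point implementation; a small sketch of the base-10 vs base-2 trade-off:

```python
from decimal import Decimal

# 0.1 has no exact binary representation, so binary floats accumulate error:
print(0.1 + 0.1 + 0.1 == 0.3)  # False

# In decimal floating point, 0.1 is exact:
print(Decimal("0.1") * 3 == Decimal("0.3"))  # True

# But non-terminating decimals are still rounded, so it's not a cure-all:
print(Decimal(1) / Decimal(3) * 3 == Decimal(1))  # False
```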

yeroca commented 3 weeks ago

I was assuming there would be performance issues if you rely on a userspace library for the arbitrary precision arithmetic, which might not be usable in a GPU, but I don't know much about the processing needed in the engine. Thanks for considering my comment.

nrc commented 3 weeks ago

It's complicated because of the server/client split and because the KCL interpreter is in WASM, but in general for our work I think a userspace library should be fine: they can be well-optimised, and this is unlikely to be the performance bottleneck. The hard bit is doing this in both KCL and the engine and making sure we get the same behaviour.

nrc commented 3 weeks ago

TL;DR: keep f64 for numbers (no internal or user-facing int type) for now; implicit rounding to ints as required with a 0.1 epsilon (not for comparisons); remove int() from std.

I don't have an opinion about the long term yet. I see a few reasonable options for our number representation: floats, fixed-point decimals, or arbitrary-precision floating- or fixed-point numbers (possibly even rationals, though handling pi, sin, cos, etc. doesn't work in that case). I don't know about comparisons. We should look at what other CAD systems and other PLs do.

I'm unsure about a user-facing int type (whether we should have one at all, and if we do, whether it should be integrated with UoMs, because it only really makes sense for 0d units to be integers). But if we do have one, we should treat it as a subtype of (or more generally, implicitly convertible to) our number type, and we should allow implicit conversion from number to integer.

Proposal for launch

For now, my proposal is that:

  1. remove the Integer type of literal
  2. remove any int types from std functions
  3. remove the int function from std (note that we keep floor, ceil, etc.). For back compat, we could keep int() for a while, warn that it's deprecated and unnecessary, and rewrite it away while reformatting.
  4. wherever we require an integer (e.g., for array accesses or in built-in functions), the runtime (i.e., implemented in Rust, not KCL) should convert a number (internally an f64) into an integer (internally a usize or other integer type). We should check that the number is close to a round number (I propose < 0.1 from the nearest integer, rounded up or down as appropriate) and give a runtime error suggesting floor or ceil if it's not. E.g., using 5.099 is ok; 4.9 or 5.1 is an error. It's also an error if the number is too large or small to represent as an integer, or is NaN, infinity, etc.
  5. We only support floating point arithmetic and comparison with no epsilon (i.e., no change)
  6. No UoM for now, but we think more about this before launch (i.e., no change)
  7. No view on formatting for now, leave it to the UI folk and/or do something reasonable as we go along (i.e., no change)
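
Point 4's coercion could be sketched as follows (Python for illustration only; the real runtime would be Rust, and the names here are hypothetical):

```python
import math

EPSILON = 0.1  # proposed rounding epsilon, per point 4 above

def coerce_to_int(x: float) -> int:
    """Convert a number to an integer, erroring unless it is
    strictly within EPSILON of a whole number."""
    # NaN and infinities can never stand in for an integer
    if math.isnan(x) or math.isinf(x):
        raise ValueError("NaN/infinity cannot be used as an integer")
    nearest = round(x)
    if abs(x - nearest) >= EPSILON:
        raise ValueError(f"{x} is not a whole number; use floor() or ceil()")
    return int(nearest)

print(coerce_to_int(5.099))  # 5: strictly within 0.1 of an integer
print(coerce_to_int(3.0))    # 3
# Note: a naive f64 check like this is leaky at the boundary, since e.g.
# abs(5.1 - 5) evaluates to slightly less than 0.1 in binary floating point.
```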

Rationale

My guiding principle is that users should not have to worry about the implementation of numbers. The current situation, which distinguishes between 5 and 5.0 or 1 * 5, is kinda ridiculous, and int(n) is noisy and confusing. We must fix this before the release (I'm open to better ideas for how to fix it, but I'm 99.99% sure it needs fixing).

Epsilon size for rounding

I want to make it as easy as possible to use numbers, including the output of calculations, for array access, etc. However, I think it is worth requiring the user to be explicit about rounding where rounding is necessary. It seems to me that how numbers are rounded is an important consideration when doing CAD, and that if the maths does not result in a round number, that often indicates a bug in the program (e.g., if the user is computing how many screw holes to fit in a piece, they might want to always round up (for safety) or always round down (for aesthetics), or might expect the maths to always result in a round number).
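
The screw-hole example, sketched with hypothetical numbers to show why the rounding direction is the user's call rather than something to choose implicitly:

```python
import math

piece_length = 100.0  # illustrative values, not from the proposal
hole_spacing = 30.0
raw_count = piece_length / hole_spacing  # 3.333... -- not a whole number

print(math.floor(raw_count))  # 3 holes if we'd rather have too few
print(math.ceil(raw_count))   # 4 holes if we'd rather have too many
# Implicitly rounding either way would silently hide what may be a bug.
```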

I think that choosing a fixed epsilon for rounding is much easier than picking one for comparisons (and there is no need for these epsilon values to be related in any way). I think 0.1 is a good compromise - it is large enough that it will address nearly any floating point errors (at least for reasonable-sized numbers likely to be used in KCL) and small enough that it will catch many bugs both in literals (why I chose < 0.1 rather than <= 0.1) and computation results.

Optimisation of integer literals

Where we write something like foo[3], we are doing something a bit clown-shoes with this proposal: we're reading 3, storing it as an f64, and then later converting it to a usize with accompanying checks. We should absolutely optimise this in the future, but we should do it such that it is never observable to the user. I think it is better to start from the correct but non-optimised version rather than try to preserve any part of the current broken system. (If it looks very easy to implement all this while still preserving knowledge of integer literals, then that's fine too, but I don't want to prioritise that over shipping something that feels good to the user.)

Future compat

I'm pretty sure we don't want to force users to explicitly convert from numbers to ints. The possible future compat issues I can think of are:

  1. we want to change the value of the rounding epsilon
  2. we use some kind of non-floating/binary representation of numbers and don't want to use an epsilon for rounding (effectively the same as the above issue where the new epsilon is zero)
  3. we add an explicit int type
  4. we want to change the rounding functions
  5. we change the representation of numbers from floating point to something else
  6. we introduce a comparison epsilon
  7. we want the user to be able to supply a tolerance for rounding rather than using a fixed one
  8. we want to restrict the size of numbers to correspond with engine limits

I think that 1, 2, 5, and 6 will technically be breaking changes but they'll be small, technical ones which will have very little impact (unless we pick a very bad new value for epsilon).

I think 3 will not have much effect if we allow implicit conversions from number to int and int to number (which I think we should, with some dynamic checks).

I don't think 4 will cause any issues.

I think if we support 7, then it will be in addition to implicit rounding with a fixed epsilon (which might be a different epsilon, but as for 1, I don't think it will cause real issues).

8 could cause back-compat issues, but it should only make runtime bugs into checked errors, which I think is good.

adamchalmers commented 2 weeks ago

Seems good to me! I think implicit conversions make sense for KCL given that it's almost always dealing with fractional numbers, and that whole numbers are really just used as counts for functions (e.g. replicate this 4 times) or indexes into arrays.

jtran commented 2 weeks ago

It sounds good to me overall.

I had a few thoughts, but I'm willing to follow your lead, as I don't have any concrete suggestions.

  1. For units and dimensions, even though something has zero dimensions, it can still have a kind that makes it not comparable with another kind, for example. Even if we implemented native units, we might decide that we don't support this. But it's something to be aware of.
  2. Epsilon for integer rounding being less than 0.1 seems kind of arbitrary to me. But I don't have a better suggestion.
    1. In general, anything that's implicit is something that could bite us later when we want to change things because we don't know whether the user wanted the implicit behavior or not. And this seems like a correctness thing, so I'm not sure that the general approach of relaxing this later would be safe. It seems that users would start relying on the 0.1 behavior as essentially an assertion.
    2. I also feel like this should be as simple (and performant) as possible to implement because if anyone does array indexing in a loop, it could become a performance bottleneck. And because it's part of the built-in semantics, it will be used in a lot of places and we can't omit it without complex optimizations.
    3. If this is the pseudocode of the implementation, floating point is very often not intuitive. See this Rust playground. In this scheme of strictly less than 0.1, the 0.1 literal would not round to 0, but 4.1 would round to 4. I'm just saying that without doing something fancy, the implementation will leak through to users.
  3. Optimizing integer literals: I agree that we shouldn't worry about it. I'd prefer to save this for when we have a real optimization step.
  4. Future Compat: If we wanted to be more conservative, we'd require being explicit about the rounding approach. But yeah, as a user, this is annoying. So it's a judgment call on which thing we optimize for. I think the proposal makes sense.
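
The strictly-less-than-0.1 boundary behaviour can be reproduced directly in Python (the linked Rust playground makes the same point):

```python
def distance_to_nearest_int(x: float) -> float:
    return abs(x - round(x))

# The literal 0.1 is stored as slightly MORE than one tenth, so its distance
# to 0 is not strictly less than 0.1, and it would NOT round to 0:
print(distance_to_nearest_int(0.1) < 0.1)  # False

# The literal 4.1 is stored as slightly LESS than 4.1, so its distance to 4
# IS strictly less than 0.1, and it WOULD round to 4:
print(distance_to_nearest_int(4.1) < 0.1)  # True
```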

Questions

  1. What about NaN and Infinity? The most straightforward way of implementing the above and resolving https://github.com/KittyCAD/modeling-app/issues/1130 would be to represent numbers with f64. This means that floating point NaN and Infinity can arise. Are we okay with this? If so, should we add parsing them as literals? An isNan() function?

nrc commented 2 weeks ago

In general, anything that's implicit is something that could bite us later when we want to change things because we don't know whether the user wanted the implicit behavior or not. And this seems like a correctness thing, so I'm not sure that the general approach of relaxing this later would be safe. It seems that users would start relying on the 0.1 behavior as essentially an assertion.

Yeah, I think there is definitely a trade-off and there is a downside, but I think it is worth it. I'd be very open to using a different epsilon; I just pulled 0.1 out of thin air. We should do a little experiment to see whether 0.01 or 0.001 (or other numbers) would work and give better results.

I also feel like this should be as simple (and performant) as possible

Agree, I'm not worried about the current proposal - a few fp and int ops and an easily predictable branch should have very little impact on performance. I also hope that anywhere we do a lot of array accesses or similar, we can easily hoist the conversion outside of the loop in the Rust code.

If this is the pseudocode of the implementation, floating point is very often not intuitive

We'd want to use 0.99 rather than 1 or something (lol, even the epsilon needs an epsilon). I don't think we need anything fancier than that. But generally, the whole point of this is to shield users from the horrors of floating point.

What about NaN and Infinity?

I think we should try to error out as early as possible; I'd really like to avoid surfacing this to the user if we can (including isNan). I think this is going to be a rough edge, but hopefully a small one, and we've got a rough plan to fix it; it's just not the highest priority.