dgkf / R

An experimental reimagining of R
https://dgkf.github.io/R
GNU General Public License v3.0

Ensure consistency of operations with respect to `Inf`, `NA`, `NaN` #163

Open sebffischer opened 1 month ago

sebffischer commented 1 month ago

Inspired by a Mastodon post about the behavior of `NA + NaN` and `NaN + NA` in R, which need not give the same result.

dgkf commented 1 month ago

Thankfully, for now this is definitely not an issue as we currently don't support NaN. This was intentional, as I'd like to keep the number of esoteric values rather minimal. In the same spirit as https://github.com/dgkf/R/issues/106, I'd prefer that these eventually become vectors of type unions.

dgkf commented 1 month ago

This does open up the question of whether the language should enforce commutativity of mathematical operators.

Enforcing that they're commutative would mean that something like ggplot2, which (I think) has asymmetric operators, might need to slightly alter its API, but I don't think it would really deter any ggplot2-like tools.

That said, I think the ggplot2 API would make more sense with `|>` anyway, so maybe this intuition is a nudge that enforcing commutative math operators would encourage more intuitive APIs altogether.
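To make the asymmetry concrete, here is a small Rust sketch. The `Plot` and `Layer` types and the `with_layer` method are hypothetical, not ggplot2's or this project's API. `+` is only defined with the plot on the left, and swapping two layers changes the result, so a rule requiring `a + b == b + a` would rule this style out. The method form is the kind of call a `|>` pipeline would chain instead.

```rust
use std::ops::Add;

// Hypothetical plot-builder types, loosely modelled on the ggplot2 style.
#[derive(Debug, Clone, PartialEq)]
struct Plot {
    layers: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
struct Layer(String);

// `Plot + Layer` appends the layer and returns the updated plot.
// There is deliberately no `impl Add<Plot> for Layer`, so `layer + plot`
// does not compile: the operator is asymmetric by construction.
impl Add<Layer> for Plot {
    type Output = Plot;
    fn add(mut self, rhs: Layer) -> Plot {
        self.layers.push(rhs.0);
        self
    }
}

impl Plot {
    fn new() -> Plot {
        Plot { layers: Vec::new() }
    }

    // The same composition as a plain method, i.e. what a pipe would call.
    fn with_layer(self, layer: Layer) -> Plot {
        self + layer
    }
}
```

Both spellings build the same plot; only the operator spelling suggests an algebraic law (commutativity) that the operation does not satisfy.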

sebffischer commented 1 month ago

> Thankfully, for now this is definitely not an issue as we currently don't support NaN. This was intentional, as I'd like to keep the number of esoteric values rather minimal. In the same spirit as #106, I'd prefer that these eventually become vectors of type unions.

Even though we did not explicitly encode NaN in the language, it can still be the result of a mathematical operation. NaN is part of the IEEE 754 floating-point specification and is produced by, e.g., calculating 0 / 0. It only exists for floating-point types; integers have no NaN representation.
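A short Rust illustration of the point above, using only the standard library (the two function names are made up for the example): `0.0 / 0.0` yields NaN and a nonzero value divided by zero yields a signed infinity, while an integer division by zero has no special value to return, so `checked_div` reports it as `None`.

```rust
// IEEE 754 doubles: 0/0 has no meaningful value and yields NaN,
// while a nonzero value divided by zero yields a signed infinity.
fn float_division_cases() -> (f64, f64, f64) {
    let zero = 0.0_f64;
    (zero / zero, 1.0 / zero, -1.0 / zero)
}

// Integer types have no NaN (or Inf) bit pattern, so integer division by
// zero cannot produce one; `checked_div` returns `None` instead.
fn integer_division_by_zero(a: i32) -> Option<i32> {
    a.checked_div(0)
}
```

Note that NaN also appears without any division, e.g. `Inf - Inf`, so it cannot be avoided just by special-casing `/`.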

However, combining NaN with NA seems, at least at first sight, to behave the way we want:

> 0 / 0 + NA
[1] NA
> NA + 0 / 0
[1] NA
> 0 / 0
[1] NaN
> 
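One way to get this order-independent result by construction is to keep NA out of the `f64` bit patterns entirely. The sketch below assumes, purely for illustration, that a double element is an `Option<f64>` with `None` meaning NA (the `Double` alias and `add` function are hypothetical). NA propagation is then an explicit check that runs before any floating-point arithmetic, so `NA + NaN` and `NaN + NA` agree.

```rust
// Hypothetical element representation: `None` is NA, `Some(x)` is any f64,
// including NaN. NA never lives inside the f64 bits here.
type Double = Option<f64>;

// NA-propagating addition. `zip` yields `None` if either side is NA, so the
// result does not depend on operand order; NaN only appears when both
// operands are actual doubles and the float addition itself produces it.
fn add(a: Double, b: Double) -> Double {
    a.zip(b).map(|(x, y)| x + y)
}
```

The price is the extra branch per element, which is the trade-off discussed further down in the thread.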

But I think this needs to be checked properly. We might need to pay extra attention when coercing a `Vector::Double` to a `Vector::Integer`: because integers don't support NaN, `as.integer(NaN)` returns `NA_integer_` in R.
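A per-element sketch of that coercion, again assuming NA is `None` on both sides (`double_to_integer` is a hypothetical name). It is worth being explicit because a plain Rust `as` cast saturates and silently turns NaN into `0`. The range check mirrors GNU R, which truncates toward zero and returns NA for NaN, ±Inf, and values outside the integer range; R also rejects `-2147483648` because it reserves that bit pattern for `NA_integer_`.

```rust
// Hypothetical coercion of one double element to an integer element,
// with NA represented as `None` on both sides.
fn double_to_integer(x: Option<f64>) -> Option<i32> {
    let v = x?; // NA stays NA
    if v.is_nan() {
        return None; // integers have no NaN, so NaN becomes NA
    }
    let t = v.trunc(); // truncate toward zero, e.g. -2.7 becomes -2
    // Out of range (including +/-Inf) becomes NA. i32::MIN is rejected too,
    // mimicking R, where that value is NA_integer_.
    if t <= i32::MIN as f64 || t > i32::MAX as f64 {
        return None;
    }
    Some(t as i32)
}
```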

> This does open up the question of whether the language should enforce commutativity of mathematical operators.

I think I like it! If I remember correctly, someone from ggplot2 (maybe even Hadley) once said that they kind of regretted using + in ggplot2 because the operator is not commutative (but don't quote me on that).

dgkf commented 1 month ago

> 0 / 0
> [1] NaN

👀

How did that get in there?! It's surely just the internal `f64`, but I never realized it had snuck in. I'd probably have to defer to someone who does more statistical algorithm development than I do to learn how useful NaNs are. Personally, I never really need them, but I'm sure they're very handy in modelling tools.

It would be nice to reduce the number of exotic values where possible, but I can also see some value in holding on to NaN for division-by-zero scenarios.

sebffischer commented 1 month ago

> It would be nice to reduce the number of exotic values where possible, but I can also see some value in holding on to NaN for division-by-zero scenarios.

In principle, yes, but even if we wanted to, we could not get rid of NaN without paying a significant overhead. NaN is produced by the CPU's floating-point instructions themselves, so eliminating it would mean checking the result of every operation. If anything, we should represent the floating-point NA as a NaN, as that should be more efficient.
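For reference, GNU R already encodes `NA_real_` this way: it is a NaN whose lower 32 bits hold the value 1954, and R tells it apart from an ordinary NaN by inspecting that payload (which is why `is.nan(NA_real_)` is `FALSE`). Below is a minimal Rust sketch of such an encoding; `na_real` and `is_na` are hypothetical names. Ordinary float addition then propagates NA without an extra branch. The catch is that IEEE 754 does not say which input's payload an operation on two NaNs should keep, which is exactly how `NA + NaN` and `NaN + NA` can disagree in R.

```rust
// NA as a NaN carrying a payload, following GNU R's NA_real_:
// exponent bits all ones, lower 32 bits set to 1954.
const NA_LOW_WORD: u64 = 1954;

fn na_real() -> f64 {
    f64::from_bits(0x7FF0_0000_0000_0000 | NA_LOW_WORD)
}

// NA is a NaN *and* carries the 1954 payload; a plain 0/0 NaN does not.
// Only the low word is compared, so the check still holds if the hardware
// sets the quiet bit on this NaN.
fn is_na(x: f64) -> bool {
    x.is_nan() && (x.to_bits() & 0xFFFF_FFFF) == NA_LOW_WORD
}
```

Whether an arithmetic result still carries the 1954 payload is up to the hardware and compiler, so the sketch only relies on the result being *some* NaN.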