chromium / subspace

A concept-centered standard library for C++20, enabling safer and more reliable products and a more modern feel for C++ code.; Also home of Subdoc the code-documentation generator.
https://suslib.cc
Apache License 2.0
88 stars 15 forks source link

Conversions between floats and integers #221

Closed danakj closed 9 months ago

danakj commented 1 year ago

https://doc.rust-lang.org/reference/expressions/operator-expr.html#type-cast-expressions

The as keyword in Rust will convert between types without panicing, and can introduce truncation.

We could provide 3_i32.as<u32>() or sus::as<u32>(3_i32) (the latter feels weird).

This is a lot like static_cast in C++ which is also a notorious cause of bugs... do we want to provide this in place of i32::from_unchecked(unsafe_fn, x) which already does a static_cast?

danakj commented 1 year ago

We already have implicit constructors for these types then the incoming type will not lose any data, so the as<T>() cast would be strictly for cases where data is lost. We also have T::from_unchecked(unsafe_fn, x) to do this without checking that truncation is happening.

So I think maybe the choice to be made here is if that should be considered an unsafe operation, or it's fine to lose data as long as it's explicit. Since no UB occurs, this probably means it's "dangerous" from a "your software is likely to have bugs" POV but not "unsafe" from a "the compiler will ruin your life and you will struggle to debug this in production" POV.

danakj commented 1 year ago

In this case, how to go from i64 x to i32 y:

i32 y = i32::from_unchecked(x);  // Unchecked usually comes with `unsafe_fn`...
i32 y = x.as<i32>();
i32 y = i32::from_lossy(x);
i32 y = x.cast<i32>();

noting that we already have Choice::as<Tag>() to get the active Tag member out of Choice.

danakj commented 1 year ago

An important sharp edge that's left unanswered right now is conversion between floats and integers.

We have no i32::from(f32) or similar, no i32::from_unchecked(f32), no f32::from(i32), etc.

We should look at what Rust defines for these conversions, because at best it's messy and there's tradeoffs to be made.

danakj commented 1 year ago

f32 implements From<u8> and From<u16> and From<i8> and From<i16> as these conversions are lossless. https://doc.rust-lang.org/std/primitive.f32.html#impl-From%3Ci16%3E-for-f32 https://doc.rust-lang.org/std/primitive.f32.html#impl-From%3Cu16%3E-for-f32 https://doc.rust-lang.org/std/primitive.f32.html#impl-From%3Ci8%3E-for-f32 https://doc.rust-lang.org/std/primitive.f32.html#impl-From%3Cu8%3E-for-f32

And it supports as which allows lossy truncation-casting between primitive types. https://doc.rust-lang.org/stable/reference/expressions/operator-expr.html#type-cast-expressions

Casting from integers to floats:

Casting from an integer to float will produce the closest possible float *

  • if necessary, rounding is according to roundTiesToEven mode ***

  • on overflow, infinity (of the same sign as the input) is produced

  • if integer-to-float casts with this rounding mode and overflow behavior are not supported natively by the hardware, these casts will likely be slower than expected.

*** as defined in IEEE 754-2008 §4.3.1: pick the nearest floating point number, preferring the one with an even least significant digit if exactly halfway between two floating point numbers.

Converting floats to integers:

Casting from a float to an integer will round the float towards zero

  • NaN will return 0
  • Values larger than the maximum integer value, including INFINITY, will saturate to the maximum value of the integer type.
  • Values smaller than the minimum integer value, including NEG_INFINITY, will saturate to the minimum value of the integer type.

C++ does not specify what happens when you convert from a primitive float to integer that can't be represented.

danakj commented 1 year ago

Currently the integer types are From and they panic if the value is out of range. However there's three problems with this.

  1. There's no good way to convert signs or truncate. i32::from(u64) will panic if it's out of range, uh, but how do we convert and truncate and just keep the bits? We can't bit_cast because they are different sizes.
  2. Panicking a lot is bad, but we now don't have to write the type we're converting into at call sites due to using From, you can just sus::into(my_i64) and it will try turn your i64 into an u8 and panic in the process. A clear truncation is better, with the destination type specified. This is what my_i64 as u8 achieves in Rust.
  3. The harsh mismatch with similar apis in Rust means people familiar in one expect the wrong thing in the other.

In Rust integers, From is only implemented when it can be done losslessly and without panic. We have the ctors for that already because it is important for conversion from primitives and literals, since they won't be type deduced to our numeric types.

danakj commented 1 year ago

Regarding integers, we should be informed by the PUN effort in chromium. When types are currently specified incorrectly, due to implicit conversions of primitive types, the fixes involved explicit casting (bit casting/truncation) but it was easy to do it wrong and produce bugs, while also being hard to convince people to just use checked_cast when u want to preserve the value and not the bit pattern.

Another thought is how might rust have done this if they did not add the as keyword. Through a trait but they dont use From/Into for truncating conversions. Maybe a BitFrom trait (avoiding the type deduction of BitInto?) would be appropriate and possibly useful in other contexts too which makes it sound nice.

u8::bit_from(-40_i64) // preserves bits u8::try_from(-40_i64) // preserves value

danakj commented 1 year ago

Or maybe AsBits concept like AsRef which has as_bits() thus explicit conversion to the target type.

FromBits could be the receiving side of AsBits then still with from_bits(x)

danakj commented 1 year ago

FromBits is requires to support casting from primitives, as they can't be extended with AsBits.

Then, if we have FromBits, we could provide sus::as_bits(x) that calls T::from_bits(x). But is there a point to that?

Ah, but if we want to cast back to primitives too, then we need sus::as_bits<unsigned long>(an_i32) or some such with an extension point to provide as_bits for things that are not themselves FromBits. This is a similar point to what I've been considering a while with Into, if into<Y>(x) should support things beyond From<decltype(x), Y>.

Actually, if we take it all outside of the class then one concept is sufficient for both directions. It could plausibly be named FromBits or AsBits, though I think the latter reads a bit more correctly with the template parameter. Either way the type being constructed is the type parameter.

// With from_bits.
unsigned int a = 2;
auto b = sus::from_bits<i16>(a);

// With as_bits.
unsigned int a = 2;
auto b = sus::as_bits<i16>(a);

The concept would be satisfied by a template specialization, like with sus::ops::Try and TryImpl, but with an as_bits() function more like into() but (intentionally) without type deduction:

template <class To, class From>
struct AsBitsImpl;

template <class To, class From>
concept AsBits = requires (const From& from) {
  { AsBitsImpl<To>::from_bits(from) } -> std::same_as<To>;
};

template <class To, class From>
To as_bits(const From& from) { return AsBitsImpl<To>::from_bits(from); }

Which allows restricting on types that be bit-converted into X:

i32 add_10(AsBits<i32> auto x) {
  return sus::as_bits<i32>(x) + 10;
}
danakj commented 1 year ago

Staring at naming some more and wanting to follow good guidelines (https://rust-lang.github.io/api-guidelines/naming.html) for consistency, AsBits should be ToBits, and it should be sus::to_bits.

danakj commented 1 year ago

Instead of a strict bit preservation between int and float, it would be better to follow the rules of https://github.com/chromium/subspace/issues/221#issuecomment-1529201960 assuming they are ~compatible with static_cast but with less UB.

Have defined float->integer casting to match them, with well defined behaviour for NAN and OOB values. For in-bound values it has the same defined behaviour as static_cast did.

integer->float looks.. harder =) for another day.

danakj commented 1 year ago

Also.. if it's not strict bit preservation then I want a better name than ToBits probably. But to_bits is nice and short, something that represents the lossyness of it all.

Current best idea: ToNear<T, F> and sus::to_near<T>(x);.

danakj commented 1 year ago

It's super non-trivial to match rust's behaviour in conversions from int to float. C++ leaves it as implementation defined behaviour for which direction to round toward a representable floating point value, while Rust follows IEEE specs better and has a defined strategy of roundTiesToEven.

But also it's probably problematic to do something other than what static_cast does anyway because a) it would be slower and b) it would make converting code from static_cast to something safer (no UB) suddenly get risky.

So to_bits() is now roughly defined as doing a static_cast but with defined behaviour for edge cases instead of UB.

danakj commented 1 year ago

This is all done now in https://github.com/chromium/subspace/pull/289 except we need a better name than ToBits and sus::to_bits<T>(x) since it's a general cast/conversion not a bitwise-cast.

Current candidates: Concept Function Notes
ToNear to_near<T>(x) More accurate for float than int conversions
ToLossy to_lossy<T>(x) Strong code review smell. Too strong? as isn't so uncommon in Rust
ToIsh to_ish<T>(x) Bad use of capital i which looks like lowercase L? But good memeing
CoerceTo coerce_to<T>(x) does it imply lossiness?
Coerce coerce<T>(x) shorter than coerce_to<T>(x) but less clear?
ToApprox to_approx<T>(x) Wordier version of "near"?
Transmogrify mog<T>(x) mog is not so discoverable, but good Calvin & Hobbes vibes
danakj commented 12 months ago

Not listed above is sus::cast<T>() which I think is what we should name what is currently sus::mog<T>(). It's going to ease the learning curve a lot to use the same name.

danakj commented 9 months ago

Other conversions that we're missing: