different behavior of `to_f32` and `to_f64`

Shoeboxam commented 2 weeks ago

I just wanted to check if it is intentional to have to_f32 respect the rounding mode of self, but to_f64 always round to HalfEven. I think the API would be less surprising if these behaved the same way, and more useful if to_f64 were adjusted to use the same rounding mode as self.

Thanks as always!

Shoeboxam commented 2 weeks ago

On further testing, it seems to_f32 doesn't actually respect the rounding mode of self:

let min = FBig::<Up>::try_from(f32::from_bits(1))?;
let half: FBig::<Up> = min / 2;
println!("{}", half);
// 0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001
println!("{}", half.to_f32().value());
// 0

Aside from the documentation fix, is it possible to convert back to native types with controlled rounding? I guess I could increment/decrement the resulting float based on the reported rounding mode.
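A minimal sketch of that increment/decrement workaround, using only the standard library. The helper names `next_up` and `next_down` are hypothetical and not part of dashu. The code only shows how to step an f32 by one representable value. Deciding when to step still requires knowing which way the conversion erred.

```rust
/// Hypothetical helper: the smallest f32 strictly greater than `x`.
/// NaN and +infinity are returned unchanged.
fn next_up(x: f32) -> f32 {
    if x.is_nan() || x == f32::INFINITY {
        return x;
    }
    if x == 0.0 {
        // Covers both +0.0 and -0.0: the next value up is the smallest subnormal.
        return f32::from_bits(1);
    }
    let bits = x.to_bits();
    if x > 0.0 {
        // For positive floats, the bit pattern is ordered like the value.
        f32::from_bits(bits + 1)
    } else {
        // For negative floats, a smaller magnitude means a larger value.
        f32::from_bits(bits - 1)
    }
}

/// Hypothetical helper: the largest f32 strictly less than `x`,
/// obtained by mirroring `next_up` around zero.
fn next_down(x: f32) -> f32 {
    -next_up(-x)
}
```

With these, a result of 0 that should have rounded up becomes `next_up(0.0)`, which is `f32::from_bits(1)`, the value the `Up` mode should have produced in the example above.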

cmpute commented 2 weeks ago

The intended behavior of to_f32 and to_f64 is to respect the number's own rounding mode (not always HalfEven). If they don't, it's a bug. The example you provided is something of a corner case, since it involves the smallest subnormal float.

cmpute commented 2 weeks ago

Ensuring rounding works correctly for subnormals is challenging. Could you provide some more test cases? I can take a look, but I can't promise a timeline for the fix.

Shoeboxam commented 2 weeks ago

No problem! The software I'm writing tends to seek out and find these edge cases: lots of binary searches for extreme points. This is good for dashu, because you're getting more test coverage indirectly, but it's a bit challenging for us (although still better than getting reliable MPFR builds on Windows platforms). In this case, the incorrect rounding causes another computation that should return infinity to instead return ~9. This is where it underflows and triggers the incorrect subnormal behavior.

The implementations themselves are different: the f64 method makes weaker promises by only rounding to HalfEven, so it isn't "wrong". The f32 method makes more useful promises, but doesn't always uphold them, at least in this case.

#[test]
fn test_subnormal() {
    let min = FBig::<Up>::try_from(f32::from_bits(1)).unwrap();
    let half: FBig::<Up> = min / 2;
    assert!(half.to_f32().value() > 0.);
}

Note that in the above test, the output is Approximation::Inexact(0.0, NoOp). I'm not really sure how to interpret what NoOp means here. I'd rather always know whether the inexactness errs up or down.
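To illustrate what "errs up or down" could look like, here is a rough sketch that uses only the standard library, with f64 standing in for the exact value (an assumption, since the real source is an FBig). The helper `narrowing_error` is hypothetical. It narrows with Rust's built-in round-to-nearest cast and then compares the result against the original to recover the direction of the error.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: narrow an f64 to f32 and report how the result
/// compares to the exact input.
/// `Some(Less)` means the f32 is below the exact value (erred down),
/// `Some(Greater)` means it is above (erred up), `Some(Equal)` means exact.
/// `None` is returned for NaN.
fn narrowing_error(exact: f64) -> (f32, Option<Ordering>) {
    // `as` rounds to nearest, ties to even.
    let rounded = exact as f32;
    // Widening f32 to f64 is lossless, so this comparison is exact.
    let direction = f64::from(rounded).partial_cmp(&exact);
    (rounded, direction)
}
```

For half of the smallest f32 subnormal, this reports a result of 0 that erred down, which is the information needed to decide whether a step up is required under the Up mode.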

Don't get me wrong, this library is incredibly useful! I provide bug reports whenever I encounter issues because I know it will help make dashu more robust. I understand you're probably busy with other things as well.

cmpute / dashu, issue #53