regression: `BigDecimal::from_str` gives unexpected result

comphead commented 7 months ago

Please find a small test case which reflects the concern

    use bigdecimal::BigDecimal;
    use std::str::FromStr;

    #[tokio::test]
    async fn testx() -> Result<()> {
        // Round-trip 10f64 through its string form, then normalize and print.
        let bd = BigDecimal::from_str(&10f64.to_string()).unwrap().normalized().to_string();
        assert_eq!(bd, "10");
        Ok(())
    }

In 0.4.1 it works as expected.

In 0.4.3, `BigDecimal::from_str` creates BigDecimal("10e0") instead of BigDecimal("10"), and the test now fails because the output is in exponential format:

assertion `left == right` failed
  left: "1E+1"
 right: "10"

akubera commented 7 months ago

Ok, I'll look into this.

akubera commented 7 months ago

The call to `normalized` removes all trailing zeros, so the 10 (structurally {10, scale=0}) is reduced to {1, scale=-1}, and that is reflected in the printed output. Normalizing loses precision, and printing is designed not to add extra precision by default unless requested in a format string.
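
To make the trailing-zero stripping concrete, here is a minimal sketch. It assumes a hypothetical `Dec` struct (not the crate's actual type) that stores a value as `digits * 10^(-scale)`, and its `normalized` is a simplified stand-in for the real method, written only to show why the scale goes negative.

```rust
/// Hypothetical stand-in for a decimal: value = digits * 10^(-scale).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Dec {
    digits: i64,
    scale: i64,
}

impl Dec {
    /// Strip trailing zeros from `digits`, lowering `scale` once per zero removed.
    fn normalized(self) -> Dec {
        if self.digits == 0 {
            // Zero has no significant digits; pick scale 0 for this sketch.
            return Dec { digits: 0, scale: 0 };
        }
        let (mut digits, mut scale) = (self.digits, self.scale);
        while digits % 10 == 0 {
            digits /= 10;
            scale -= 1;
        }
        Dec { digits, scale }
    }
}

fn main() {
    let ten = Dec { digits: 10, scale: 0 };
    // {10, scale=0} becomes {1, scale=-1}: same value, one fewer stored digit.
    println!("{:?}", ten.normalized());
}
```

In this model the value is unchanged, but the information that the units digit was written out is gone, which is why a printer that refuses to invent digits ends up with an exponent.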

Version 0.4.3 changed the formatting in a way many people didn't like, so I'm changing formatting again for 0.4.4. We can look at solutions for this once that is out.

akubera commented 7 months ago

I'm coming around to the idea that sane defaults mean minimizing surprise and just printing numbers in the way that makes the most sense to humans. If a user wants to accurately express significant figures, that's what scientific notation is for.

The question then shifts to: what's sensible?

I'm already using the compile-time environment variable RUST_BIGDECIMAL_EXPONENTIAL_FORMAT_THRESHOLD to set the maximum number of leading zeros (default 5), so 0.000001 is preferred to 1e-6, but 1e-7 is preferred to 0.0000001.

I could use it, or make another one, so that numbers with exponents over the threshold jump to exponential form.

Looks like this number is 15 for Python.

>>> 1000000000000000.
1000000000000000.0
>>> 10000000000000000.
1e+16
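
The two-threshold idea can be sketched as follows. This is only an illustration under stated assumptions: `format_dec`, `max_leading_zeros`, and `max_trailing_zeros` are hypothetical names, the value is modeled as `digits * 10^(-scale)`, and the large-number threshold is assumed to count the zeros that plain form would have to append. It is not the crate's implementation.

```rust
/// Hypothetical formatter for value = digits * 10^(-scale).
/// Both thresholds are illustrative parameters, not the crate's API.
fn format_dec(digits: u64, scale: i64, max_leading_zeros: i64, max_trailing_zeros: i64) -> String {
    let s = digits.to_string();
    let n = s.len() as i64;

    // Scientific form: one digit, optional fraction, signed exponent.
    let exponential = || {
        let (head, tail) = s.split_at(1);
        let exp = n - 1 - scale;
        if tail.is_empty() {
            format!("{head}E{exp:+}")
        } else {
            format!("{head}.{tail}E{exp:+}")
        }
    };

    if scale <= 0 {
        // Integer: plain form needs `-scale` zeros appended.
        if -scale > max_trailing_zeros {
            exponential()
        } else {
            format!("{s}{}", "0".repeat((-scale) as usize))
        }
    } else if scale < n {
        // The decimal point falls inside the digit string.
        let (int_part, frac_part) = s.split_at((n - scale) as usize);
        format!("{int_part}.{frac_part}")
    } else if scale - n > max_leading_zeros {
        // Pure fraction with too many zeros after "0.".
        exponential()
    } else {
        // Pure fraction: `scale - n` zeros sit between "0." and the digits.
        format!("0.{}{s}", "0".repeat((scale - n) as usize))
    }
}

fn main() {
    // Small side: five leading zeros are allowed, six are not.
    for scale in [5, 6, 7] {
        println!("{}", format_dec(1, scale, 5, 15));
    }
    // Large side: the cutoff of 15 mirrors the Python session above.
    for scale in [-15, -16] {
        println!("{}", format_dec(1, scale, 5, 15));
    }
}
```

With thresholds of 5 and 15, the small side switches between 0.000001 and 1E-7, and the large side switches at the same point as the Python session.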

comphead commented 7 months ago

Sounds good. What will happen if the number has no fractional part? Let's say I have an array

[10f64, 10.1f64] what is the expected output?

akubera commented 7 months ago

I'm not sure what you mean by array. I don't think there's a way to do formatting for arrays and vecs automatically.

But let's print some running sums:

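    use bigdecimal::BigDecimal;
    use std::str::FromStr;

    let strs = ["10", "10.1", "100.100"];
    let mut x = BigDecimal::from(0);
    for s in strs {
        // Accumulate the running sum and print it with the default `{}` format.
        x += BigDecimal::from_str(s).unwrap();
        println!("{x}");
    }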

I'd expect this to print

10
20.1
120.200

and same if the numbers were in scientific notation: let strs = ["1e1", "1.01e1", "1.00100e2"];
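
One model that yields exactly those running sums is addition that rescales both operands to the larger scale, so digits written in the input are never dropped. The sketch below assumes a hypothetical `Dec` type with `parse`, `add`, and `show` helpers, restricted to non-negative values in plain notation. It illustrates the idea and is not the crate's code.

```rust
/// Hypothetical decimal: value = digits * 10^(-scale), non-negative only.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Dec {
    digits: i64,
    scale: u32,
}

impl Dec {
    /// Parse a plain "123.450" style string, keeping every written digit.
    fn parse(s: &str) -> Dec {
        match s.split_once('.') {
            Some((int, frac)) => Dec {
                digits: format!("{int}{frac}").parse().unwrap(),
                scale: frac.len() as u32,
            },
            None => Dec { digits: s.parse().unwrap(), scale: 0 },
        }
    }

    /// Add by rescaling both operands to the larger of the two scales.
    fn add(self, other: Dec) -> Dec {
        let scale = self.scale.max(other.scale);
        let a = self.digits * 10i64.pow(scale - self.scale);
        let b = other.digits * 10i64.pow(scale - other.scale);
        Dec { digits: a + b, scale }
    }

    /// Print with exactly `scale` fractional digits.
    fn show(self) -> String {
        if self.scale == 0 {
            return self.digits.to_string();
        }
        let p = 10i64.pow(self.scale);
        format!("{}.{:0width$}", self.digits / p, self.digits % p, width = self.scale as usize)
    }
}

fn main() {
    let mut x = Dec::parse("0");
    for s in ["10", "10.1", "100.100"] {
        x = x.add(Dec::parse(s));
        println!("{}", x.show());
    }
}
```

The trailing zeros in the last sum come from the input "100.100", whose scale of 3 becomes the scale of the result.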

comphead commented 7 months ago

Thanks @akubera, that looks great. Looking forward to the release.

akubera commented 7 months ago

Made some examples of formats: Decimal on the left, f64 on the right, with the format string on top.

{}:
 10^-20 : 1E-20           0.00000000000000000001
 10^-19 : 1E-19           0.0000000000000000001
 10^-18 : 1E-18           0.000000000000000001
 10^-17 : 1E-17           0.00000000000000001
 10^-16 : 1E-16           0.0000000000000001
 10^-15 : 1E-15           0.000000000000001
 10^-14 : 1E-14           0.00000000000001
 10^-13 : 1E-13           0.0000000000001
 10^-12 : 1E-12           0.000000000001 
 10^-11 : 1E-11           0.00000000001  
 10^-10 : 1E-10           0.0000000001   
  10^-9 : 1E-9            0.000000001    
  10^-8 : 1E-8            0.00000001     
  10^-7 : 1E-7            0.0000001      
  10^-6 : 0.000001        0.000001       
  10^-5 : 0.00001         0.00001        
  10^-4 : 0.0001          0.0001         
  10^-3 : 0.001           0.001          
  10^-2 : 0.01            0.01           
  10^-1 : 0.1             0.1            
   10^0 : 1               1              
   10^1 : 1E+1            10             
   10^2 : 1E+2            100            
   10^3 : 1E+3            1000           
   10^4 : 1E+4            10000          
   10^5 : 1E+5            100000         
   10^6 : 1E+6            1000000        
   10^7 : 1E+7            10000000       
   10^8 : 1E+8            100000000      
   10^9 : 1E+9            1000000000     
  10^10 : 1E+10           10000000000    
  10^11 : 1E+11           100000000000   
  10^12 : 1E+12           1000000000000  
  10^13 : 1E+13           10000000000000 
  10^14 : 1E+14           100000000000000
  10^15 : 1E+15           1000000000000000
  10^16 : 1E+16           10000000000000000
  10^17 : 1E+17           100000000000000000
  10^18 : 1E+18           1000000000000000000
  10^19 : 1E+19           10000000000000000000
  10^20 : 1E+20           100000000000000000000

{}:
12345×
 10^-10 : 0.0000012345    0.0000012345   
  10^-9 : 0.000012345     0.000012345000000000001
  10^-8 : 0.00012345      0.00012345     
  10^-7 : 0.0012345       0.0012345      
  10^-6 : 0.012345        0.012345       
  10^-5 : 0.12345         0.12345        
  10^-4 : 1.2345          1.2345000000000002
  10^-3 : 12.345          12.345         
  10^-2 : 123.45          123.45         
  10^-1 : 1234.5          1234.5         
   10^0 : 12345           12345          
   10^1 : 1.2345E+5       123450         
   10^2 : 1.2345E+6       1234500        
   10^3 : 1.2345E+7       12345000       
   10^4 : 1.2345E+8       123450000      
   10^5 : 1.2345E+9       1234500000     
   10^6 : 1.2345E+10      12345000000    
   10^7 : 1.2345E+11      123450000000   
   10^8 : 1.2345E+12      1234500000000  
   10^9 : 1.2345E+13      12345000000000 
  10^10 : 1.2345E+14      123450000000000

{:e}:
12345×
 10^-10 : 1.2345e-6       1.2345e-6      
  10^-9 : 1.2345e-5       1.2345000000000001e-5
  10^-8 : 1.2345e-4       1.2345e-4      
  10^-7 : 1.2345e-3       1.2345e-3      
  10^-6 : 1.2345e-2       1.2345e-2      
  10^-5 : 1.2345e-1       1.2345e-1      
  10^-4 : 1.2345          1.2345000000000002e0
  10^-3 : 1.2345e+1       1.2345e1       
  10^-2 : 1.2345e+2       1.2345e2       
  10^-1 : 1.2345e+3       1.2345e3       
   10^0 : 1.2345e+4       1.2345e4       
   10^1 : 1.2345e+5       1.2345e5       
   10^2 : 1.2345e+6       1.2345e6       
   10^3 : 1.2345e+7       1.2345e7       
   10^4 : 1.2345e+8       1.2345e8       
   10^5 : 1.2345e+9       1.2345e9       
   10^6 : 1.2345e+10      1.2345e10      
   10^7 : 1.2345e+11      1.2345e11      
   10^8 : 1.2345e+12      1.2345e12      
   10^9 : 1.2345e+13      1.2345e13      
  10^10 : 1.2345e+14      1.2345e14

{:.3}:
12345×
 10^-10 : 0.000           0.000          
  10^-9 : 0.000           0.000          
  10^-8 : 0.000           0.000          
  10^-7 : 0.001           0.001          
  10^-6 : 0.012           0.012          
  10^-5 : 0.123           0.123          
  10^-4 : 1.234           1.235          
  10^-3 : 12.345          12.345         
  10^-2 : 123.450         123.450        
  10^-1 : 1234.500        1234.500       
   10^0 : 12345.000       12345.000      
   10^1 : 1.234E+5        123450.000     
   10^2 : 1.234E+6        1234500.000    
   10^3 : 1.234E+7        12345000.000   
   10^4 : 1.234E+8        123450000.000  
   10^5 : 1.234E+9        1234500000.000 
   10^6 : 1.234E+10       12345000000.000
   10^7 : 1.234E+11       123450000000.000
   10^8 : 1.234E+12       1234500000000.000
   10^9 : 1.234E+13       12345000000000.000
  10^10 : 1.234E+14       123450000000000.000
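
The f64 column can be reproduced with the standard library alone, which is handy as a baseline when comparing against the decimal column. The helper name `f64_column` is made up for this sketch, and the values checked are ones that appear in the tables above.

```rust
/// Format one f64 with the three format strings used in the tables.
fn f64_column(x: f64) -> (String, String, String) {
    (format!("{}", x), format!("{:e}", x), format!("{:.3}", x))
}

fn main() {
    // `{}` on f64 stays in plain form, even for very large or small values.
    println!("{}", 1e20f64);
    println!("{}", 1e-7f64);

    // `{:e}` is lower-case scientific notation with no `+` on the exponent;
    // `{:.3}` pads or rounds to exactly three fractional digits.
    let (plain, exp, fixed) = f64_column(12345.0);
    println!("{plain} {exp} {fixed}");
}
```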

There's a little more work to do.

comphead commented 6 months ago

@akubera any updates?

akubera commented 5 months ago

Finally released. There were a couple of other edge cases regarding formatting large numbers, but I think it's all worked out now.

akubera / bigdecimal-rs

regression: `BigDecimal::from_str` gives unexpected result #127