akubera / bigdecimal-rs

Arbitrary precision decimal crate for Rust
Other
277 stars 71 forks source link

sqrt is slower than expected #134

Open jorisguex opened 1 month ago

jorisguex commented 1 month ago

I was benchmarking the sqrt function and I noticed that is was slower than I expected:

fn main() {
    let val = bigdecimal::BigDecimal::new(2.into(), 0);
    for prec in [100, 1000, 10000, 100000] {
        let ctx = bigdecimal::Context::default().with_precision(prec.try_into().unwrap());
        let start = std::time::Instant::now();
        let _ = val.sqrt_with_context(&ctx);
        let elapsed = start.elapsed().as_secs_f32();
        println!("Precision = {prec}: {:.5} seconds ({} microseconds per digit)", elapsed, elapsed*1000000.0/prec as f32);
    }
}
> cargo run --release 2>/dev/null
Precision = 100: 0.00047 seconds (4.67834 microseconds per digit)
Precision = 1000: 0.00512 seconds (5.124834 microseconds per digit)
Precision = 10000: 0.45796 seconds (45.79616 microseconds per digit)
Precision = 100000: 50.74217 seconds (507.4217 microseconds per digit)

Compare that to Python:

import decimal
import time
for prec in [10, 100, 1000, 10000, 100000]:
    decimal.getcontext().prec = prec
    start = time.perf_counter()
    decimal.Decimal(2).sqrt()
    elapsed = time.perf_counter() - start
    print(f"Precision = {prec}: {elapsed:f} seconds ({elapsed/prec*1000000:f} microseconds per digit)")
> python3 sqrt.py
Precision = 10: 0.000009 seconds (0.866700 microseconds per digit)
Precision = 100: 0.000015 seconds (0.154170 microseconds per digit)
Precision = 1000: 0.000644 seconds (0.644208 microseconds per digit)
Precision = 10000: 0.074025 seconds (7.402471 microseconds per digit)
Precision = 100000: 0.809923 seconds (8.099229 microseconds per digit)
akubera commented 1 month ago

I don't think that will change until I rewrite the "backend" digit code with a custom BigInt, rather than using num-bigint::BigInt, which doesn't offer much in terms of direct access to the digits, so there's lots of cloning involved.

If you're interested you could make a flamegraph of the calls involved and maybe there's some low hanging fruit that I can pick -- but I have doubts, and wont have much time to dedicate until later this month.

jorisguex commented 1 month ago

Hi, thanks for your response. I have done a flamegraph and ~28% of the samples are in num_bigint::biguint::multiplication::scalar_mul while ~72% are in num_bigint::biguint::division::div_rem_ref.

flamegraph

For anyone interested, I have made a fork in the meantime that speeds up sqrt by ~100x: https://github.com/akubera/bigdecimal-rs/compare/trunk...jorisguex:bigdecimal-rs:make-sqrt-faster. It passes all of the unit tests but there may be specific edge cases where it fails (I haven't done that much testing). It also has the benefit of rounding correctly.

> cargo run --release 2>/dev/null
Precision = 100: 0.00032 seconds (3.1970801 microseconds per digit)
Precision = 1000: 0.00056 seconds (0.55525 microseconds per digit)
Precision = 10000: 0.00428 seconds (0.4282583 microseconds per digit)
Precision = 100000: 0.49238 seconds (4.923846 microseconds per digit)