boostorg / multiprecision

Boost.Multiprecision
Boost Software License 1.0
194 stars 111 forks source link

Speed of division with remainder for cpp_int #537

Open afabri opened 1 year ago

afabri commented 1 year ago

In a geometric code I observe a slowdown of a factor 4, when I switch from the gmp to the cpp_int backend. In vtune I see that more than 35% of the time are spent here. The comment at the signature leads me to the question if I am in the scenario where this function is not optimized. Is there a trivial fix? The comment seems to indicate that it is optimized for small numbers of limbs.

afabri commented 1 year ago

In my case the numerator has 78000 digits and the denominator 30000.

ckormanyos commented 1 year ago

It might be possible to low-word the shift on the carry. But I am really not sure if that exact sequence is using full multiples of the number of bits in a limb for the shift.

As a general comment, it's really hard to beat or match the raw speed of GMP when using portable code. They squeeze the assembler for their well-deserved, legendary reputation for bare-metal speed.

jzmaddock commented 1 year ago

I'm not surprised by this. Division is hard, so many corner cases to handle correctly. There are better algorithms (in the big-O sense), but it's a significant piece of work to implement them correctly.

ckormanyos commented 1 year ago

Division is hard, so many corner cases to handle correctly

I actually have a Knuth long division in an unrelated project. I've run hundreds of millions of tests on it and squeezed all the bugs out over the course of, my gosh, years. I use an opposite limb-ordering.

I feel that Knuth long division is the winner for limb-counts below 2,000 or so. But at that range of your test case, you're in the quadratic domain of classic algorithms. You'd probably be better off inverting-and-multiplying with conversion to bin-float, then Newton-Raphson-ing it. But to assemble the result, you'd be desperately driven to make sure you rorund correctly. Oh my gosh, I get exahusted just thingking about verifying and testing that code for hard-world-open use.

I wouldn't take on either of these tasks at the moment. But maybe note this as a potential place for optimization in the future... As these are really daunting tasks.

ckormanyos commented 1 year ago

actually have a Knuth long division

Since I've never spoken to, really anyone about this one and it's a gem, ...

Here is that Knuth long division - in it's full-on, loving, 300 lines of glory (albeit with the wrong limb-ordering) for future reference if anyone ever cares for it.