I found out I did already update it to 3.10 (and forgot I did), so that saves some work: https://github.com/python/cpython/pull/24206
For completeness, here's the relevant issue: https://bugs.python.org/issue42911 It ends with: "Thank you for the suggestion and PR, but I'm going to close here; for me this ends up on the wrong side of the performance / complexity trade-off." Since the performance / complexity line has changed, please check which side we are on now.
@MarkDickinson
I think you meant @mdickinson: https://github.com/mdickinson :)
Thanks for the ping, @jneb. My opinion hasn't changed since the https://github.com/python/cpython/issues/87077 discussion - for me, the extra complexity isn't worth it. This seems too niche an operation to be focusing on as part of the faster-cpython efforts, so I don't think that changes anything. I also unfortunately don't have the bandwidth to shepherd such a change through.
That said, this is really Tim Peters' code, so if you can interest @tim-one in the changes then that could be a path forward.
The code in 3.11 is different in several respects:
- Much faster for very small exponents.
- Part of which is because it doesn't initialize stack space to zeroes for a table of powers unless needed.
- And if it is needed, the table has half as many entries (only odd powers).
- Which in turn is a happy consequence of moving to a sliding-window algorithm.
None of those really increased the complexity of the code, though. Equivalent ways of doing much the same things, but more efficiently.
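For readers unfamiliar with the technique, here is a minimal Python sketch of left-to-right sliding-window exponentiation with a table of odd powers only. It illustrates the general algorithm being discussed, not the actual C code in longobject.c; the function name and the default 5-bit width are chosen here purely for illustration.

```python
def sliding_window_pow(base, exp, w=5):
    """Left-to-right sliding-window exponentiation (illustrative sketch).

    Precomputes only the odd powers base**1, base**3, ..., base**(2**w - 1),
    then scans the exponent's bits from the top, grouping bits into
    odd-valued windows so that zero bits between windows cost only a
    squaring.
    """
    if exp == 0:
        return 1
    # Table of odd powers: table[i] == base ** (2*i + 1), 2**(w-1) entries.
    b2 = base * base
    table = [base]
    for _ in range(2 ** (w - 1) - 1):
        table.append(table[-1] * b2)

    bits = bin(exp)[2:]
    result = 1
    i, n = 0, len(bits)
    while i < n:
        if bits[i] == '0':
            result *= result              # a zero bit costs just one squaring
            i += 1
        else:
            # Take the longest window of at most w bits that ends in a 1,
            # so its value is odd and present in the table.
            j = min(i + w, n)
            while bits[j - 1] == '0':
                j -= 1
            window = int(bits[i:j], 2)    # odd, less than 2**w
            for _ in range(j - i):
                result *= result          # one squaring per window bit
            result *= table[window >> 1]  # multiply by base**window
            i = j
    return result
```

Because every window value is odd, the table needs only 2**(w-1) entries instead of 2**w, which is the "half as many entries" point above.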
There's some potential in the code as-is for dynamically picking a maximum window size (the larger the exponent, the better a wider window "in theory"), but I didn't see any speedups for "sane" exponent sizes worth the extra bother (and so the max window size is fixed at 5 bits - but I didn't explore using smaller max window sizes for smaller exponents).
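As a rough illustration of what dynamically picking a maximum window size could look like, here is a hypothetical heuristic; the thresholds are invented for this sketch and are not taken from CPython. The intuition is that the table of odd powers for width w costs 2**(w-1) multiplies to build, so widening the window only pays off once the exponent has enough bits to amortize that cost.

```python
def choose_window_width(exp, max_width=5):
    """Hypothetical heuristic: pick a sliding-window width from the
    exponent's bit length.  Not CPython code; the cutoffs are invented."""
    nbits = exp.bit_length()
    w = 1
    # Widening to w+1 doubles the precomputation table (2**w odd powers),
    # so only widen while that table stays small compared to the roughly
    # nbits // (w + 2) windows it would serve.
    while w < max_width and 2 ** w <= nbits // (w + 2):
        w += 1
    return w
```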
Attempts to go beyond this will likely be a hard sell. The code is already quite involved, and it's been tuned over the course of years. Any such attempt should redo timings, and, if they seem worth pursuing, be presented as a GitHub pull request. But speedups under, say, 10%, probably aren't worth the bother unless they're trivial code changes.
So basically several improvements that I made (specifically the sliding window) are implemented now. Good to hear. To be honest, the rest of the optimization is quite marginal, as it involves the "midsized" exponents. Since most exponents in practice are either very small (which is fast already) or pretty large (for cryptography), there are not many of these in practice. So unless somebody comes up with an example of an often-used application that uses exponents of 10-30 digits, I agree that the extra effect of addition chains will not amount to much. So, even though it would be really cool to see some of my code in the Python code base :-), let's leave it where it is now.
I wrote a faster power routine for Python 3.8 that got rejected at the time with the comment "too many lines of code, not enough speedup". But times have changed. See https://github.com/jneb/cpython/tree/38_fast_pow The idea is: currently, small powers are done with binary powering, and the large ones with a base 32 method. My method uses a base from 2 up to 64, chosen from the size and bit patterns of the exponent. It is up to 15% faster for the "intermediate" powers, and about 2% for high powers. Of course, it probably needs adjusting for 3.11.
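For context, a compact Python sketch of the general idea, a fixed 2**k-ary window whose width is chosen from the exponent's size, might look like the following. The names and cutoffs are hypothetical and only hint at the approach in the linked branch, which also inspects the exponent's bit patterns rather than just its length.

```python
def kary_pow(base, exp, k):
    """Fixed-window (2**k-ary) exponentiation: precompute
    base**0 .. base**(2**k - 1), then consume the exponent k bits at a
    time, left to right.  Illustrative sketch only."""
    if exp == 0:
        return 1
    table = [1]
    for _ in range(2 ** k - 1):
        table.append(table[-1] * base)

    result = 1
    nbits = exp.bit_length()
    top = ((nbits + k - 1) // k) * k      # round up so the top window may be short
    for shift in range(top - k, -k, -k):
        for _ in range(k):
            result *= result              # k squarings per window
        digit = (exp >> shift) & (2 ** k - 1)
        if digit:
            result *= table[digit]
    return result


def fast_pow(base, exp):
    """Pick the window width from the exponent's bit length.
    The cutoffs below are invented for illustration."""
    nbits = exp.bit_length()
    for k, cutoff in ((6, 512), (5, 128), (4, 32), (3, 12), (2, 5)):
        if nbits >= cutoff:
            return kary_pow(base, exp, k)
    return kary_pow(base, exp, 1)         # plain binary powering for tiny exponents
```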