Closed: skirpichev closed this 1 year ago
I rarely use PyArg_ParseTupleAndKeywords due to the overhead. I'll try writing some custom argument parsing code. Even if it is only beneficial when keywords are not used, that should help the most common use cases (a rough sketch of the idea follows).
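As an illustration of that idea (a minimal sketch under the assumption that keyword arguments are rare; the function name and defaults are hypothetical, not gmpy2 code), the method can unpack the positional tuple by hand and only fall back to a general parser when keywords are actually present:

/* Hypothetical sketch: bypass general argument parsing when the caller
   passed no keyword arguments, which is the common case. */
static PyObject *
to_bytes_fastpath(PyObject *self, PyObject *args, PyObject *kwargs)
{
    Py_ssize_t length = 1;       /* assumed default, as in newer int.to_bytes */
    PyObject *byteorder = NULL;  /* NULL taken to mean "big" in this sketch */

    if (kwargs != NULL && PyDict_Size(kwargs) != 0) {
        /* Slow path: keywords present -> use PyArg_ParseTupleAndKeywords
           (elided here; see the parsing sketch further below). */
        return NULL;  /* placeholder */
    }

    /* Fast path: unpack the positional arguments by hand. */
    Py_ssize_t nargs = PyTuple_GET_SIZE(args);
    if (nargs > 2) {
        PyErr_SetString(PyExc_TypeError,
                        "to_bytes() takes at most 2 positional arguments");
        return NULL;
    }
    if (nargs >= 1) {
        length = PyNumber_AsSsize_t(PyTuple_GET_ITEM(args, 0), PyExc_OverflowError);
        if (length == -1 && PyErr_Occurred()) {
            return NULL;
        }
    }
    if (nargs == 2) {
        byteorder = PyTuple_GET_ITEM(args, 1);
    }

    /* ... actual integer -> bytes conversion using length/byteorder
       would go here ... */
    Py_RETURN_NONE;  /* placeholder */
}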
I rarely use PyArg_ParseTupleAndKeywords due to the overhead.
I will try to adapt the PR, but keep in mind that the interface must be compatible with int's from_bytes/to_bytes. It's possible to do without PyArg_ParseTupleAndKeywords (see Objects/clinic/longobject.c.h and Objects/longobject.c in the CPython source tree), but with much more complex code.
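For context, int.to_bytes is declared roughly as to_bytes(length, byteorder, *, signed=False), with length and byteorder gaining defaults of 1 and 'big' in newer CPython. A minimal sketch of a PyArg_ParseTupleAndKeywords call exposing that interface (the function name is hypothetical, not the PR's actual code, and the defaults are an assumption):

/* Sketch only: parse arguments with the same signature as int.to_bytes. */
static PyObject *
GMPy_MPZ_Method_ToBytes(PyObject *self, PyObject *args, PyObject *kwargs)
{
    static char *kwlist[] = {"length", "byteorder", "signed", NULL};
    Py_ssize_t length = 1;
    const char *byteorder = "big";
    int is_signed = 0;

    /* "n" -> Py_ssize_t, "s" -> C string, "$" -> keyword-only, "p" -> bool */
    if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|ns$p:to_bytes", kwlist,
                                     &length, &byteorder, &is_signed)) {
        return NULL;
    }

    /* ... export the integer into a new bytes object of the given
       length/byteorder/signedness here ... */
    Py_RETURN_NONE;  /* placeholder */
}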
I did a quick test and the overhead is not just PyArg_ParseTupleAndKeywords. The PR code runs the benchmark in ~230 ns. Hard-coding the argument values to eliminate any parsing overhead reduces the time to ~180 ns. But int.to_bytes only takes ~75 ns.
int.to_bytes uses METH_FASTCALL. I wonder if that's why it's so fast. I haven't experimented with METH_FASTCALL yet; my plan was to convert the _mpmath* functions first.
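For reference, a METH_FASTCALL | METH_KEYWORDS handler receives its positional arguments in a plain C array plus a tuple of keyword names, so no argument tuple has to be built or unpacked. A bare-bones sketch (illustrative names only, not gmpy2 or CPython code):

/* Sketch only: a METH_FASTCALL | METH_KEYWORDS method. */
static PyObject *
to_bytes_fastcall(PyObject *self, PyObject *const *args,
                  Py_ssize_t nargs, PyObject *kwnames)
{
    if (kwnames != NULL && PyTuple_GET_SIZE(kwnames) != 0) {
        /* Keyword handling (e.g. "signed") would go here; omitted. */
        PyErr_SetString(PyExc_TypeError,
                        "keyword arguments not handled in this sketch");
        return NULL;
    }
    if (nargs < 1 || nargs > 2) {
        PyErr_SetString(PyExc_TypeError,
                        "to_bytes() expects 1 or 2 positional arguments");
        return NULL;
    }

    Py_ssize_t length = PyNumber_AsSsize_t(args[0], PyExc_OverflowError);
    if (length == -1 && PyErr_Occurred()) {
        return NULL;
    }
    /* args[1], if present, would be the byteorder string. */

    /* ... conversion of the integer to bytes would go here ... */
    Py_RETURN_NONE;  /* placeholder */
}

/* Registered in the method table roughly as:
   {"to_bytes", (PyCFunction)(void (*)(void))to_bytes_fastcall,
    METH_FASTCALL | METH_KEYWORDS, "..."} */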
To use METH_FASTCALL we would need to drop support for older CPython versions (< 3.7). Yep, we already did.
I've tested v2.1.2 with this patch on CPython 3.6 (i.e. before int's methods were converted to use METH_FASTCALL):
$ python -m timeit -s 'from gmpy2 import fac' -s 'a = fac(57)' 'a.to_bytes(32)'
1000000 loops, best of 3: 0.878 usec per loop
$ python -m timeit -s 'from math import factorial' -s 'a = factorial(57)' 'a.to_bytes(32, "big")'
1000000 loops, best of 3: 0.798 usec per loop
Should fix #357. This is an early draft version; please don't merge unless you're absolutely sure.
Some benchmarks are here. But note that for integers of this size, argument parsing (the PyArg_ParseTupleAndKeywords branch) takes ~1/3 of the time. Without this branch: