slot mechanism benchmark/ optimisation for dunder methods

benmkw commented 1 year ago

I re-read this post https://lucumr.pocoo.org/2014/8/16/the-python-i-would-like-to-see/ and, while I'm not deep enough in the details to have an opinion on the specifics, this simple benchmark seems worth optimising for, and the issue still seems relevant for Python today:

(from the post)

# in x.py
class A(object):
    def __add__(self, other):
        return 42

$ python3 -mtimeit -s 'from x import A; a = A(); b = A()' 'a + b'
1000000 loops, best of 3: 0.256 usec per loop

$ python3 -mtimeit -s 'from x import A; a = A(); b = A()' 'a.__add__(b)'
10000000 loops, best of 3: 0.158 usec per loop
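The same comparison can also be scripted with the standard-library `timeit` module instead of the command line. This is a minimal sketch that assumes the class is defined inline rather than imported from `x.py`. Taking `min()` over `timeit.repeat` mirrors the CLI's "best of N" reporting, and absolute numbers will vary by machine and Python version. `best_per_loop` is a hypothetical helper name.

```python
import timeit


class A:
    def __add__(self, other):
        return 42


a = A()
b = A()


def best_per_loop(stmt, number=1_000_000, repeat=5):
    """Return the best time per loop, in nanoseconds, over `repeat` runs."""
    # globals= lets the timed statement see `a` and `b` from this module.
    times = timeit.repeat(stmt, globals={"a": a, "b": b},
                          number=number, repeat=repeat)
    return min(times) / number * 1e9


if __name__ == "__main__":
    for stmt in ("a + b", "a.__add__(b)"):
        print(f"{stmt:>14}: {best_per_loop(stmt):.1f} nsec per loop")
```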

Reproduction on my system:

$ python3 --version
Python 3.11.2

$ python3 -mtimeit -s 'from x import A; a = A(); b = A()' 'a + b'
10000000 loops, best of 5: 35.1 nsec per loop

$ python3 -mtimeit -s 'from x import A; a = A(); b = A()' 'a.__add__(b)'
10000000 loops, best of 5: 26.6 nsec per loop

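So indeed, 8 years after the post appeared, this issue still seems relevant, although at a different scale: the gap between `a + b` and `a.__add__(b)` went from about 1.6x (0.256 vs. 0.158 usec) in the post to about 1.3x (35.1 vs. 26.6 nsec) here.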
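Part of the reason `a + b` costs more than `a.__add__(b)` is that the operator is not a plain method call. It looks the method up on the type rather than the instance, and it must consider the right operand's reflected `__radd__`. Below is a simplified pure-Python model of the documented binary-operator rules. `py_add` is a hypothetical helper. CPython implements this logic in C (`binary_op1` plus the `slot_nb_add` slot wrapper for classes defining `__add__`), so read this as an illustration of the extra steps, not as CPython's code.

```python
def py_add(a, b):
    """Simplified pure-Python model of how `a + b` picks __add__/__radd__.

    Hypothetical helper for illustration; CPython does this in C.
    """
    ta, tb = type(a), type(b)
    # Dunder methods are looked up on the type, never on the instance.
    add = getattr(ta, "__add__", None)
    # The reflected method is only considered for operands of different types.
    radd = getattr(tb, "__radd__", None) if tb is not ta else None

    # A right operand whose type is a subclass of the left operand's type and
    # provides a different __radd__ gets the first try.
    if (radd is not None and issubclass(tb, ta)
            and radd is not getattr(ta, "__radd__", None)):
        result = radd(b, a)
        if result is not NotImplemented:
            return result
        radd = None  # already tried; do not call it a second time

    if add is not None:
        result = add(a, b)
        if result is not NotImplemented:
            return result

    if radd is not None:
        result = radd(b, a)
        if result is not NotImplemented:
            return result

    raise TypeError(
        f"unsupported operand type(s) for +: "
        f"'{ta.__name__}' and '{tb.__name__}'"
    )
```

A direct `a.__add__(b)` call skips all of these checks and goes straight to the method.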

kmod commented 1 year ago

We took a crack at this in Pyston with some pretty good results. One of the main reasons these dunder-using features are slow is that they use uncached attribute lookups, which are slow. Pyston and modern CPython have attribute caches that the interpreter can use, but absent a more powerful specialization framework (which we had in Pyston v1 but not in Pyston v2), we need to find another place to cache these lookups. In Pyston, we cache some of the dunder attributes directly on the type object.
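To make "cache the dunder lookup on the type object" concrete, here is a pure-Python toy model. It is not Pyston's C implementation. Each class built with the hypothetical `CachingMeta` metaclass carries its own `_dunder_cache` dict (a hypothetical attribute name). `cached_lookup` walks the MRO only on a cache miss. Any assignment or deletion on a class clears the cache for that class and all of its subclasses, because they may inherit the attribute that changed.

```python
class CachingMeta(type):
    """Hypothetical metaclass that memoises dunder lookups on each class."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # Each class gets its own cache, stored directly on the class object.
        type.__setattr__(cls, "_dunder_cache", {})

    def __setattr__(cls, name, value):
        super().__setattr__(name, value)
        _invalidate(cls)

    def __delattr__(cls, name):
        super().__delattr__(name)
        _invalidate(cls)


def _invalidate(cls):
    """Clear the cache on cls and, recursively, on all of its subclasses."""
    cache = cls.__dict__.get("_dunder_cache")
    if cache is not None:
        cache.clear()
    # Subclasses may inherit the attribute that just changed.
    for sub in cls.__subclasses__():
        _invalidate(sub)


def cached_lookup(tp, name):
    """Resolve a special method on type `tp`, caching the result on `tp`."""
    cache = tp.__dict__.get("_dunder_cache")
    if cache is None:  # not a CachingMeta class: fall back to a plain lookup
        return getattr(tp, name, None)
    if name not in cache:
        cache[name] = getattr(tp, name, None)  # walks the MRO once
    return cache[name]
```

Operators only consult the type, never the instance `__dict__`. A per-type cache is therefore sound as long as every change to the class or its bases clears it, and that invalidation is the hard part of a real implementation.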


Fidget-Spinner commented 1 year ago

In CPython we also cache dunder attribute lookups in the type object. However, __add__ isn't one of them. So far I think we only have __getitem__ and a few others.

markshannon commented 1 year ago

Specialization of binary operations is a work in progress, and adding specializations for user-defined dunder methods is one possible improvement. There is a slight complication: it will take more inline cache space, so the small speedup from specializing this case may be outweighed by the slowdown caused by bigger code objects.
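As a rough picture of what specializing `+` for a user-defined `__add__` involves, here is a toy Python model of one instruction with an inline cache. The instruction warms up, records the operand type and its `__add__`, guards on them cheaply, and otherwise falls back to the generic path. `BinaryAddSite` and `WARMUP` are hypothetical names. This is not CPython's implementation. In particular, real interpreters typically guard on a type version number rather than re-reading the class dict as this sketch does. The two cache attributes stand in for the extra per-instruction cache space this comment weighs against the speedup.

```python
import types


class BinaryAddSite:
    """Toy model of one specializable `+` instruction with an inline cache.

    Hypothetical class for illustration; not CPython's implementation.
    """

    WARMUP = 2  # hypothetical: generic executions before trying to specialize

    def __init__(self):
        self.cached_type = None  # inline cache slot: operand type seen
        self.cached_func = None  # inline cache slot: that type's __add__
        self.counter = self.WARMUP
        self.hits = 0
        self.misses = 0

    def execute(self, a, b):
        tp = type(a)
        # Guard: both operands have the cached type and the class's own
        # __add__ is still the cached function.
        if (tp is self.cached_type and type(b) is tp
                and tp.__dict__.get("__add__") is self.cached_func):
            self.hits += 1
            result = self.cached_func(a, b)  # fast path: direct call
            if result is NotImplemented:
                # Same-type operands never try __radd__, so this is an error.
                raise TypeError(
                    f"unsupported operand type(s) for +: "
                    f"'{tp.__name__}' and '{tp.__name__}'"
                )
            return result
        self.misses += 1
        self._maybe_specialize(a, b)
        return a + b  # generic path: full operator dispatch

    def _maybe_specialize(self, a, b):
        self.counter -= 1
        if self.counter > 0:
            return
        self.counter = self.WARMUP
        tp = type(a)
        func = tp.__dict__.get("__add__")
        # Only specialize same-type operands whose class itself defines a
        # plain Python __add__; everything else stays on the generic path.
        if type(b) is tp and isinstance(func, types.FunctionType):
            self.cached_type, self.cached_func = tp, func
        else:
            self.cached_type = self.cached_func = None
```

Restricting the fast path to same-type operands keeps it faithful to the operator semantics, since `__radd__` is never consulted in that case.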

@brandtbucher, is this on your to-do list?

faster-cpython / ideas #555