Open gvanrossum opened 9 months ago
Cool work. It seems a lot of implementations converge on using some sort of DSL/descriptor language to define semantics of the language, and then partially evaluate to optimize things.
Also, a cool reference I saw in the paper: A paper on quantifying overhead in CPython 3.9. Paints a similar story to the statistics gathered here too: https://www.sciencedirect.com/science/article/abs/pii/S0167642321001520?via%3Dihub
I haven't had time to read the whole paper yet, but does it at any point point out that Figure 1 (semantics of +
) is incomplete? (There are some additional rules that come into play if the type of y
is a subtype of the type of x
, and another if they both have the same type.) It's easy to claim performance gains based on simplified semantics.
I read the parts I was interested in and skimmed the parts I wasn't so I can't claim to have read everything. However, from my memory and cursory check, no. ISTM this paper is based off a MSc paper by the first author. Looking at Appendix A for that paper it says "In this appendix, we list all the templates for generating semantics and magic methods." (Olivier Melançon, 2021), and defines the add operation similarly.
I think they made it clear that the implementation is a subset of Python though. In "Threats to Validity" they do say
Zipi only supports a subset of the language. It lacks features such as threads, async functions, and most of the standard library
I would assume non-fully-conformant semantics is in that list.
Also, I had no clue about the subtype behavior, and had to look up the CPython source code. TIL https://github.com/python/cpython/blob/main/Objects/abstract.c#L924 . Is this documented anywhere?
Also, I had no clue about the subtype behavior, and had to look up the CPython source code. TIL https://github.com/python/cpython/blob/main/Objects/abstract.c#L924 . Is this documented anywhere?
That's fair, it's a pretty obscure thing. There's a note at https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types:
If the right operand's type is a subclass of the left operand's type and that subclass provides a different implementation of the reflected method for the operation, this method will be called before the left operand's non-reflected method. This behavior allows subclasses to override their ancestors' operations.
Hi! I am one of the authors of the paper. I am happy to see it discussed here and we hope it can help improve CPython.
I haven't had time to read the whole paper yet, but does it at any point point out that Figure 1 (semantics of
+
) is incomplete? (There are some additional rules that come into play if the type ofy
is a subtype of the type ofx
, and another if they both have the same type.) It's easy to claim performance gains based on simplified semantics.
This is an unfortunate error on our part, which I realized after submission. The semantics of arithmetic operators is supposed to be fully conformant with CPython. However the error does not affect performance.
We automatically precompute fast paths for arithmetic operations and given combinations of built-in types (ex: addition of two int
s). Then, instead of executing the whole semantics of an operator, we look at the type of both operands and dispatch the operation to the corresponding fast path.
Adding the missing subtype-checks yields the exact same fast paths, so performance is unaffected.
Please, let me know if you have any questions!
I received this in email while on vacation.