faster-cpython / ideas

1.67k stars 49 forks source link

Paper from Montreal #626

Open gvanrossum opened 9 months ago

gvanrossum commented 9 months ago

I received this in email while on vacation.

Hello Guido. We have never met but we share a common interest in programming languages. I am the director of the programming languages lab at the Université de Montréal, and my research is on optimizing compilers for dynamic languages, with a recent project targeting Python. Given that you are working on improving the performance of CPython, I think you may be interested in the following paper that we will present at SLE@SPLASH next month: “An Executable Semantics for Faster Development of Optimizing Python Compilers”. Below is a link to a preprint of the paper. My co-authors are Olivier Melançon and Manuel Serrano (in CC).

http://www.iro.umontreal.ca/~feeley/papers/MelanconFeeleySerranoSLE23.pdf

In the conclusion we express the hope that the approach we present is used to improve the performance of Python implementations such as CPython. We would be interested in knowing your opinion.

Marc Feeley Professor Département d’informatique et de r.o. Université de Montréal

Fidget-Spinner commented 9 months ago

Cool work. It seems a lot of implementations converge on using some sort of DSL/descriptor language to define semantics of the language, and then partially evaluate to optimize things.

Also, a cool reference I saw in the paper: A paper on quantifying overhead in CPython 3.9. Paints a similar story to the statistics gathered here too: https://www.sciencedirect.com/science/article/abs/pii/S0167642321001520?via%3Dihub

gvanrossum commented 9 months ago

I haven't had time to read the whole paper yet, but does it at any point point out that Figure 1 (semantics of +) is incomplete? (There are some additional rules that come into play if the type of y is a subtype of the type of x, and another if they both have the same type.) It's easy to claim performance gains based on simplified semantics.

Fidget-Spinner commented 9 months ago

I read the parts I was interested in and skimmed the parts I wasn't so I can't claim to have read everything. However, from my memory and cursory check, no. ISTM this paper is based off a MSc paper by the first author. Looking at Appendix A for that paper it says "In this appendix, we list all the templates for generating semantics and magic methods." (Olivier Melançon, 2021), and defines the add operation similarly.

I think they made it clear that the implementation is a subset of Python though. In "Threats to Validity" they do say

Zipi only supports a subset of the language. It lacks features such as threads, async functions, and most of the standard library

I would assume non-fully-conformant semantics is in that list.

Also, I had no clue about the subtype behavior, and had to look up the CPython source code. TIL https://github.com/python/cpython/blob/main/Objects/abstract.c#L924 . Is this documented anywhere?

gvanrossum commented 9 months ago

Also, I had no clue about the subtype behavior, and had to look up the CPython source code. TIL https://github.com/python/cpython/blob/main/Objects/abstract.c#L924 . Is this documented anywhere?

That's fair, it's a pretty obscure thing. There's a note at https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types:

If the right operand's type is a subclass of the left operand's type and that subclass provides a different implementation of the reflected method for the operation, this method will be called before the left operand's non-reflected method. This behavior allows subclasses to override their ancestors' operations.

omelancon commented 8 months ago

Hi! I am one of the authors of the paper. I am happy to see it discussed here and we hope it can help improve CPython.

I haven't had time to read the whole paper yet, but does it at any point point out that Figure 1 (semantics of +) is incomplete? (There are some additional rules that come into play if the type of y is a subtype of the type of x, and another if they both have the same type.) It's easy to claim performance gains based on simplified semantics.

This is an unfortunate error on our part, which I realized after submission. The semantics of arithmetic operators is supposed to be fully conformant with CPython. However the error does not affect performance.

We automatically precompute fast paths for arithmetic operations and given combinations of built-in types (ex: addition of two ints). Then, instead of executing the whole semantics of an operator, we look at the type of both operands and dispatch the operation to the corresponding fast path.

Adding the missing subtype-checks yields the exact same fast paths, so performance is unaffected.

Please, let me know if you have any questions!