faster-cpython / ideas


PGO-ed JIT-ted startup #682

Open Fidget-Spinner opened 1 month ago

Fidget-Spinner commented 1 month ago

So this came from a discussion I had with @tekknolagi and Brandt at PyCon US. It depends on arbitrary-length superinstructions.

The main idea is that startup runs a lot of Python. There are two orthogonal ways to speed up startup: reduce the work done at startup, or speed up Python. Ideally we should do both. In the spirit of wacky ideas, I will suggest a moonshot idea to significantly speed up Python only at startup:

Assume startup code is mostly static, apart from fetching the system locale, codecs, encoding, etc. At build time, we collect the traces formed by the JIT during startup only. We then pass the entire trace as a single stencil to clang to compile (still respecting the tree structure, of course). At runtime, every startup will thus find the new "startup superinstructions", and the jitted code will be extremely efficient. The main reason this should be significantly faster than just turning on the JIT is that the entire trace becomes a single instruction, allowing clang to perform whole-of-trace optimizations.
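To make the "whole trace as one instruction" idea concrete, here is a minimal toy sketch in plain Python. It is not CPython's JIT: the micro-op names (`PUSH`, `ADD`, `MUL`), the `TEMPLATES` table, and the `interpret` and `fuse` helpers are all hypothetical, and Python's built-in `compile` stands in for the clang step. It only illustrates the shape of the idea: a recorded trace is emitted as one function body and compiled once, instead of being dispatched op by op.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A recorded trace is a list of (micro-op name, optional argument) pairs.
# These micro-ops are made up for illustration; they are not tier-two uops.
Instr = Tuple[str, Optional[int]]


def interpret(trace: List[Instr], stack: List[int]) -> None:
    """Dispatch one micro-op at a time, like a plain interpreter loop."""
    for name, arg in trace:
        if name == "PUSH":
            stack.append(arg)
        elif name == "ADD":
            right = stack.pop()
            stack.append(stack.pop() + right)
        elif name == "MUL":
            right = stack.pop()
            stack.append(stack.pop() * right)
        else:
            raise ValueError(f"unknown micro-op: {name}")


# One source template per micro-op, loosely analogous to per-op stencils.
TEMPLATES: Dict[str, str] = {
    "PUSH": "stack.append({arg})",
    "ADD": "right = stack.pop(); stack.append(stack.pop() + right)",
    "MUL": "right = stack.pop(); stack.append(stack.pop() * right)",
}


def fuse(trace: List[Instr]) -> Callable[[List[int]], None]:
    """Emit the whole trace as one function body and compile it once."""
    body = [f"    {TEMPLATES[name].format(arg=arg)}" for name, arg in trace]
    if not body:
        body = ["    pass"]  # keep the generated function valid for empty traces
    source = "def superinstruction(stack):\n" + "\n".join(body) + "\n"
    namespace: dict = {}
    exec(compile(source, "<fused-trace>", "exec"), namespace)
    return namespace["superinstruction"]


# "Build time": record a trace and fuse it into a single callable.
recorded: List[Instr] = [
    ("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None),
]
fused = fuse(recorded)

# "Runtime": both paths compute (2 + 3) * 4, but the fused one has no dispatch loop.
slow_stack: List[int] = []
interpret(recorded, slow_stack)
fast_stack: List[int] = []
fused(fast_stack)
print(slow_stack, fast_stack)
```

In this toy the fused function only removes the dispatch loop. The proposal's claimed win is larger, because an optimizing compiler sees the whole trace at once and can optimize across micro-op boundaries.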

This is somewhat similar to Stefan Brunthaler's multi-level quickening paper, where there is a sort of "PGO", but for benchmarks. However, since benchmarks are not a reliable example of real-world code, this idea limits itself to startup.

brandtbucher commented 3 weeks ago

Look, I love Futamura projections as much as the next compiler engineer, but... I think that an idea like this probably needs at least a proof-of-concept to proceed much further. Things that jump out to me as potential issues that will need to be tackled early on:

That's not even counting the wrinkles of raising and catching exceptions, performing calls through C code to more Python code, etc. It likely makes more sense to just add some more reasonably-sized-but-maybe-a-little-longer superinstructions that don't require deep surgery on the tier two instruction format itself. This seems quite a bit easier to experiment with and more likely to succeed.
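One hypothetical way to start experimenting with shorter superinstructions is to count which short op sequences recur across recorded traces and treat the most frequent ones as candidates. The sketch below assumes traces are available as lists of op names. The `count_ngrams` helper and the sample traces are made up for illustration, with opcode names borrowed only as labels.

```python
from collections import Counter
from typing import Iterable, Sequence


def count_ngrams(traces: Iterable[Sequence[str]], length: int) -> Counter:
    """Count every contiguous run of `length` ops across all traces."""
    counts: Counter = Counter()
    for trace in traces:
        for start in range(len(trace) - length + 1):
            counts[tuple(trace[start:start + length])] += 1
    return counts


# Made-up traces; a real experiment would record these during startup.
traces = [
    ["LOAD_FAST", "LOAD_FAST", "BINARY_OP", "STORE_FAST"],
    ["LOAD_FAST", "LOAD_FAST", "BINARY_OP", "RETURN_VALUE"],
    ["LOAD_FAST", "LOAD_FAST", "BINARY_OP", "POP_TOP"],
]

# The most frequent 3-op sequence is the first superinstruction candidate.
candidates = count_ngrams(traces, 3).most_common(1)
print(candidates)
```

Raw frequency is only a first filter. Whether a candidate actually pays off would still need to be measured.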