faster-cpython / ideas


Tips from Stefan Marr #601

Open gvanrossum opened 1 year ago

gvanrossum commented 1 year ago

I wrote to Stefan Marr (@smarr) and he quickly wrote back with a bunch of useful tips. Below is mostly verbatim what he wrote. (The initial quoted line is me.)


> Do you have any experience with this kind of environment? Do you have any pointers for us?

Yes. Not sure what you are imagining my reply to be. So, the brief and optimistic version is: you’ve got a lot of “fun” ahead of you.

For the slightly more elaborate answer:

There are the inline caches themselves. I believe getting the caches thread-safe is relatively straightforward and only requires a protocol that allows concurrent updates while permitting benign races. I don’t have any specific pointers handy, but I think IBM had a paper on how they did it for JVMs in compiled code. In an interpreter, before you go to JITed code, it is a little easier, I believe.
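A minimal sketch of what such a benign-race protocol can look like (all names invented; this is not CPython’s actual cache layout): each cache entry is immutable once published, the call site holds an atomic pointer to the current entry, and publication uses a release store paired with an acquire load. Racing updates may overwrite each other, which is benign because the worst case is a redundant slow-path lookup.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical monomorphic inline cache for one call site. */
typedef struct {
    int type_tag;   /* receiver type this entry is valid for */
    int method_id;  /* cached result of the slow lookup */
} CacheEntry;

typedef struct {
    _Atomic(CacheEntry *) entry;  /* NULL until the first hit */
} CallSite;

/* Stand-in for the real (slow) method lookup. */
static int slow_lookup(int type_tag) { return type_tag * 100; }

static int call_site_lookup(CallSite *site, int type_tag) {
    /* Acquire pairs with the release store below, so a non-NULL
     * entry is always seen fully initialized. */
    CacheEntry *e = atomic_load_explicit(&site->entry, memory_order_acquire);
    if (e != NULL && e->type_tag == type_tag)
        return e->method_id;             /* fast path: cache hit */

    int method = slow_lookup(type_tag);  /* slow path */
    CacheEntry *fresh = malloc(sizeof(CacheEntry));
    if (fresh != NULL) {
        fresh->type_tag = type_tag;
        fresh->method_id = method;
        /* Racy publish: concurrent threads may overwrite each other.
         * That is benign -- a losing entry is simply never observed
         * again (a real VM would reclaim it via its GC). */
        atomic_store_explicit(&site->entry, fresh, memory_order_release);
    }
    return method;
}
```

Note that readers never lock and never see a half-written entry: an entry is fully written before the release store makes it reachable.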

Though this is, as you noted in your post, only the very first issue. Indeed, if you allow free modification of objects and data structures, all kinds of things can break, and locks are not a good solution.

We did various bits of work on designing object models and approaches for dicts and lists that could be applied here. However, none of them are “simple solutions”.

For objects, we worked on a system that has Self-style maps / V8-style hidden classes.

Here the challenge is to find a way to make sure that nothing bad happens despite allowing certain races. Have a look at Section 3 here: https://stefan-marr.de/downloads/oopsla16-daloze-et-al-efficient-and-thread-safe-objects-for-dynamically-typed-languages.pdf There we discuss the typical problems a bit, and in the following section we sketch how one can design a solution. While this was implemented on top of Java memory semantics, the general ideas should be applicable to CPython, too.
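One core idea from that line of work can be sketched as follows (a simplified illustration under invented names, not the paper’s full design): shapes are immutable, object storage only grows, and a field write becomes visible by publishing the new shape last. A racing reader then either sees the old shape (field absent) or the new shape with the slot value already written. The sketch assumes a single writer per object, as in the unsynchronized “local” state.

```c
#include <stdatomic.h>
#include <string.h>

#define MAX_FIELDS 8

/* An immutable shape: maps field names to slot indices. */
typedef struct {
    int n_fields;
    const char *names[MAX_FIELDS];
} Shape;

typedef struct {
    _Atomic(const Shape *) shape;
    _Atomic(long) slots[MAX_FIELDS];  /* storage never shrinks */
} Object;

/* Returns the slot index for name, or -1 if the field is absent. */
static int shape_index(const Shape *s, const char *name) {
    for (int i = 0; i < s->n_fields; i++)
        if (strcmp(s->names[i], name) == 0)
            return i;
    return -1;
}

/* Add a field: write the slot FIRST, then publish the new shape
 * with a release store.  The caller supplies the (immutable)
 * successor shape that already contains the new field. */
static void object_add_field(Object *o, const Shape *new_shape,
                             const char *name, long value) {
    int idx = shape_index(new_shape, name);
    atomic_store_explicit(&o->slots[idx], value, memory_order_relaxed);
    atomic_store_explicit(&o->shape, new_shape, memory_order_release);
}

/* Racing readers either see the old shape (returns 0: field absent)
 * or the new shape with its slot value already visible. */
static int object_read_field(Object *o, const char *name, long *out) {
    const Shape *s = atomic_load_explicit(&o->shape, memory_order_acquire);
    int idx = shape_index(s, name);
    if (idx < 0)
        return 0;
    *out = atomic_load_explicit(&o->slots[idx], memory_order_relaxed);
    return 1;
}
```

The key invariant is publication order: because the slot store happens before the release store of the shape, no reader can observe a shape that describes a slot whose value is not yet visible.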

Then the next challenge is built-in collections. Here we took a variant of the storage strategies developed for PyPy (work by Carl Friedrich Bolz et al. that you might be familiar with) and moved it into a concurrent world: https://stefan-marr.de/downloads/oopsla18-daloze-et-al-parallelization-of-dynamic-languages-synchronizing-built-in-collections.pdf

Things get a bit more complicated here… The good thing is that many applications will first initialize a collection with data before it is shared, and then mostly read from it.

This gives us the option to avoid locking on many collections. Only when they become reachable by multiple threads AND are mutated in unsafe ways does one need to change them to a storage strategy that synchronizes access.
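The “local until shared” idea above can be sketched like this (a deliberately simplified illustration with invented names; the paper’s actual design has more strategies and handles the transition itself safely): a list takes no lock while it is reachable from one thread only, is marked shared when it escapes (e.g., is stored into a global), and from then on mutations go through a lock.

```c
#include <pthread.h>
#include <stdlib.h>

typedef enum { LOCAL, SHARED_LOCKED } Strategy;

typedef struct {
    Strategy strategy;
    int *items;
    size_t len, cap;
    pthread_mutex_t lock;  /* used only after the switch */
} List;

static void list_init(List *l) {
    l->strategy = LOCAL;
    l->items = NULL;
    l->len = l->cap = 0;
    pthread_mutex_init(&l->lock, NULL);
}

/* Called when the list becomes reachable from another thread,
 * e.g. when it is stored into a global or into a shared object. */
static void list_mark_shared(List *l) {
    l->strategy = SHARED_LOCKED;
}

static void append_unlocked(List *l, int v) {
    if (l->len == l->cap) {   /* error handling elided in this sketch */
        l->cap = l->cap ? l->cap * 2 : 4;
        l->items = realloc(l->items, l->cap * sizeof(int));
    }
    l->items[l->len++] = v;
}

static void list_append(List *l, int v) {
    if (l->strategy == LOCAL) {
        append_unlocked(l, v);  /* fast path: no synchronization */
    } else {
        pthread_mutex_lock(&l->lock);
        append_unlocked(l, v);
        pthread_mutex_unlock(&l->lock);
    }
}
```

The payoff is that the common case (a collection built up by one thread and then mostly read) never pays for synchronization; only collections that are both shared and mutated take the slow, locked path.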

Though that, too, is only part of the solution.

Gal Sela and Erez Petrank published some work last year on how to get stronger guarantees for languages such as Python, where developers expect GIL-like concurrency semantics: https://dl.acm.org/doi/10.1145/3563300

I’ll be visiting Erez later this year for two months to take this work a bit further, with an eye on possibly integrating it into Ruby or Python.


In a follow-up message, Stefan wrote:


I should probably also have included that this work was done with Benoit Daloze (https://eregon.me/blog/research/, @eregon). It is written up in longer form in his PhD thesis: https://eregon.me/blog/assets/research/thesis-thread-safe-data-representations-in-dynamic-languages.pdf Some slides: https://eregon.me/blog/assets/research/thesis-slides.pdf

He is the lead of the TruffleRuby project and has been working with the CRuby people to think about what a memory model for Ruby could look like.