At the recommendation of a mutual friend I contacted Laurie Tratt (@ltratt), who wrote back with some useful tips. The rest of this comment is his message verbatim.
[...] I have thought a bit about some of the
problems I think you're thinking about. A warning, though: in general, I
think we've explored relatively little of the language implementation space;
and often I find that people confuse "I did X" with "X is the best/only way
of doing things". I'm probably not immune from this either!
My summary of the problem is: in a JIT we want to make assumptions about an
object and optimise based on those assumptions; but in a multi-threaded
environment another thread may quickly invalidate our assumptions. [I'll use
the term "JIT", but interpret that liberally: it doesn't have to mean
"generate machine code"; it could mean "quickening" etc. etc.]
I tend to think that there are two main approaches to mitigating this:
Typically, most objects are created and die within a single thread.
A JIT can often prove by "escape analysis" [1,2,3] that an object
can't leak to another thread. Then a) no locking on that object is
necessary b) the JIT knows that no other thread can invalidate its
assumptions about that object.
Escape analysis is more useful than is sometimes presumed: often a JIT
can prove that an object can't leak up to a certain point in the JITed
code. Rather than just assume the object has to always be treated with
multi-threaded caution, the JIT can optimise it up to the escape point
as single-threaded, then materialise it as a "full" object and treat
it as a multi-threaded object from then on.
Clever/fast locking strategies. Typically, even objects that are
accessible in multiple threads are not mutated by those threads
simultaneously. If you can acquire/release locks quickly in the
(common) uncontended case, you win.
HotSpot (aka "the JVM") has gone through several iterations to make
locks fast (the most famous is "biased locking" but that was removed
recently-ish [4]): I think they currently have a two-stage strategy
("lightweight" locks can get upgraded to "heavyweight" locks).
JavaScriptCore (the JavaScript VM in WebKit/Safari) developed "parking
lot" locks [5] which only take two bits and are very good for the
uncontended case (better than OS locks!), and tolerable for the
contended case. I tend to use this approach these days.
I don't pretend to know the precise details of the system you're working
on, and thus whether these approaches are definitely what you want. But
I think there is a decent chance that if the no-gil work goes ahead,
these sorts of factors could prove at least interesting to you.
If any of this sounds interesting, I'd be happy to discuss further by email
or video. In terms of West Coast friendly times, I've got availability
tomorrow or Monday up to 20:00 BST (midday PDT), for example, if you
need to discuss things sooner rather than later.
[Not directly related to this email, but a disclaimer: my group is doing
research on automatically creating tracing JITs from C interpreters, so that
we can JIT into C extensions. CPython is, unsurprisingly, one that we're very
interested in! It's mid-stage work, at the moment, and we might have a demo
out later this year. This would complement, rather than replace, other JIT
work, so I don't think we're stepping on your toes or vice versa!]
At the recommendation of a mutual friend I contacted Laurie Tratt (@ltratt), who wrote back with some useful tips. The rest of this comment is his message verbatim.
[...] I have thought a bit about some of the problems I think you're thinking about. A warning, though: in general, I think we've explored relatively little of the language implementation space; and often I find that people confuse "I did X" with "X is the best/only way of doing things". I'm probably not immune from this either!
My summary of the problem is: in a JIT we want to make assumptions about an object and optimise based on those assumptions; but in a multi-threaded environment another thread may quickly invalidate our assumptions. [I'll use the term "JIT", but interpret that liberally: it doesn't have to mean "generate machine code"; it could mean "quickening" etc. etc.]
I tend to think that there are two main approaches to mitigating this:
Typically, most objects are created and die within a single thread. A JIT can often prove by "escape analysis" [1,2,3] that an object can't leak to another thread. Then a) no locking on that object is necessary b) the JIT knows that no other thread can invalidate its assumptions about that object.
Escape analysis is more useful than is sometimes presumed: often a JIT can prove that an object can't leak up to a certain point in the JITed code. Rather than just assume the object has to always be treated with multi-threaded caution, the JIT can optimise it up to the escape point as single-threaded, then materialise it as a "full" object and treat it as a multi-threaded object from then on.
Clever/fast locking strategies. Typically, even objects that are accessible in multiple threads are not mutated by those threads simultaneously. If you can acquire/release locks quickly in the (common) uncontended case, you win.
HotSpot (aka "the JVM") has gone through several iterations to make locks fast (the most famous is "biased locking" but that was removed recently-ish [4]): I think they currently have a two-stage strategy ("lightweight" locks can get upgraded to "heavyweight" locks).
JavaScriptCore (the JavaScript VM in WebKit/Safari) developed "parking lot" locks [5] which only take two bits and are very good for the uncontended case (better than OS locks!), and tolerable for the contended case. I tend to use this approach these days.
I don't pretend to know the precise details of the system you're working on, and thus whether these approaches are definitely what you want. But I think there is a decent chance that if the no-gil work goes ahead, these sorts of factors could prove at least interesting to you.
If any of this sounds interesting, I'd be happy to discuss further by email or video. In terms of West Coast friendly times, I've got availability tomorrow or Monday up to 20:00 BST (midday PDT), for example, if you need to discuss things sooner rather than later.
[Not directly related to this email, but a disclaimer: my group is doing research on automatically creating tracing JITs from C interpreters, so that we can JIT into C extensions. CPython is, unsurprisingly, one that we're very interested in! It's mid-stage work, at the moment, and we might have a demo out later this year. This would complement, rather than replace, other JIT work, so I don't think we're stepping on your toes or vice versa!]
Laurie
[1] https://en.wikipedia.org/wiki/Escape_analysis [2] https://www.pypy.org/posts/2010/09/escape-analysis-in-pypys-jit-1780048403046080197.html [3] https://kipp.ly/blog/escape-analysis/ [4] https://openjdk.org/jeps/374 [5] https://webkit.org/blog/6161/locking-in-webkit/