chc4 / lineiform

A meta-JIT library for Rust interpreters

Async JIT compilation #12

Open chc4 opened 2 years ago

chc4 commented 2 years ago

Related to #11, but we can also do the JIT compilation in another thread, and keep running the slow code while it compiles in the background. This means that, in the worst case where some hot loop is called exactly as many times as our perf count threshold, we don't make it wait a very long time for compilation only to never be called again.

We should use rayon for this, since we basically just want to post "compile me!" to a workqueue, then check the progress on each call and use the result if it's finished. It's slightly complex, since we have to make sure we don't leak the JIT code memory if the FastFn is dropped before the JIT compilation is done.
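A minimal sketch of that shape, to make the Drop concern concrete. Everything here is an assumption for illustration: `AsyncFastFn`, `Compiled`, and `slow_square` are hypothetical names rather than Lineiform's real types, the "compilation" is a placeholder, and plain `std::thread::spawn` stands in for the rayon workqueue so the snippet has no dependencies (the same closure could be handed to `rayon::spawn` instead). The result slot lives in an `Arc`, so whichever side finishes last, the caller or the worker, is the one that frees the compiled code.

```rust
use std::sync::{Arc, OnceLock};
use std::thread;

/// Stand-in for JIT output. A real backend would own executable memory here.
type Compiled = Box<dyn Fn(i64) -> i64 + Send + Sync>;

/// Hypothetical wrapper (not Lineiform's actual `FastFn`).
pub struct AsyncFastFn {
    slow: fn(i64) -> i64,
    // Shared with the worker: the last owner to drop its `Arc` frees the
    // compiled code, so dropping this struct mid-compile does not leak it.
    fast: Arc<OnceLock<Compiled>>,
}

impl AsyncFastFn {
    pub fn new(slow: fn(i64) -> i64) -> Self {
        let fast = Arc::new(OnceLock::new());
        let slot = Arc::clone(&fast);
        // Post "compile me!" to a background thread.
        thread::spawn(move || {
            // Placeholder for real compilation: just wrap the slow path.
            let compiled: Compiled = Box::new(move |x| slow(x));
            // `set` only fails if the slot is already filled; ignore that case.
            let _ = slot.set(compiled);
        });
        AsyncFastFn { slow, fast }
    }

    /// Use the compiled code if it is ready, otherwise run the slow path.
    pub fn call(&self, x: i64) -> i64 {
        match self.fast.get() {
            Some(f) => f(x),
            None => (self.slow)(x),
        }
    }

    pub fn is_compiled(&self) -> bool {
        self.fast.get().is_some()
    }
}

/// Example slow path used to exercise the wrapper.
pub fn slow_square(x: i64) -> i64 {
    x * x
}
```

Calling `AsyncFastFn::new(slow_square).call(7)` gives the same answer whether or not the worker has finished; only the path taken differs.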

chc4 commented 2 years ago

We don't currently handle thread locals in JIT closures anyways, so just compiling on another thread is always fine for now.

ohAitch commented 2 years ago

Stretch goals: run the slow code and then switch midair to a fast single-pass inlining output and then if the Fn's still live switch again to the output of a more aggressive linker variant with half the LLVM passes pulled in… :^)

chc4 commented 2 years ago

I think you could do something neat with having Lineiform return a Pin closure, so that the async JIT compilation just patches the function pointer in the closure vtable directly and it has 0 overhead to swap it out (potentially multiple times). There would be weirdness, though: a closure we're compiling could call another closure we're compiling, and if we inline the second one, we want to make sure further optimizations to it also reach everything upstream. I guess if you require Pin you can just keep a dependency graph of closures in the main Lineiform struct, and trigger recompilation of everything that refers to a closure once you have optimized it?
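Roughly what that dependency graph could look like, as a sketch only. `DepGraph`, `ClosureId`, `record_inline`, and `needs_recompile` are made-up names for illustration and nothing like this exists in Lineiform today. It records "caller inlined callee" edges and, when a closure gets optimized, walks those edges backwards to collect everything upstream that should be recompiled.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical handle identifying a registered closure.
pub type ClosureId = u32;

/// Hypothetical graph tracking which closures inlined which others.
#[derive(Default)]
pub struct DepGraph {
    // Maps a callee to every closure that inlined it.
    inlined_into: HashMap<ClosureId, Vec<ClosureId>>,
}

impl DepGraph {
    /// Record that `caller` inlined the body of `callee`.
    pub fn record_inline(&mut self, caller: ClosureId, callee: ClosureId) {
        self.inlined_into.entry(callee).or_default().push(caller);
    }

    /// Every closure that transitively inlined `optimized`, in breadth-first
    /// order. `optimized` itself is not included; cycles are visited once.
    pub fn needs_recompile(&self, optimized: ClosureId) -> Vec<ClosureId> {
        let mut seen = HashSet::new();
        seen.insert(optimized);
        let mut queue = VecDeque::from([optimized]);
        let mut out = Vec::new();
        while let Some(id) = queue.pop_front() {
            if let Some(callers) = self.inlined_into.get(&id) {
                for &caller in callers {
                    // `insert` returns false if we already queued this closure.
                    if seen.insert(caller) {
                        out.push(caller);
                        queue.push_back(caller);
                    }
                }
            }
        }
        out
    }
}
```

So if closure 1 inlined 2, and both 2 and 4 inlined 3, then optimizing 3 would queue 2 and 4 first and 1 after them.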

ohAitch commented 2 years ago

Is this a Pin, or just like, a Cell?

chc4 commented 2 years ago

I think you'd need Pin, since you'd want to hand out a pointer to the closure's vtable that it should atomically patch when it's done compiling; this means that the closure can't move while that pointer is alive, which is the canonical use for Pin. Cell doesn't prevent moves of the value.
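Something like the following, as a rough sketch. Rust doesn't give you a supported way to patch a real closure vtable, so this stands in an explicit atomic entry pointer inside a pinned struct; `Patchable`, `Entry`, `call`, and `patch` are hypothetical names, not an existing Lineiform API. `PhantomPinned` makes the type `!Unpin`, so once it is behind `Pin<Box<_>>` safe code can't move it, and a compiler thread can hold a reference to it for as long as it needs.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;
use std::sync::atomic::{AtomicPtr, Ordering};

/// Signature shared by the slow and compiled entry points (an assumption).
pub type Entry = fn(i64) -> i64;

/// Hypothetical pinned closure stand-in with a swappable entry point.
pub struct Patchable {
    // Always holds a value that was cast from an `Entry`.
    entry: AtomicPtr<()>,
    // Opts out of `Unpin`, so a pinned `Patchable` cannot be moved safely.
    _pin: PhantomPinned,
}

impl Patchable {
    pub fn new(slow: Entry) -> Pin<Box<Self>> {
        Box::pin(Patchable {
            entry: AtomicPtr::new(slow as *const () as *mut ()),
            _pin: PhantomPinned,
        })
    }

    /// Load whichever entry point is currently installed and call it.
    pub fn call(self: Pin<&Self>, x: i64) -> i64 {
        let raw = self.entry.load(Ordering::Acquire);
        // SAFETY: `entry` is only ever written with pointers cast from `Entry`.
        let f: Entry = unsafe { std::mem::transmute::<*mut (), Entry>(raw) };
        f(x)
    }

    /// Atomically install a new entry point; may be called more than once.
    pub fn patch(self: Pin<&Self>, fast: Entry) {
        self.entry.store(fast as *const () as *mut (), Ordering::Release);
    }
}
```

A compiler thread would take `p.as_ref()` (for example inside `std::thread::scope`) and call `patch` when it finishes; callers keep using `p.as_ref().call(x)` and pick up the new code on their next call with just one atomic load.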

You could also just forget about Pin entirely and poll for a compilation result at every FastFn::call invocation, which is simpler to implement (but a bit worse).