Stack scanning using Asyncify

kripken commented 5 years ago

I realized that Asyncify could be used for conservative stack scanning, since it rewrites the code to allow it to pause and resume. When it's paused, all the local state has been spilled to memory (and a specific well-defined region of it), so it can be scanned there. Doing this should be as easy as adding a function like emscripten_sleep that not just sleeps but also has a callback which would be used to call the compiled VM's scanning code.

This would have some overhead, but it could be low if there are few places that can trigger a GC (not inside inner loops etc; Asyncify has a whole-program analysis to find out what it can avoid instrumenting). In particular this should be more efficient than the SpillPointers pass which spills around every single call (Asyncify would only add logic around calls that need it, and only spill if a GC actually happens).

If this would be useful for people maybe we could add an API for it, which as mentioned above should be easy.

cc @wmaddox @vargaz , this might interest you.

kripken commented 5 years ago

Proposed implementation in emscripten, emscripten_scan_registers: https://github.com/emscripten-core/emscripten/pull/9121

wmaddox commented 5 years ago

Interesting. This looks very similar to the "lazy pointer stacks" scheme described here: http://www.filpizlo.com/papers/baker-ccpe09-accurate.pdf In their work, only unwinding is accomplished via code transformation, and rewinding is avoided by capturing a copy of the original stack, though they also discuss a scheme that uses rewinding to allow the GC to modify pointers, e.g. for a moving GC. I'm not clear on how much the extra instrumentation needed for rewinding in Asyncify will cost, but it's not difficult to imagine that it is cheaper than the current spill-everything approach taken by --spill-pointers. Note that it is typical to poll for interrupts at loop back edges as well as function calls, making them GC safepoints as well. The polling code in our language runtime handles this by making a call on the interrupt-taken path, so that the backend need only consider calls, but the effect is to magnify the number of calls seen by the backend relative to the C/C++ source code. Offhand, however, I believe that all interrupt sources in our runtime (outside of an interactive debugging session) immediately throw a built-in exception resulting in a longjmp to a handler, meaning that we could likely avoid any need to perform GC in the stack context in which the interrupt is signaled. We'll immediately unwind the stack to the frame in which the handler is located (TRY/CATCH construct in our language) and rely on state having been saved on entry to the scope of the exception handler, which could be demarcated with a call.

kripken commented 5 years ago

Interesting, thanks @wmaddox! Yeah, I'm curious about the overhead comparison too, if you try it and measure please let me know.

kripken commented 5 years ago

cc @titzer - the asyncify approach may be interesting to you as an option for stack scanning in virgil.

WebAssembly / binaryen

Stack scanning using Asyncify #2268