extism / elixir-sdk

Extism Elixir Host SDK - easily run WebAssembly modules / plugins from Elixir applications
https://extism.org
BSD 3-Clause "New" or "Revised" License

Question: long-running WASM + Erlang scheduler #15

Open munjalpatel opened 3 months ago

munjalpatel commented 3 months ago

Hey @bhelx amazing work!

The Erlang scheduler generally doesn't like long-running processes. For native Erlang / Elixir processes, it uses reductions to preempt them so that other processes have a fair chance of getting a CPU slice.

How would the scheduler react to WASM running through Rustler? What advice do you have when someone needs to run potentially long-running WASM modules?

bhelx commented 3 months ago

Thanks @munjalpatel! Could you link to any specific docs about what you are referring to? I haven't had this problem but maybe I'm misinterpreting your question.

munjalpatel commented 3 months ago

@bhelx here are some references:

Why no long running nifs: https://youtu.be/nw2eIB6bTxY?t=350 https://youtu.be/tBAM_N9qPno?t=2074

Here's the demo and potential solutions with Rustler https://youtu.be/BREqrlzfQUo?t=1078

Here's the general info on how the Erlang Scheduler works https://youtu.be/JvBT4XBdoUE?t=411 https://www.youtube.com/watch?v=tBAM_N9qPno

Quote from: https://blog.appsignal.com/2024/04/23/deep-diving-into-the-erlang-scheduler.html

To promote fairness among processes, Erlang's preemptive scheduling relies on reductions rather than time slices. If a process exhausts its allocated reductions, it can be preempted, even if its execution isn't complete. This approach prevents a single process from monopolizing the CPU for an extended period, fostering fairness among concurrent processes. By using reductions as the foundation for preemption, Erlang mitigates the risk of processes starving for CPU time. This design ensures that every process, irrespective of its workload, is periodically allowed to execute.

Essentially, when we run WASM as a NIF, the reduction count won't get updated. Hence, the scheduler will continue to give significantly more time to the process running the NIF. We have to somehow indicate to the scheduler the progress that's made inside the NIF and represent it in terms of reductions, as explained in the Rustler video ( https://youtu.be/BREqrlzfQUo?t=1078 ). @scrogson am I thinking about this correctly?
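To make the fairness argument concrete, here is a toy single-scheduler simulation. It is a sketch, not the BEAM's actual algorithm: `Task`, `run`, and `BUDGET` are hypothetical names, and the budget is kept tiny so the trace is short. A task that does not report reductions (modelling a plain NIF) is never preempted.

```rust
use std::collections::VecDeque;

/// Reductions granted per slice. Kept tiny here so the trace stays short;
/// this is an illustrative value, not the BEAM's real budget.
const BUDGET: u32 = 4;

/// A toy task with `remaining` units of work.
struct Task {
    name: &'static str,
    remaining: u32,
    /// `false` models a plain NIF: the scheduler cannot see its progress.
    counts_reductions: bool,
}

/// Runs tasks round-robin on one scheduler and returns the order in which
/// units of work were executed.
fn run(tasks: Vec<Task>) -> Vec<&'static str> {
    let mut queue: VecDeque<Task> = tasks.into();
    let mut trace = Vec::new();
    while let Some(mut task) = queue.pop_front() {
        let mut reductions = 0;
        while task.remaining > 0 {
            task.remaining -= 1;
            trace.push(task.name);
            if task.counts_reductions {
                reductions += 1;
                if reductions == BUDGET {
                    break; // budget exhausted: preempt this task
                }
            }
        }
        if task.remaining > 0 {
            queue.push_back(task); // unfinished work goes to the back
        }
    }
    trace
}
```

With two well-behaved tasks the trace interleaves; if the first task sets `counts_reductions: false`, it runs to completion before the second task gets any time.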

@tessi is handling it by running the NIF in an OS thread ( https://github.com/tessi/wasmex/issues/6 and https://github.com/tessi/wasmex/pull/7 ). That obviously is a lot more heavyweight than a NIF executing in the Erlang process itself -- but still better than having a blocking process.
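The general shape of the OS-thread approach can be sketched with the Rust standard library alone. This is not wasmex's code: `long_running_call` and `spawn_call` are hypothetical, and a Rust channel stands in for the result delivery. In a real NIF the result would presumably go back to the calling Erlang process as a message.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a long-running WASM call.
fn long_running_call(n: u64) -> u64 {
    (1..=n).sum()
}

/// Starts the work on a fresh OS thread and returns immediately with a
/// receiver on which the result will arrive later.
fn spawn_call(n: u64) -> mpsc::Receiver<u64> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Ignore the send error if the caller already dropped the receiver.
        let _ = tx.send(long_running_call(n));
    });
    rx
}
```

The caller is free to do other work and picks up the result with `rx.recv()` once it is needed, which is the property that keeps the scheduler thread unblocked.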

I believe we have access to WASM's linear memory. I am not sure if there is a way to track instruction execution of a WASM module and run the module with an arbitrary instruction as a starting point. If there is, it might be possible to pause/resume a WASM module every 2ms while yielding 2000 reductions. This might be a very far-fetched idea though!! @tessi has a much better idea here: https://github.com/tessi/wasmex/issues/394
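The pause/resume idea amounts to running under an instruction budget and remembering where execution stopped. The toy machine below illustrates only that control flow. It is not a WASM interpreter, and `Machine`, `Step`, and `run` are hypothetical names that say nothing about what wasmex or Extism expose.

```rust
/// Outcome of one bounded slice of execution.
#[derive(Debug, PartialEq)]
enum Step {
    /// The budget ran out and more work remains.
    Yielded,
    /// The program finished with this result.
    Done(u64),
}

/// A minimal "machine": each instruction adds one number to an accumulator.
struct Machine {
    program: Vec<u64>,
    pc: usize, // index of the next instruction to execute
    acc: u64,
}

impl Machine {
    fn new(program: Vec<u64>) -> Self {
        Machine { program, pc: 0, acc: 0 }
    }

    /// Executes at most `budget` instructions, then hands control back.
    /// Calling `run` again resumes from the saved program counter.
    fn run(&mut self, budget: usize) -> Step {
        let end = (self.pc + budget).min(self.program.len());
        while self.pc < end {
            self.acc += self.program[self.pc];
            self.pc += 1;
        }
        if self.pc == self.program.len() {
            Step::Done(self.acc)
        } else {
            Step::Yielded
        }
    }
}
```

Between two `run` calls the host could report the consumed budget to the scheduler as reductions. Whether a real WASM runtime can be driven this way is exactly the open question raised above.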

scrogson commented 3 months ago

@munjalpatel correct. Given that this library can't predict how long each WASM call will take to execute, it should at the very least make it into a DirtyCpu NIF so that it runs on the dirty schedulers.

bhelx commented 3 months ago

Thanks for all this info @munjalpatel and @scrogson ! I hadn't considered that, but it makes sense. Our bindings using rustler are fairly naive.

@munjalpatel were you interested in contributing this? If not, I can start by looking into this "DirtyCpu NIF" that @scrogson mentioned.

Another option could be just rewriting the underlying code to use wasmex and getting rid of the custom NIF. I have explored this but have not started on it or done a proof of concept: https://github.com/extism/elixir-sdk/issues/3

munjalpatel commented 3 months ago

@bhelx happy to help -- I am quite familiar with Elixir but have never worked in Rust before. But can certainly figure things out with guidance :)

Utilizing DirtyCpu should be fairly straightforward. If I recall, it's just an annotation on the exported function.
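For reference, Rustler's `nif` attribute takes a `schedule` option, so the annotation would look roughly like the commented line below. The attribute is commented out so the sketch compiles without the `rustler` crate, and `plugin_call` with its body is a hypothetical stand-in, not this SDK's actual exported function.

```rust
// In a real Rustler crate the attribute below would be uncommented and
// `rustler` added as a dependency. It asks the VM to run the function on a
// dirty CPU scheduler instead of a normal scheduler thread.
//
// #[rustler::nif(schedule = "DirtyCpu")]
fn plugin_call(input: Vec<u8>) -> Vec<u8> {
    // Hypothetical stand-in for a potentially long-running WASM call.
    input.into_iter().rev().collect()
}
```

Nothing else in the function body has to change, which is why this is the low-effort option compared with a rewrite.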

However, I would much rather use wasmex instead of going DirtyCpu route for the following reasons:

So by using wasmex, we will inherit all these + future improvements :)

bhelx commented 3 months ago

@munjalpatel that would be awesome! No pressure of course :) @tessi has done a great job on wasmex and is way ahead of my bindings. I only haven't switched over due to lack of time. If you come find me in our Discord in the #elixir-sdk channel, I can give you some real-time advice or even do some pair programming if you feel like it would help.

munjalpatel commented 3 months ago

> @munjalpatel that would be awesome! No pressure of course :) @tessi has done a great job on wasmex and is way ahead of my bindings. I only haven't switched over due to lack of time. If you come find me in our Discord in the #elixir-sdk channel, I can give you some real-time advice or even do some pair programming if you feel like it would help.

A bit busy this week. But let's get some time on our calendars for the next week and we can figure out the plan ahead. What timezone are you in?

bhelx commented 3 months ago

No problem, I can always find a way to squeeze in some time. I'm in the US in Central Time. In terms of UTC, I'm generally active from 11:00 UTC to 01:00 UTC. Just ping me on Discord, same username as GitHub.

tessi commented 3 months ago

hey 👋 you mentioned me often enough to appear for a short comment :D

wasmex is pretty stable right now, but feature development is slow. The reason is mostly me focusing on family in my free time (kids eat one's free time for breakfast). Regarding wasmex, most work is in updating dependencies. We use wasmtime as the underlying wasm executor, which still sees some significant development and API changes. Usually good improvements, but it's still work to keep up. :) That being said, I'd love to see people migrate to wasmex and am happy to onboard additional contributors to it. together we have more time and dedication than us alone.

> I only haven't switched over due to lack of time

I feel you! :D if we can find a way to share some work or save some time in maintaining this, I'm all in. but no pressure, if we stay separate it's also cool!

@munjalpatel thanks for your efforts! 💛

bhelx commented 3 months ago

@tessi thanks for the update on the project. No pressure for you to help of course. I think we should be able to mostly do it on our own. I think either way, we're faced with the decision to build all the stuff you have built ourselves, or join forces. The latter is the obvious choice.

> Usually good improvements, but it's still work to keep up. :)

Yeah we are downstream of wasmtime for our standard fallback libextism dependency, so we know the project well!

> That being said, I'd love to see people migrate to wasmex and am happy to onboard additional contributors to it. together we have more time and dedication than us alone.

Agreed, there's no reason you should need to shoulder all the burden. Happy to contribute to wasmex where we can if we can pull this off. Also perhaps we can attract some more contributors too to help you out. We have a few users of this library who might be able and willing to help out.