CeleritasCelery / rune

Rust VM for Emacs
GNU General Public License v3.0
468 stars 26 forks source link

Multi-threading #21

Open CeleritasCelery opened 1 year ago

CeleritasCelery commented 1 year ago

There is some discussion of my thoughts on multi-threaded Emacs here. Read that first. This project gives us a unique opportunity to make the interpreter thread safe. I am generally of two minds about how to handle multi-threading.

idea 1 - CSP all the way

CSP or "communicating sequential processes" is the most flexible way to implement parallelism. This is the approach used by languages like go and clojure. You essentially create first-class channels that can be passed around to other threads, and then only communicate through those channels. This let you build a topology of "communicating processes" that can sequence with each other as desired. It is very simple, flexible, and powerful.

However, you run into the issue that this method could not be made backwards compatible with existing GNU Emacs runtime. That runtime has no ability to suspend and resume threads of execution in the way that would be needed to mimic CSP. You could in theory do this with existing Emacs threading API, but is very unstable and will crash your Emacs, so it is not ready to be used. However if the threading library was ever made robust, then you could in theory model CSP in GNU Emacs.

Idea 2 - coroutine style

GNU Emacs supports stackless coroutines (called generators), which allow for basic concurrency. If you limited your multi-threading API, you could have it map to coroutines as a polyfill. This would mean you would not have to write code twice, once for multi-threading, and once for not. I imagine it looking something like this.

(let ((num 0)
      (go (goroutine (lambda (x) (yield (1+ x))))))
  (setq num (resume go num))
  (message "%d" num) ;; prints 1
  (setq num (resume go (+ num 2)))
  (message "%d" num)) ;; prints 4

This would limit thread to only communicating with the thread that spawned them, and only yielding values in the top level function (since Emacs coroutines are stackless). But you would not need an explicit channel type, and it would be easy to create polyfill version of goroutine, yield, and resume so that the same code would work in GNU Emacs and a multi-threaded Emacs. I don't like this solution as well as just using CSP. But maybe the backwards compatibility would be worth it.

why not use async/await for the backend?

I don’t think using Tokio or any other async await platform would be worthwhile. Fundamentally, they don’t offer any feature that you wouldn’t already get with threads. Plus async brings its own problems.

First is that you have to introduce function coloring into codebase. Everything needs to have two versions. Plus async code is messier and harder to reason about. We are not writing a web server, so we don’t need that complexity.

The only real advantage of async is lower resources usage. Green threads take significantly less space compared to system threads. But this is only really a concern when you want thousands of threads or are memory constrained.

The other advantage is that they can switch context a little faster because they pin threads to cores. However I see this as a disadvantage. In tokio you need to tell the runtime upfront if something is going to be compute heavy or IO heavy. But since the threads are in elisp, we can’t know that ahead of time. Many modern CPU’s have both performance cores and efficiency cores. The kernel will try to move compute heavy tasks to the p-cores and IO heavy tasks to the e-cores. But tokio can’t do that because the threads are pinned!

Also async only supports cooperative multitasking, which means a task that doesn’t play by the rules can starve others. But OS threads are preemptive, so that can’t happen.

What should be shared and what should be thread-local?

Things that are going to change only rarely such as functions can be shared between threads. However most other things like variable bindings and properties will be thread local. However to better support integration with existing elisp, thread local types will be copied to a thread as needed. This ensures that users don't have to worry about "is this variable defined in this thread or not", but still protects us from data races and other nasty concurrency bugs.

how to handle variable bindings?

As described in the previous section, the current idea is have variable bindings and symbol properites copied to other threads "on demand". Essentially when a thread first tries to access a variable that it does not have defined locally, it will send a message to the it's parent thread and ask for the current value via a channel. The parent thread will copy send a message back with the value, or message indicating that the variable is undefined. The child thread will wait until it receives a message back before continuing execution.

how to handle errors

When an error occurs in another thread, what do you do? Any channels connected to that thread would become poisoned, and would signal an error if you tried to read from them. But what about threads that don’t communicate with the main thread. Maybe you wanted to delete a file in the background or something. Raising it randomly back to the mail thread seems really disruptive. But it could also be a foot gun if it just silently ignores the error. Maybe it just prints it as s message.

Alan-Chen99 commented 1 year ago

async/ await is useless since we dont have two colors. Some sort of executors is still useful for small tasks like fortifying a buffer. One would ideally be able to have two thread local contexts (we wont expose the two contexts to the same function though; alternatively we can give contexts an additional 'id, not sure how much change that is (we are using macros after all))

CeleritasCelery commented 1 year ago

Currently we have one Context per thread. What advantage do you see in having multiple per thread?

Alan-Chen99 commented 1 year ago

async/ await is useless since we dont have two colors

actually I would like to take this statement back. Tramp says otherwise.

Apart from that, a lot of processing can benefit from a clean context. Take processing LSP returns for example; Its lots of cons cell that will just be immediately freed. A lot of processing have this pattern, it's mostly "pure".

I imagine this as we would provide an interface to run a function with arguments on another thread. The arguments would be cloned. The result of the function would be cloned back to the main thread, and it would call a callback. The user will have to specify the color. With this variables won't be shared at all (I don't think it makes sense to). It might be worthwhile to provide channels and shared buffer primitives similar to js webworkers.

I don't think "But since the threads are in elisp, we can’t know that ahead of time" is a valid argument. Take https://github.com/progfolio/elpaca for example; Elpaca pretty much rolls its own event loop with subprocesses. It would be nice if packages don't need to have custom logic to handle async

CeleritasCelery commented 1 year ago

I don't think "But since the threads are in elisp, we can’t know that ahead of time" is a valid argument.

That is specifically referring to tokio. It expects you to set if a task is compute heavy or IO heavy before it is run. But we can't really know that for an elisp task unless we expose that choice to the end user.

Alan-Chen99 commented 1 year ago

This is already exposed to the end user: they must make sure to use https://github.com/jwiegley/emacs-async for compute heavy and timer.el (or something that wraps that) for IO heavy. I linked elpaca as an example, since it needs to do both and imo with tokio as backend this can be much cleaner.

imo we should give elisp the power to choose, not deciding anything for them.

CeleritasCelery commented 1 year ago

imo we should give elisp the power to choose, not deciding anything for them.

it doesn't have to be a choice between giving them the power to chose and deciding for them. You only have that dichotomy with async/await. If you instead use native threads then you can let the OS move the task automatically to p-cores or e-cores depending on the workload. You can't do with async (at least with Tokio) because of core-pinning. That is one of the reasons I think native threads are better.