WebAssembly / threads

Threads and Atomics in WebAssembly
https://webassembly.github.io/threads/

Native WebAssembly threads #8

Open binji opened 7 years ago

binji commented 7 years ago

The current proposal has no mechanism for creating or joining threads, relying on the embedder to provide these operations. This allows for simple integration with SharedArrayBuffer, where a dedicated worker is used to create a new thread.

There is a drawback for WebAssembly: a worker is a relatively heavyweight construct, and a native thread would be considerably leaner.

We are confident we'll want support for "native" WebAssembly threads eventually, so the question is whether the current proposal is sufficient for the initial implementation.
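For readers who haven't seen that setup, here is a minimal sketch of the embedder-side glue the worker-based approach implies. The file names and the env.memory / thread_entry import and export names are placeholders, not anything specified by the proposal:

// main.js -- create a shared memory and run the same module on a Worker.
const memory = new WebAssembly.Memory({ initial: 16, maximum: 256, shared: true });
const worker = new Worker("worker.js");
WebAssembly.compileStreaming(fetch("app.wasm"))
  .then((module) => worker.postMessage({ module, memory }));

// worker.js -- instantiate against the same shared memory and enter the thread body.
onmessage = async ({ data: { module, memory } }) => {
  const instance = await WebAssembly.instantiate(module, { env: { memory } });
  instance.exports.thread_entry();
};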

lars-t-hansen commented 7 years ago

If we have threads then we also need a thread representation. We can make do with integer handles or other semi-transparent types; the question is really whether we would be better off holding off on threads until we have better object representations. (Assuming we can afford to wait that long.) And if we do wait for objects, the question is whether code cross-compiled from e.g. C++ can use those fancy representations. I sense a substantial topic here.

rossberg commented 7 years ago

I don't think you want forgeable thread ids. Depending on what primitives are available for them, that would preclude secure abstractions.

lukewagner commented 7 years ago

An idea I was proposing earlier was that we:

  1. create a new WebAssembly.Thread constructor/JS object type/definition type, allowing Threads to be defined/imported like all the other definition types.
  2. allow "thread" as an elem_type of a WebAssembly.Table
  3. allow >1 tables (as separately proposed)
  4. add a grow_table operator symmetric to grow_memory
  5. the new create_thread operator would have a static table-index immediate (validated to refer to a Table<thread>) and a dynamic i32 index operand; the newly created WebAssembly.Thread would be stored into tbl[index]
  6. join_thread and any other thread operation that wants to take a thread would similarly take a static table-index immediate and dynamic i32 index operand to identify the table

And this gives you an unforgeable thread-id that is also not tied to any particular instance and thus compatible with dynamic linking.
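As a rough illustration of what items 1-3 might look like from the JS side (nothing here exists in any shipped API; the "thread" element kind and the import shape are hypothetical):

// Hypothetical sketch only -- the "thread" element kind is not real API.
const threads = new WebAssembly.Table({ element: "thread", initial: 4 });  // item 2
const imports = { env: { threads } };  // a Table of threads imported like any other definition (item 1)
// A module instantiated with these imports could then use the proposed
// create_thread / join_thread instructions (items 5 and 6), identifying a
// thread by a static table-index immediate plus a dynamic i32 element index.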

rossberg commented 7 years ago

Extending tables to support threads sounds good, though I'm a bit uneasy about tying threads to tables. What would be the forward compatibility story? When we add a more flexible opaque thread type eventually, would we add another create_thread instruction?

I'm wondering whether it wouldn't make sense to extend Wasm's type system with the notion of opaque types independent of GC types. Or would thread ids need to be GC'ed?

AndrewScheidecker commented 7 years ago

When we add a more flexible opaque thread type eventually, would we add another create_thread instruction?

I think you could do this in a forward compatible way by adding an elem_type of any_ref instead of thread. An implementation would need to track the type of elements in the table, but you wouldn't need a way to represent the type in the binary format.

rossberg commented 7 years ago

@AndrewScheidecker, I am wondering more about the instruction set. With a more general thread type potentially added later it would be silly to still require a table just to create a thread.

AndrewScheidecker commented 7 years ago

I am wondering more about the instruction set. With a more general thread type potentially added later it would be silly to still require a table just to create a thread.

I see what you mean, and it would be unfortunate if we ended up with table and non-table versions of the thread operators. However, I do think in almost all cases you will want to put the thread in a table, because that seems like the best way to get a "handle" to store in linear memory.

rossberg commented 7 years ago

@AndrewScheidecker, a program using a future GC extension, for example, probably has little reason to even define a linear memory.

lukewagner commented 7 years ago

@rossberg-chromium That's a good question, and symmetric to the earlier question of whether, instead of call_indirect, we should have (call_func_ptr (get_elem (... index))). I think the important constraint here is "Don't introduce dependencies on GC for other features (e.g., using resources through tables)", which leads to the question: can we have some opaque thread-id value type (returned by create_thread, taken by set_elem) that by construction never needs GC under any circumstances?

rossberg commented 7 years ago

@lukewagner, right, and I agree with the constraint, so I was wondering if it is or is not an issue for thread ids. Off-hand I don't see a need for GCing thread ids, but maybe there is some possible thread feature (present or future) I am overlooking that might necessitate it?

lukewagner commented 7 years ago

@rossberg-chromium Oh, I get what you're saying. Even if we can immediately free the OS thread when the startfunc returns (or the thread traps, or if we add a thread_terminate, etc.), I think we'll have to keep around some bookkeeping datum as long as there are any extant thread-id values, for impl reasons:

  1. some per-thread bookkeeping data (e.g. the value to hand to thread_join) has to outlive the OS thread itself
  2. the thread-id must stay unique for as long as any reference to it exists, so raw OS thread ids (which can be reused) won't do

rossberg commented 7 years ago

@lukewagner, the first item doesn't seem tied to thread ids, does it? The data would be needed regardless of whether somebody retains an id. So I think it would be owned by the thread, and be freed when the thread terminates.

The second item seems more tricky. So far I was assuming that we'd simply use the underlying thread ids of the OS, but yeah, those may be reused too early.

lukewagner commented 7 years ago

@rossberg-chromium For the first item, if the thread terminates and there is no other thread waiting on a thread_join, the join-value needs to stick around until thread_join is called (or presumably get GC'd if there are no live thread-id references).

We also need to define what happens if thread_join is called multiple times for a single thread-id. POSIX says it's undefined, but we'd probably say trap. Either way, this will require keeping the thread-id reserved until all references are dropped so that we can do the right thing.

binji commented 7 years ago

Side comment for those who may be following the issues but didn't read the CG meeting notes:

We agreed that native threads are not required for the v1 thread proposal.

aardappel commented 7 years ago

I think native threads are important not just for speed, but for convenience, and for non-web uses.

With the current proposal, setting up a threaded application requires a fair bit of JS glue that has knowledge of the threading requirements of the module. What if I am compiling a bunch of C++ that may or may not spin up threads internally to do work, and I have no knowledge of them? Ideally, in the future this could compile to a single module that would "just work". Even more so in a non-web embedding that does not use JS for dynamic linking / instantiation code.

binji commented 7 years ago

With the current proposal, setting up a threaded application requires a fair bit of JS glue...

Agreed, but reusing workers is a simpler target than adding native threads, and allows us to reuse functionality we already have (SharedArrayBuffer, Atomics, Worker).

For a non-web host, I think you could do something like:

(module
  (import "host" "spawn" (func $spawn (param $func_id i32) ...))
  ...
  (func $worker ...)  ;; function id 22
  (func
    ...
    ;; spawn a new thread, calling $worker
    (call $spawn (i32.const 22))
  )
)

Agreed it's not as satisfying as having native wasm threads, though.

aardappel commented 7 years ago

@binji sure, the host can always take care of it, but I would hope that in the future I can compile C++ code that uses std::thread deep down in a library, and that the resulting wasm can be loaded equally in various embeddings.

binary132 commented 6 years ago

I think the population of people who are interested in WebASM iff it trivially supports direct compilation from portable C or C++, with minimal glue, is significant. Personally, if I need to write non-trivial platform-specific code in order to target WebASM, I simply won't target WebASM.

Edit: Sounds like this concern is not on point. Thanks for clarifying, @jayphelps!

jayphelps commented 6 years ago

You absolutely can transparently use pthreads or std::thread in C++ using emscripten. 🎉 It's possible because the implementation is separate from wasm itself: emscripten abstracts away the fact that it uses Workers. Although some browsers haven't re-enabled SharedArrayBuffer yet, which is required as an implementation detail.

https://kripken.github.io/emscripten-site/docs/porting/pthreads.html

erestor commented 4 years ago

Unfortunately, the emscripten implementation falls apart quickly when you try to use std::async for mini-tasks (say, one second each) that follow one another in quick succession, say hundreds of them. You'll end up with dozens of leaking workers and it's incredibly slow. (Although that can be mitigated by replacing std::async with a custom thread pool.) Not to mention the fact that Firefox is essentially burying SharedArrayBuffer one piece at a time under layers of pseudo-security. Why can't it just work like PNaCl used to? Sigh :(

Pauan commented 4 years ago

@erestor Not to mention the fact that Firefox is essentially burying SharedArrayBuffer one piece at a time under layers of pseudo-security.

If you're referring to COOP and COEP, those are WHATWG standards which all browsers will implement, not just Firefox. You can read more about that here and here.

They're not pseudo-security: the point of COOP and COEP is that the browser will run the tab in a separate process, so that way Spectre attacks won't be possible.

PNaCl was designed long before Spectre was known about, but PNaCl also would have needed to change because of Spectre.

erestor commented 4 years ago

Yes, that's exactly what I'm referring to and I tend to disagree, but that's a discussion for elsewhere. The point is we don't even need SharedArrayBuffer if threads are transparent and running internally in the WebAssembly module. We need it for fast computation, not messaging in and out. For parallel tasks which need frequent serialization the current model is way off the promised performant web. (Grunt: Our product started with NPAPI, then switched to PNaCl, then had to move to WebAssembly and it's not been a happy ride.)

Pauan commented 4 years ago

@erestor The point is we don't even need SharedArrayBuffer if threads are transparent and running internally in the WebAssembly module.

I suggest you do some reading on Spectre, so you can understand why COOP and COEP are necessary. But the short version is:

  1. If you have multithreading you can use it to create a very high precision timer.

  2. That high precision timer allows your code to bypass the same-origin policy and read sensitive data that it isn't supposed to.
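A minimal sketch of the timer in item 1, assuming a context where workers and SharedArrayBuffer are available: one worker spins on a shared counter, and reading that counter from another thread gives a clock far finer than the clamped performance.now().

// worker.js -- spin on a shared counter as fast as possible.
onmessage = ({ data: sab }) => {
  const counter = new Int32Array(sab);
  for (;;) Atomics.add(counter, 0, 1);
};

// main.js -- the counter value serves as a high-resolution timestamp.
const sab = new SharedArrayBuffer(4);
const counter = new Int32Array(sab);
new Worker("worker.js").postMessage(sab);
const t0 = Atomics.load(counter, 0);
// ... operation being timed ...
const t1 = Atomics.load(counter, 0);  // t1 - t0 approximates elapsed time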

It's a huge security violation, which allows a malicious website to read data from other tabs, thus allowing it to steal things like credit card information, passwords, etc.

Any sort of multithreading or high precision timer will enable the Spectre exploit. That's why in addition to banning SharedArrayBuffer, all browsers also severely reduced the precision of performance.now().

So if WebAssembly native threads existed, they would also need to follow COOP and COEP. And PNaCl would also need to follow COOP and COEP. It's not about messaging or SharedArrayBuffer, it's about any sort of multithreading.

I don't think you realize how big of a deal Spectre is: it completely changes everything. Not only does it require major changes to CPU hardware, but it also requires programs like browsers to change the way that they handle tab processes. Because Spectre now exists, the old ways of doing things simply will not work.

erestor commented 4 years ago

That's a very good explanation and I thank you for it. I have read quite a bit about the cache-timing exploit duo when they came out. But, in the browser context, is it true then that if I have only one tab open I'm safe? It would be great to be able to tell our customers that, instead of telling them "sorry, our product doesn't work anymore because of a browser security update". But again, this doesn't belong here. Sorry.

Pauan commented 4 years ago

But, in the browser context, is it true then that if I have only one tab open I'm safe?

Maybe. Though Spectre also potentially allows a website to access sensitive browser data, so maybe not.

And in any case, even if you only have one tab open the browser will still ban all multithreading. The only way to enable multithreading is for your server to send the COOP and COEP headers.

It would be great to be able to tell our customers that, instead of telling them "sorry, our product doesn't work anymore because of a browser security update".

You can just change your server to send the COOP and COEP headers. You can change them right now, you don't need to wait. That way your code will keep working and won't be broken by a browser update.

If you don't have control over the servers, you'll need to explain to your customers that they need to add the headers to their server. You should tell them now, so that way their code won't be broken later. And you should give a short explanation why this change is needed.
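Concretely, the two headers in question are the standard cross-origin-isolation pair. A minimal Node.js sketch (the static-file handling is deliberately naive):

// Send the cross-origin-isolation headers so the browser re-enables
// SharedArrayBuffer (and therefore worker-backed wasm threads).
const http = require("http");
const fs = require("fs");

http.createServer((req, res) => {
  res.setHeader("Cross-Origin-Opener-Policy", "same-origin");
  res.setHeader("Cross-Origin-Embedder-Policy", "require-corp");
  fs.createReadStream("." + req.url).pipe(res);  // naive file serving, no error handling
}).listen(8080);

Pages can check self.crossOriginIsolated at runtime to confirm the headers took effect.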

erestor commented 4 years ago

Thank you, you've been very helpful. We'll certainly give the server headers a try. At least I know I shouldn't hold my breath waiting for WebAssembly to behave like the PNaCl of old.

binji commented 4 years ago

And PNaCl would also need to follow COOP and COEP. It's not about messaging or SharedArrayBuffer, it's about any sort of multithreading.

I'm not sure about that -- PNaCl is converted to NaCl, which has to run entirely in its own process for its sandboxing model. That comes with other downsides, of course.