investigate JIT usage - Githubissues

Geal commented 6 years ago

the current version uses wasmi, which is a very nice interpreter, but we might want to JIT (or more precisely AOT) compile wasm modules to native code, for better performance.

Unsolved questions right now:

- which JIT engine do we use? LLVM? Cretonne (paging @sunfishcode for this)?
- how does wasm behaviour map to native code (see https://github.com/cretonne/cretonne/issues/144 for some thoughts):
  - where does the stack go?
  - how do we transform host function calls to native calls?
  - what do we do with traps?
- how do wasm's security guarantees hold in native code? Can it access arbitrary memory locations in the process?

Geal commented 6 years ago

a year ago, I was working on the idea of using virtualization instructions (like Intel VT-x) to create very small virtual machines, without an OS, that would just execute one function and return.

It turns out those virtualization instructions could map well to wasm: they require that we create a few memory maps for the memory and the code, create a context (i.e. fill registers), and provide arbitrary interrupts to communicate with the host.

I think this would provide a good security umbrella for wasm in native mode.

sunfishcode commented 6 years ago

which JIT engine do we use? LLVM? Cretonne (paging @sunfishcode for this)?

There are projects using Cretonne to JIT WebAssembly on x86-64 right now, and it's complete enough to pass the entire WebAssembly testsuite and run real applications. It does take some work to embed it, though we're working on making it easier, and I'd be happy to help if you're interested in trying it.

how does wasm behaviour map to native code (see cretonne/cretonne#144 for some thoughts): where does the stack go? how do we transform host function calls to native calls? what do we do with traps?

Cretonne's answers to these questions are:

- The wasm stack goes on the native stack, so wasm can call into and be called by functions outside its sandbox (if you choose to provide access).
- Host function calls can be emitted directly.
- You can implement the TrapSink trait and Cretonne will tell you the address of every instruction that is expected to trap, so if you install signal handlers for the traps, you can map the signal to an expected trap.
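
To make the trap-mapping idea concrete, here is a minimal sketch of the kind of table a trap sink could fill in during code emission and a signal handler could consult afterwards. It does not use Cretonne's actual TrapSink signature; `TrapKind`, `TrapTable`, `record` and `lookup` are hypothetical names chosen for illustration, and the set of trap kinds is an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical trap kinds; the real set depends on the code generator.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TrapKind {
    HeapOutOfBounds,
    IntegerDivisionByZero,
    Unreachable,
}

/// Maps the code offset of each instruction that may trap to the trap it represents.
#[derive(Default)]
pub struct TrapTable {
    by_offset: BTreeMap<usize, TrapKind>,
}

impl TrapTable {
    /// Record one potentially trapping instruction.
    /// A trap-sink callback would do this while code is being emitted.
    pub fn record(&mut self, code_offset: usize, kind: TrapKind) {
        self.by_offset.insert(code_offset, kind);
    }

    /// Given the base address of the compiled code and the faulting program
    /// counter reported by a signal, return the expected trap, if any.
    /// `None` means the fault is not a known wasm trap.
    pub fn lookup(&self, code_base: usize, fault_pc: usize) -> Option<TrapKind> {
        let offset = fault_pc.checked_sub(code_base)?;
        self.by_offset.get(&offset).copied()
    }
}
```

A fault whose address is not in the table should be treated as a genuine crash rather than a wasm trap. Whether such a lookup is safe to run inside a real signal handler depends on the platform and is not addressed by this sketch.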

how do wasm's security guarantees hold in native code? Can it access arbitrary memory locations in the process?

WebAssembly linear memory is sandboxed when JITed to native code. Cretonne provides two options:

- bounds checks: simple compare-and-branch sequences before loads and stores, to ensure that they are within the WebAssembly linear memory bounds
- on 64-bit hosts, you can reserve a 4GiB region of virtual address space, and since WebAssembly linear-memory addresses are 32-bit, it's not possible for them to reach outside that range, so bounds checks on simple accesses aren't necessary (there's more to it, but this is the basic idea).
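
As an illustration of the first option, the following sketch shows in plain Rust what an explicit bounds check before a 32-bit load amounts to. It is not code generated by Cretonne; `load_u32` and `OutOfBounds` are hypothetical names, and the linear memory is modelled as a byte slice.

```rust
/// Hypothetical error standing in for a wasm out-of-bounds trap.
#[derive(Debug, PartialEq, Eq)]
pub struct OutOfBounds;

/// Load a little-endian u32 from linear memory at `addr + offset`,
/// checking the whole 4-byte access against the memory size first.
pub fn load_u32(memory: &[u8], addr: u32, offset: u32) -> Result<u32, OutOfBounds> {
    // Widen to 64 bits so that addr + offset cannot wrap around.
    let start = u64::from(addr) + u64::from(offset);
    let end = start + 4;
    // The compare-and-branch: reject the access if it ends past the memory.
    if end > memory.len() as u64 {
        return Err(OutOfBounds);
    }
    let start = start as usize;
    let bytes = [
        memory[start],
        memory[start + 1],
        memory[start + 2],
        memory[start + 3],
    ];
    Ok(u32::from_le_bytes(bytes))
}
```

Note that a 32-bit address plus a 32-bit static offset can exceed 4GiB, which is presumably part of the "more to it" in the second option; the sketch sidesteps this by doing the arithmetic in 64 bits.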

Cretonne does not currently provide any mitigations for Spectre.

Geal commented 6 years ago

ok! So, where should I start to support cretonne JIT in this project? Are there code examples I could follow?

sunfishcode commented 6 years ago

Here are some examples:

Cretonne has a new simplejit API which handles many low-level details. See here for a demo of it in action. I'm working on a blog post about this demo right now.
There's a prototype wasm execution engine here which is a basic demo of Cretonne's wasm support. It was written before simplejit was written, so it still does some of the complex stuff itself, but in theory it should be straightforward to rewrite it on top of simplejit.

There's more to say about stack overflow checks and indirect call sandboxing, but that's a start. Let me know if you have any questions!

sunfishcode commented 6 years ago

Another example is the wasm runtime in Nebulet.

sunfishcode commented 6 years ago

Have you had a chance to look into this yet? If not, no worries, but if so, I'd be interested in how it's gone.

Geal commented 6 years ago

I have not had much time these days, and I'm working on async networking first, but this is definitely the next feature I'm working on :)

Geal commented 6 years ago

ok, so now that I've had time to play with async networking, I have a much better idea of the runtime I want, and I can get to integrating cretonne :)

sunfishcode commented 6 years ago

@Geal Have you had a chance to play with cretonne yet? If so, I'm curious how it's gone :).

Geal commented 6 years ago

@sunfishcode I started playing with it: 1fab6e2fdb41d4bc23. I used the cretonne_wasm crate to parse the wasm files (my ModuleEnvironment implementation is just a copy of DummyEnvironment for now). From there, I'm a bit confused about the required steps, so correct me if I'm wrong:

- the ModuleEnvironment got a list of Function that correspond to the wasm functions compiled to cretonne IR, so I apparently don't need to reimplement the translate method from simplejit: https://github.com/sunfishcode/simplejit-demo/blob/master/src/jit.rs#L135-L203
- I need to create a Module<SimpleJITBackend> and Context. I have to call the module's declare_function https://github.com/sunfishcode/simplejit-demo/blob/master/src/jit.rs#L88-L90 with the signature I have in my ModuleEnvironment, then call define_function https://github.com/sunfishcode/simplejit-demo/blob/master/src/jit.rs#L97-L99 I can apparently create a Context from a Function
- should I use write_data_funcaddr (https://docs.rs/cretonne-module/0.8.0/cretonne_module/struct.Module.html#method.write_data_funcaddr) to indicate the addresses of my host functions?
- then I can call my function directly: https://github.com/sunfishcode/simplejit-demo/blob/master/src/toy.rs#L46

I don't really understand where I'm supposed to provide host functions. From what I understand, the ModuleEnvironment gets the list of imports from the wasm file, but I don't see how to match them to my local functions. Also, I rely on a patched version of wasmi to support pausing the interpreter. I do that with traps so the interpreter stops immediately, then modify the stack to emulate a correct result when I jump back into the interpreter. Is it something that would be doable with that JIT implementation?
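
The pause-and-resume behaviour described here can be sketched with a toy stack machine. This is not wasmi or the patched version mentioned above; `Op`, `Status`, `Interp` and `resume_with` are hypothetical names, and the instruction set is reduced to three operations purely for illustration.

```rust
/// A tiny, hypothetical instruction set.
#[derive(Debug, Clone, Copy)]
pub enum Op {
    Push(i64),
    Add,
    /// Call a host function by index; the interpreter pauses here.
    CallHost(u32),
}

#[derive(Debug, PartialEq, Eq)]
pub enum Status {
    /// Execution ran to the end; carries the top of the stack, if any.
    Finished(Option<i64>),
    /// Execution stopped at a host call and is waiting for its result.
    Paused { host_fn: u32 },
}

pub struct Interp {
    code: Vec<Op>,
    pc: usize,
    stack: Vec<i64>,
}

impl Interp {
    pub fn new(code: Vec<Op>) -> Self {
        Interp { code, pc: 0, stack: Vec::new() }
    }

    /// Run until the program ends or a host call is reached.
    pub fn run(&mut self) -> Status {
        while self.pc < self.code.len() {
            let op = self.code[self.pc];
            // Advance first, so that resuming continues after the host call.
            self.pc += 1;
            match op {
                Op::Push(v) => self.stack.push(v),
                Op::Add => {
                    let b = self.stack.pop().expect("stack underflow");
                    let a = self.stack.pop().expect("stack underflow");
                    self.stack.push(a.wrapping_add(b));
                }
                // Stop immediately, like a trap would.
                Op::CallHost(host_fn) => return Status::Paused { host_fn },
            }
        }
        Status::Finished(self.stack.last().copied())
    }

    /// Push the host call's result onto the stack, as if the call had
    /// returned normally, then continue execution.
    pub fn resume_with(&mut self, result: i64) -> Status {
        self.stack.push(result);
        self.run()
    }
}
```

The host can do arbitrary asynchronous work between `run` returning `Paused` and the later call to `resume_with`. With JIT-compiled code the equivalent state lives in native registers and on the native stack, which is why preserving or saving that state matters.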

sunfishcode commented 6 years ago

Cool, I'll take a look at what you have soon!

Yes, the simplejit-demo is compiling a toy language, so it needs its own translation. cretonne-wasm performs translation for wasm.

And yeah, the infrastructure for supplying host functions isn't very advanced yet. I'll give more specific advice once I have a chance to look at your code.

If I understand your question about traps, the answer is yes: If JIT code traps for any reason (including an explicit trap instruction), you can handle it with a signal handler. And as long as the stack and register state is preserved (or saved and restored), you can jump back to it at any time.

sunfishcode commented 6 years ago

the ModuleEnvironment got a list of Function that correspond to the wasm functions compiled to cretonne IR, so I apparently don't need to reimplement the translate method from simplejit: https://github.com/sunfishcode/simplejit-demo/blob/master/src/jit.rs#L135-L203

That's right. cretonne-wasm can do the translation for you.

I need to create a Module and Context. I have to call the module's declare_function https://github.com/sunfishcode/simplejit-demo/blob/master/src/jit.rs#L88-L90 with the signature I have in my ModuleEnvironment, then call define_function https://github.com/sunfishcode/simplejit-demo/blob/master/src/jit.rs#L97-L99

That's right.

I can apparently create a Context from a Function. Should I use write_data_funcaddr (https://docs.rs/cretonne-module/0.8.0/cretonne_module/struct.Module.html#method.write_data_funcaddr) to indicate the addresses of my host functions?

write_data_funcaddr will arrange for the address of the specified function to be written into the data section.

then I can call my function directly: https://github.com/sunfishcode/simplejit-demo/blob/master/src/toy.rs#L46

Yeah, I don't actually know if there's a "best" way to do this yet. When you compile a function, simplejit will have a function pointer which can be called from Rust, and the only question is, what's the best way to give Rust a function pointer?
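
One common way to hand Rust a callable function pointer is to transmute the raw code address to a function pointer type with the matching signature. The sketch below uses an ordinary Rust function as a stand-in for JIT output, since no JIT is involved here; `add`, `code_ptr` and `as_binary_fn` are hypothetical names, and the `extern "C"` signature is an assumption about the calling convention of the compiled code.

```rust
use std::mem;

/// Stand-in for JIT-compiled code. With a real JIT, the raw pointer
/// would come from the finalized code buffer instead.
extern "C" fn add(a: i64, b: i64) -> i64 {
    a.wrapping_add(b)
}

/// Return the stand-in's address as an untyped code pointer,
/// which is roughly what a JIT hands back.
pub fn code_ptr() -> *const u8 {
    add as extern "C" fn(i64, i64) -> i64 as *const u8
}

/// Reinterpret a raw code pointer as a callable function.
///
/// Safety: the caller must guarantee that `ptr` points to executable code
/// with exactly this signature and calling convention, and that the code
/// outlives every call through the returned pointer.
pub unsafe fn as_binary_fn(ptr: *const u8) -> extern "C" fn(i64, i64) -> i64 {
    unsafe { mem::transmute::<*const u8, extern "C" fn(i64, i64) -> i64>(ptr) }
}
```

The compiler cannot check that the signature matches the compiled code, so the wasm function's signature has to be validated before the transmute.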

I don't really understand where I'm supposed to provide host functions. From what I understand, the ModuleEnvironment gets the list of imports from the wasm file, but I don't see how to match them to my local functions.

Since you're using simplejit, you can rely on the dlsym functionality. If you declare an Imported function, it should use dlsym (or the equivalent on Windows) to find it.
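
As an alternative illustration of matching wasm imports to local functions, here is a sketch of an explicit registry keyed by the import's module and field names. This is not how simplejit resolves symbols (it relies on dlsym as described above); `HostFn`, `HostRegistry`, `register`, `resolve` and the example host function `double` are hypothetical, and a single uniform host function signature is assumed for simplicity.

```rust
use std::collections::HashMap;

/// Assumed uniform signature for host functions in this sketch.
pub type HostFn = fn(&[i64]) -> i64;

/// Maps (module name, field name) pairs to host functions.
#[derive(Default)]
pub struct HostRegistry {
    funcs: HashMap<(String, String), HostFn>,
}

impl HostRegistry {
    /// Make a host function available under a given import name.
    pub fn register(&mut self, module: &str, field: &str, f: HostFn) {
        self.funcs.insert((module.to_string(), field.to_string()), f);
    }

    /// Resolve a module's imports, in declaration order, so that the
    /// result can be indexed by import index. Fails on the first import
    /// that has no registered host function.
    pub fn resolve(&self, imports: &[(&str, &str)]) -> Result<Vec<HostFn>, String> {
        imports
            .iter()
            .map(|&(module, field)| {
                self.funcs
                    .get(&(module.to_string(), field.to_string()))
                    .copied()
                    .ok_or_else(|| format!("unresolved import {}.{}", module, field))
            })
            .collect()
    }
}

/// Example host function: doubles its first argument (0 if absent).
pub fn double(args: &[i64]) -> i64 {
    args.first().copied().unwrap_or(0).wrapping_mul(2)
}
```

The import list would come from whatever the ModuleEnvironment collected while parsing the wasm file. Failing at resolution time keeps a missing import from turning into a failure at call time.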

Geal / serverless-wasm

investigate JIT usage #9