WebAssembly / design

WebAssembly Design Documents
http://webassembly.org
Apache License 2.0

Standardized JIT? #1501

Open moonheart08 opened 5 years ago

moonheart08 commented 5 years ago

Has there been any work on a standard way for modules to generate new WASM functions and tables, and then use them via references?

aardappel commented 5 years ago

There is no such thing yet, though there have been quite a few people hinting at this being a useful thing to have.

The big hurdle to overcome with making this a Wasm feature is that it closely interacts with how engines compile Wasm and organize/optimize code, so it will be tricky to agree on a standard. For example, it may break any process or optimization that assumes the full set of functions, or of callers and callees, is known ahead of time.

The simplest way I can imagine this could be made to work is by means of an "extension wasm module", i.e. a module that only contains what is new (e.g. a single function in a code section), and makes use of existing indices in the module that it extends (for everything it references). This could then be passed to the engine as a buffer of bytes (e.g. via the JS API) and called through call_indirect.
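To make the "extension module" idea a bit more concrete, here is a toy model in Python. It is only a sketch: `Instance`, `Extension`, and `load_extension` are hypothetical names invented for illustration, functions are plain Python callables rather than Wasm code, and no engine or the JS API exposes anything like this today.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Func = Callable[..., int]


@dataclass
class Instance:
    """Toy stand-in for a module instance: a function index space plus one table."""
    funcs: List[Func] = field(default_factory=list)
    table: List[int] = field(default_factory=list)  # each slot holds a function index

    def call_indirect(self, slot: int, *args: int) -> int:
        return self.funcs[self.table[slot]](*args)


@dataclass
class Extension:
    """Hypothetical extension: only the new code, plus the existing indices it uses."""
    uses: List[int]                      # indices into the base instance's functions
    body: Callable[[List[Func]], Func]   # builds the new function from those functions


def load_extension(inst: Instance, ext: Extension) -> int:
    # Validate every referenced index against the base instance before linking.
    for idx in ext.uses:
        if not 0 <= idx < len(inst.funcs):
            raise IndexError(f"extension references unknown function index {idx}")
    new_func = ext.body([inst.funcs[i] for i in ext.uses])
    inst.funcs.append(new_func)
    inst.table.append(len(inst.funcs) - 1)
    return len(inst.table) - 1           # table slot usable with call_indirect


base = Instance(funcs=[lambda x: x + 1])
ext = Extension(uses=[0], body=lambda deps: (lambda x: deps[0](x) * 2))
slot = load_extension(base, ext)
result = base.call_indirect(slot, 3)     # (3 + 1) * 2
```

The point of the sketch is that the base instance's function set grows after instantiation, which is exactly the assumption an engine or optimizer may have relied on.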

tlively commented 5 years ago

Once we have typed function references it would be neat if a JIT feature could use them. I'm picturing an instruction that takes a memory address and size and returns a function reference.

rossberg commented 5 years ago

Yeah, such functions will need to access some module instance, in order to be able to access other functions or memories etc. That means one of two things. Either one actually needs to create a first-class module and be able to link that from within Wasm, or the function would have to be injected into an existing module instance. Existing engines are not designed for the latter, allowing it may prevent optimisations, and it would break encapsulation if not designed very carefully.

titzer commented 4 years ago

I generally agree that such a JIT mechanism will need a way to specify new functions that have access to a module's internals. We definitely don't want to break the existing good encapsulation that we have with Wasm modules today. A mechanism that solely relies on the existing import/export mechanism would force modules to export too much, IMO. The module linking proposal is going down the route of defining modules and instances as manipulable values and allowing nested modules. That is a candidate route forward but it's not clear yet.

DemiMarie commented 2 years ago

One question here is the amount of control a module will have over the modules it creates. For instance, module X might want to be able to read and write arbitrarily within module Y’s linear memory, preempt module Y if it uses too much CPU time, and be notified if module Y traps. Combined with memory protection, that might actually be sufficient to port Linux to the browser!

titzer commented 2 years ago

I don't see a JIT capability necessarily being dependent on function references. One could as well have an API where the result of JITing is inserted into a table at a given index, or appended to a (growable) table.
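A rough sketch of what such a table-based API could look like, modelled in Python. Everything here is an assumption made for illustration: `Table`, `jit_into`, `jit_append`, and the two-op "bytecode" are hypothetical, and `jit_compile` merely validates the ops and wraps them in a closure as a stand-in for real code generation.

```python
from typing import Callable, List, Optional, Tuple

Func = Callable[[int], int]
Op = Tuple[str, int]

KNOWN_OPS = {"add", "mul"}


def jit_compile(code: List[Op]) -> Func:
    """Stand-in for a JIT: validate up front, then return a callable."""
    for op, _ in code:
        if op not in KNOWN_OPS:
            raise ValueError(f"unknown op {op!r}")
    ops = list(code)  # snapshot so later edits to `code` do not affect the function

    def run(x: int) -> int:
        for op, k in ops:
            x = x + k if op == "add" else x * k
        return x

    return run


class Table:
    """Toy function table; no function references ever leave it."""

    def __init__(self, size: int) -> None:
        self.slots: List[Optional[Func]] = [None] * size

    def jit_into(self, index: int, code: List[Op]) -> None:
        # Variant 1: place the compiled function at a caller-chosen index.
        if not 0 <= index < len(self.slots):
            raise IndexError("table index out of bounds")
        self.slots[index] = jit_compile(code)

    def jit_append(self, code: List[Op]) -> int:
        # Variant 2: grow the table by one and return the new index.
        self.slots.append(jit_compile(code))
        return len(self.slots) - 1

    def call_indirect(self, index: int, arg: int) -> int:
        f = self.slots[index]
        if f is None:
            raise RuntimeError("trap: uninitialized table element")
        return f(arg)


table = Table(2)
table.jit_into(1, [("add", 2), ("mul", 10)])
first = table.call_indirect(1, 4)        # (4 + 2) * 10
appended = table.jit_append([("mul", 3)])
second = table.call_indirect(appended, 5)
```

In both variants the caller only ever handles integer table indices, so the API does not depend on a function reference type.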

I think a (function-at-a-time) JIT capability that is callable from within a module is almost always in a scenario where the new code would need access to the internals of the calling module. In that scenario, a function would have the ability to address all the internal index spaces (tables, globals, functions, etc) in its bytecode. AFAICT this does not preclude most engine optimizations, but it does preclude toolchain optimizers that assume a closed world within a module. From a security perspective, it is functionally no different from a module having an internal IR and associated interpreter that can run said IR; it's just orders of magnitude faster.

However, a function-at-a-time JIT capability that is callable outside a module should only have access to a module's exports.

rossberg commented 2 years ago

> In that scenario, a function would have the ability to address all the internal index spaces (tables, globals, functions, etc) in its bytecode.

Taken literally, I'm not convinced that's desirable. It would boil down to dynamic scoping, everywhere, with respect to a very fragile scoping mechanism (indexing), and across multiple compiler universes. That's nasty on many levels. A more robust solution would use an explicit mechanism for defining the environment mapping under which a JIT operation runs. That would decouple index spaces and thereby implementation details of the compilers involved, and it maintains the correctness of code transformations.
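As a toy illustration of the difference, the Python sketch below compiles code against an explicit environment instead of the host module's raw indices. The names (`jit_with_env`, the `env` dictionary, the "list of calls" code format) are hypothetical and chosen only for this example; this is not a proposed design.

```python
from typing import Callable, Dict, List

HostFunc = Callable[[int], int]


def jit_with_env(calls: List[int], env: Dict[int, HostFunc]) -> HostFunc:
    """Compile a sequence of JIT-local function indices under an explicit mapping."""
    missing = [i for i in calls if i not in env]
    if missing:
        raise KeyError(f"no environment entry for local indices {missing}")
    resolved = [env[i] for i in calls]   # bind once, at compile time

    def run(x: int) -> int:
        for f in resolved:
            x = f(x)
        return x

    return run


def inc(x: int) -> int:
    return x + 1


def double(x: int) -> int:
    return x * 2


code = [0, 1]                            # local 0, then local 1

# Host layout as first compiled: inc at index 0, double at index 1.
host_v1 = [inc, double]
env_v1 = {0: host_v1[1], 1: host_v1[0]}  # local 0 -> double, local 1 -> inc
out_v1 = jit_with_env(code, env_v1)(5)

# A later toolchain pass reorders the host's functions; only the mapping changes.
host_v2 = [double, inc]
env_v2 = {0: host_v2[0], 1: host_v2[1]}
out_v2 = jit_with_env(code, env_v2)(5)
```

The JIT-local code `[0, 1]` stays the same across both host layouts, which is the decoupling of index spaces described above.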