DLR-FT / wasm-interpreter

A minimal in-place WebAssembly interpreter, written in Rust, almost without use of external dependencies
https://dlr-ft.github.io/wasm-interpreter/main/
Apache License 2.0

The never-ending Linker problem #83

Open george-cosma opened 2 months ago

george-cosma commented 2 months ago

The Linker Problem

To fully implement a working Wasm interpreter, we must be able to resolve imports. The idea of a Linker comes to mind, and there are multiple ways to design it.

Design 1: Monolithic Runtime Instance

[Figure: linker_problem_monolith.drawio]

This design would collect the validation info of all modules into the Linker, which would then produce a single "merged" validation info that can be instantiated as one RuntimeInstance. Imports would then be resolved universally, since a call to an imported function would simply be a regular call within the merged validation info.
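A minimal sketch of the core of the merge step, under simplifying assumptions: `ModuleInfo` is a hypothetical stand-in for the real validation info (only function imports and exports are modeled), and exports are assumed to point at locally defined functions. It relies on the Wasm rule that imported functions come first in a module's function index space. Each module gets a remap table from its own function indices to indices in the merged module, so a call to an import becomes an ordinary call to a merged index.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for one module's validation info:
/// only what is needed to merge function index spaces.
struct ModuleInfo {
    name: String,
    /// Imported functions as (module_name, field_name), in import order.
    imports: Vec<(String, String)>,
    /// Exported functions as (field_name, function index within this module).
    exports: Vec<(String, usize)>,
    /// Number of functions defined (not imported) by this module.
    num_local_funcs: usize,
}

/// Returns, for every module, a table mapping its function indices
/// (imports first, then local functions) to indices in the merged module.
fn merge(modules: &[ModuleInfo]) -> Result<Vec<Vec<usize>>, String> {
    // 1. Place every module's local functions one after another.
    let mut offsets = Vec::with_capacity(modules.len());
    let mut next = 0;
    for m in modules {
        offsets.push(next);
        next += m.num_local_funcs;
    }

    // 2. Map every (module_name, export_name) to its merged index.
    let mut exports: HashMap<(&str, &str), usize> = HashMap::new();
    for (m, &offset) in modules.iter().zip(&offsets) {
        for (field, idx) in &m.exports {
            // This sketch does not support re-exporting an imported function.
            let local = idx
                .checked_sub(m.imports.len())
                .ok_or_else(|| format!("{}.{field} re-exports an import", m.name))?;
            exports.insert((m.name.as_str(), field.as_str()), offset + local);
        }
    }

    // 3. Build each module's remap table: resolved imports, then own functions.
    let mut remaps = Vec::with_capacity(modules.len());
    for (m, &offset) in modules.iter().zip(&offsets) {
        let mut remap = Vec::with_capacity(m.imports.len() + m.num_local_funcs);
        for (module, field) in &m.imports {
            let target = exports
                .get(&(module.as_str(), field.as_str()))
                .ok_or_else(|| format!("unresolved import {module}.{field}"))?;
            remap.push(*target);
        }
        remap.extend(offset..offset + m.num_local_funcs);
        remaps.push(remap);
    }
    Ok(remaps)
}

/// The ADD_ONE / ADD_TWO modules from the API proposal, reduced to their
/// function imports and exports.
fn example_modules() -> Vec<ModuleInfo> {
    vec![
        ModuleInfo {
            name: "add_one_module".into(),
            imports: vec![],
            exports: vec![("add_one".into(), 0)],
            num_local_funcs: 1,
        },
        ModuleInfo {
            name: "add_two_module".into(),
            imports: vec![("add_one_module".into(), "add_one".into())],
            // Index 1: index 0 is taken by the imported add_one.
            exports: vec![("add_two".into(), 1)],
            num_local_funcs: 1,
        },
    ]
}

fn main() {
    let remaps = merge(&example_modules()).unwrap();
    // add_two_module's import (its index 0) now points at merged function 0,
    // i.e. add_one, so `call $add_one` becomes a plain call.
    println!("{remaps:?}"); // [[0], [0, 1]]
}
```

The remap tables are the only cross-module state this design needs; once call instructions are rewritten through them, the merged module runs without any Linker involvement at runtime.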

✅ Pros:

❌ Cons:

Design 2: Each validation info has its own runtime instance (Swarm)

[Figure: linker_problem_swarm.drawio]

As the name and image suggest, each module gets its own runtime instance. The magic actually lies in the Linker, which this time is an entity that lives as long as the runtimes. When a module needs to call an imported function, it does so via the Linker.
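A minimal sketch of the Swarm idea: a long-lived `Linker` owns all instances, resolves each import to an (instance, function) pair once at link time, and routes import calls through that table. `Instance`, `Body`, and `Linker` here are hypothetical toy types, not the crate's API; function bodies are reduced to two operations so the example stays small. Note that nested calls recurse on the host (Rust) stack, which sidesteps the resumability problem described next; an interpreter that must be able to pause and resume cannot rely on that.

```rust
use std::collections::HashMap;

/// Hypothetical toy function body, so the example needs no real interpreter.
enum Body {
    /// Return the argument plus a constant.
    AddConst(i32),
    /// Call import `idx` `times` times, feeding each result into the next call.
    CallImportRepeated { idx: usize, times: usize },
}

/// Hypothetical per-module runtime instance.
struct Instance {
    name: String,
    /// Imported functions as (module_name, field_name), in import order.
    imports: Vec<(String, String)>,
    /// Locally defined functions.
    funcs: Vec<Body>,
    /// Export name -> index into `funcs`.
    exports: HashMap<String, usize>,
}

/// Long-lived linker: owns every instance plus the resolved import tables.
struct Linker {
    instances: Vec<Instance>,
    /// resolved[i][j] = (instance id, function index) that import j of instance i refers to.
    resolved: Vec<Vec<(usize, usize)>>,
}

impl Linker {
    /// Resolves all imports once, up front.
    fn new(instances: Vec<Instance>) -> Result<Self, String> {
        let mut resolved = Vec::with_capacity(instances.len());
        for inst in &instances {
            let mut table = Vec::with_capacity(inst.imports.len());
            for (module, field) in &inst.imports {
                let target = instances
                    .iter()
                    .position(|i| &i.name == module)
                    .ok_or_else(|| format!("unknown module {module}"))?;
                let func = *instances[target]
                    .exports
                    .get(field)
                    .ok_or_else(|| format!("{module} does not export {field}"))?;
                table.push((target, func));
            }
            resolved.push(table);
        }
        Ok(Linker { instances, resolved })
    }

    /// Executes function `func` of instance `inst`; import calls go through the linker.
    fn call(&self, inst: usize, func: usize, arg: i32) -> i32 {
        match &self.instances[inst].funcs[func] {
            Body::AddConst(c) => arg + c,
            Body::CallImportRepeated { idx, times } => {
                let (target, target_func) = self.resolved[inst][*idx];
                // Nested calls recurse on the host (Rust) stack.
                (0..*times).fold(arg, |acc, _| self.call(target, target_func, acc))
            }
        }
    }
}

/// The ADD_ONE / ADD_TWO modules from the API proposal, as toy instances.
fn example_linker() -> Linker {
    let add_one = Instance {
        name: "add_one_module".into(),
        imports: vec![],
        funcs: vec![Body::AddConst(1)],
        exports: HashMap::from([("add_one".to_string(), 0)]),
    };
    let add_two = Instance {
        name: "add_two_module".into(),
        imports: vec![("add_one_module".into(), "add_one".into())],
        funcs: vec![Body::CallImportRepeated { idx: 0, times: 2 }],
        exports: HashMap::from([("add_two".to_string(), 0)]),
    };
    Linker::new(vec![add_one, add_two]).unwrap()
}

fn main() {
    let linker = example_linker();
    // add_two (instance 1, function 0) calls add_one twice via the linker.
    println!("{}", linker.call(1, 0, 11)); // 13
}
```

Compared with the monolithic sketch, every import call here pays one extra table lookup in the Linker, which is the indirection cost the benchmarks below try to measure.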

✅ Pros:

❌ Cons:

With this type of linker, there is an architectural problem we need to solve to maintain resumability. If Module 1 calls an imported function from Module 2, and Module 2 then calls an imported function from Module 1, then at the end of this chain Module 1 must be able to resume execution properly.

Here is an example of how an import call could work:

[Figure: linker_problem_rel1.drawio]

Now, here is how it would work in the scenario described above:

[Figure: linker_problem_rel2.drawio]

Notice that the second "Store PC" would overwrite the previously stored program counter. An intuitive solution would be to turn it into a stack, but that feels like it would create more problems than it solves. An alternative would be to have the call instruction create a call frame not only in the caller module but also in the called module, or something along those lines. There are solutions, but I do not know which one is correct.
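One way to picture the stack option, as a sketch only (it does not address the concerns raised about that option): a single return stack owned by the linker and shared across all modules, driven by an interpreter loop that never recurses on the host stack. `Instr`, `Frame`, and `run` are hypothetical; a single i32 accumulator stands in for the operand stack, and imports are assumed to be pre-resolved. In the example, module 0 calls into module 1, which calls back into module 0; every `Add` executes exactly once, so each caller resumes at the right place.

```rust
/// Hypothetical, heavily simplified instruction set: one i32 accumulator stands
/// in for the operand stack, and imports are pre-resolved by the linker to
/// (module index, function index) pairs.
#[derive(Clone, Copy)]
enum Instr {
    Add(i32),
    CallImport { module: usize, func: usize },
    Return,
}

/// A resume point: which function of which module, and at which instruction.
#[derive(Clone, Copy)]
struct Frame {
    module: usize,
    func: usize,
    pc: usize,
}

/// `modules[m][f]` is the body of function `f` in module `m`.
/// The loop never recurses on the host stack, so all execution state lives in
/// `acc`, `cur`, and `returns` and could be saved to pause and later resume.
fn run(modules: &[Vec<Vec<Instr>>], module: usize, func: usize, arg: i32) -> i32 {
    let mut acc = arg;
    let mut cur = Frame { module, func, pc: 0 };
    // One return stack shared by all modules (owned by the linker). A single
    // `Option<Frame>` slot here would be overwritten by the second call.
    let mut returns: Vec<Frame> = Vec::new();
    loop {
        let instr = modules[cur.module][cur.func][cur.pc];
        match instr {
            Instr::Add(c) => {
                acc += c;
                cur.pc += 1;
            }
            Instr::CallImport { module, func } => {
                // Save where the caller must resume, then jump into the callee.
                returns.push(Frame { pc: cur.pc + 1, ..cur });
                cur = Frame { module, func, pc: 0 };
            }
            Instr::Return => match returns.pop() {
                Some(frame) => cur = frame,
                None => return acc,
            },
        }
    }
}

/// Module 0 calls into module 1, which calls back into module 0.
fn example_modules() -> Vec<Vec<Vec<Instr>>> {
    vec![
        vec![
            vec![Instr::Add(1), Instr::CallImport { module: 1, func: 0 }, Instr::Add(100), Instr::Return],
            vec![Instr::Add(5), Instr::Return],
        ],
        vec![vec![Instr::Add(10), Instr::CallImport { module: 0, func: 1 }, Instr::Add(1000), Instr::Return]],
    ]
}

fn main() {
    // Each Add runs exactly once: 1 + 10 + 5 + 1000 + 100.
    println!("{}", run(&example_modules(), 0, 0, 0)); // 1116
}
```

Replacing `returns` with a single `Option<Frame>` slot reproduces the overwrite described above: the second `CallImport` would clobber module 0's resume point, and module 0 could no longer return to the instruction after its original call.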

I'd like to continue this discussion and hear your opinions on which approach to go with. I personally believe the second ("Swarm") approach is more appropriate, but that is based more on vibes.

george-cosma commented 1 month ago

I've made some dummy benchmarks to see how much slower approach 2 ("Swarm") would be: https://github.com/george-cosma/indirection_bench

george-cosma commented 1 month ago

Proposed API changes:

```rust
use wasm::{validate, RuntimeInstance, DEFAULT_MODULE};

const ADD_ONE: &str = r#"
(module
    (func (export "add_one") (param $x i32) (result i32)
        local.get $x
        i32.const 1
        i32.add
    )
)"#;

fn main() {
    let wasm_bytes = wat::parse_str(ADD_ONE).unwrap();
    let validation_info = validate(&wasm_bytes).unwrap();
    let mut instance = RuntimeInstance::new(&validation_info).unwrap();

    // `get_fn` will verify that the function "add_one" exists in module <DEFAULT_MODULE>.
    // On success: returns the identifier tuple (module_name, function_name, module_id, function_id)
    // On failure: RuntimeError (couldn't find the function)
    let add_one = instance.get_fn(DEFAULT_MODULE, "add_one").unwrap();

    // Also, to maintain compatibility with index-based access (which can be useful in some edge cases,
    // and for us is useful for integration tests):
    // On success: returns the identifier tuple (module_name, function_name, module_id, function_id)
    // On failure: RuntimeError (couldn't find the function)
    let add_one = instance.get_fn_idx(/* module_idx: */ 0, /* function_idx: */ 0).unwrap();

    // `invoke` will verify that the function identifier is still valid (i.e. it wasn't created by one
    // instance and then run on another). That is why we also store module_name and function_name.
    assert_eq!(12, instance.invoke(&add_one, 11).unwrap());
    // Or should we do it this way? Or both?
    assert_eq!(12, add_one.invoke(&instance, 11).unwrap());
}
```
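A sketch of how the validity check on `invoke` could work, assuming the identifier (module_name, function_name, module_id, function_id) described in the comments above. `FunctionRef`, `Runtime`, `Module`, and `check` are hypothetical names, not the crate's API. The index pair gives a cheap lookup, and the stored names are compared afterwards, so a reference obtained from a different instance is rejected instead of silently calling whatever sits at those indices. Because only names are compared, two instances that load identically named modules would still accept each other's references.

```rust
/// Hypothetical identifier returned by `get_fn`, as described in the proposal.
#[derive(Debug, Clone, PartialEq)]
struct FunctionRef {
    module_name: String,
    function_name: String,
    module_id: usize,
    function_id: usize,
}

#[derive(Debug, PartialEq)]
enum RuntimeError {
    FunctionNotFound,
    StaleFunctionRef,
}

/// Hypothetical module record: its name and its function names by index.
struct Module {
    name: String,
    functions: Vec<String>,
}

/// Hypothetical runtime holding several named modules.
struct Runtime {
    modules: Vec<Module>,
}

impl Runtime {
    /// Name-based lookup; the returned ref also carries the indices.
    fn get_fn(&self, module_name: &str, function_name: &str) -> Result<FunctionRef, RuntimeError> {
        let module_id = self
            .modules
            .iter()
            .position(|m| m.name == module_name)
            .ok_or(RuntimeError::FunctionNotFound)?;
        let function_id = self.modules[module_id]
            .functions
            .iter()
            .position(|f| f == function_name)
            .ok_or(RuntimeError::FunctionNotFound)?;
        Ok(FunctionRef {
            module_name: module_name.to_owned(),
            function_name: function_name.to_owned(),
            module_id,
            function_id,
        })
    }

    /// What `invoke` would run first: a cheap index lookup, then a name
    /// comparison that rejects refs created by a different instance.
    fn check(&self, f: &FunctionRef) -> Result<(), RuntimeError> {
        let module = self.modules.get(f.module_id).ok_or(RuntimeError::StaleFunctionRef)?;
        let name = module.functions.get(f.function_id).ok_or(RuntimeError::StaleFunctionRef)?;
        if module.name == f.module_name && *name == f.function_name {
            Ok(())
        } else {
            Err(RuntimeError::StaleFunctionRef)
        }
    }
}

fn main() {
    let a = Runtime {
        modules: vec![Module { name: "add_one_module".into(), functions: vec!["add_one".into()] }],
    };
    let b = Runtime {
        modules: vec![Module { name: "other_module".into(), functions: vec!["add_one".into()] }],
    };
    let add_one = a.get_fn("add_one_module", "add_one").unwrap();
    assert_eq!(a.check(&add_one), Ok(()));
    // Same indices, different module name: rejected.
    assert_eq!(b.check(&add_one), Err(RuntimeError::StaleFunctionRef));
}
```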


- Multiple modules example:
```rust
// .--------------------------.
// | Multiple modules example |
// '--------------------------'

use wasm::{validate, RuntimeInstance};

const ADD_ONE: &str = /* as above */;
const ADD_TWO: &str = r#"
(module
    (import "add_one_module" "add_one" (func $add_one (param i32) (result i32)))
    (func (export "add_two") (param $x i32) (result i32)
        local.get $x
        call $add_one
        call $add_one
    )
)"#;

fn main() {
    let wasm_bytes = wat::parse_str(ADD_ONE).unwrap();
    let validation_info = validate(&wasm_bytes).unwrap();
    let mut instance = RuntimeInstance::new_named("add_one_module", &validation_info).unwrap();

    let wasm_bytes = wat::parse_str(ADD_TWO).unwrap();
    let validation_info = validate(&wasm_bytes).unwrap();
    instance.add_module("add_two_module", &validation_info).unwrap();

    let add_two = instance.get_fn("add_two_module", "add_two").unwrap();
    // Alternative:
    let add_two = instance.get_fn_idx(1, 0).unwrap();

    assert_eq!(13, instance.invoke(&add_two, 11).unwrap());
    // Or should we do it this way? Or both?
    assert_eq!(13, add_two.invoke(&instance, 11).unwrap());
}