What are the limitations of wasm-ctor-eval?

TyOverby commented 1 month ago

I'm trying to understand what kind of code I should expect wasm-ctor-eval to evaluate fully, and which instructions it won't be able to evaluate. "Calls to imported functions" is the example of an unevaluatable instruction used in the readme, but are there other constructs that it can't evaluate (or values that it can't serialize back into the wasm file)?

If this information is already documented somewhere, then I apologize - I couldn't find it... If it just hasn't been written down though, I'd be happy to contribute docs after I learn enough to do so!

kripken commented 1 month ago

Off the top of my head, one missing feature is tracking table updates (table.set), and there are probably a few other specific TODOs we haven't gotten around to. But in general it should be able to eval anything that can be seen at compile time, even complex things like recursive GC values.

If you run into a limitation, please file an issue.

TyOverby commented 1 month ago

Thanks!

I started playing around with wasm-ctor-eval and found that the first thing I hit was a call to an imported function that performs pointer equality testing, and I suspect that if I get far enough, I'll run into the missing math imports as well.

Wasm_of_ocaml has a pretty long module initialization phase at startup, and it would be very useful if wasm-ctor-eval could be used to eliminate much of it. Do you think that there's a future where more imported functions can be simulated inside of wasm-ctor-eval, like what was done for environment variables, stdin, and command line parameters?

kripken commented 1 month ago

@TyOverby Interesting question! Maybe we can find a good way to do that. I'd lean towards something modular, maybe using dynamic linking of native plugins, or loading wasm modules and using the internal wasm interpreter we already have here. (For use cases like yours where the imports are JS, something like QuickJS could be used, compiled natively or into wasm.)

That should work well for math functions, but for pointer equality testing it would require deeper integration, which might be complex. For that, it seems like a less-modular approach of adding code in binaryen itself could make sense. I'm not necessarily opposed to that, if we can find a modular way to do it.

But, can't you do ref.eq inside the wasm, for pointer equality? (I'm not familiar with your compiler, sorry.)

TyOverby commented 1 month ago

> But, can't you do ref.eq inside the wasm, for pointer equality? (I'm not familiar with your compiler, sorry.)

I'm trying to remember why that doesn't work for us; maybe it has something to do with GC functions? Are they comparable?

kripken commented 1 month ago

Ah, right, functions are the exception. You can compare struct and array references, but not function references.

I guess you do need function reference equality? If so, it might be more efficient to box function references in tiny structs, where there is a 1:1 mapping between the wrapper structs and the functions. Equality checks are then just equality checks on the wrapper structs, and calling the functions costs just an extra struct load. The overhead of going through JS would be massive in comparison to that (especially since it will create JS wrappers around the wasm functions).

vouillon commented 1 month ago

We have to deal with a very large code base, written by many people. In this code base, OCaml equality may be used to compare JavaScript objects, which worked fine when compiling to JavaScript. Since we box JavaScript values, the physical equality ref.eq will typically return false when comparing them, even when JavaScript strict equality would return true. If we used only ref.eq, the code could thus contain bugs that are hard to track down, since they result in neither failures at compile time nor traps at runtime. So we actually compare values using ref.eq by default, but when we have two boxed JavaScript objects, we call a JavaScript function (x,y)=>x===y.
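The comparison scheme described here can be modeled in plain JavaScript. This is only a sketch of the idea, not Wasm_of_ocaml's actual runtime: `JsBox`, `refEq`, `jsStrictEq`, and `physicalEqual` are hypothetical names, and ordinary JS objects stand in for wasm GC structs.

```javascript
// Hypothetical stand-in for a wasm struct that boxes a JavaScript value.
class JsBox {
  constructor(value) {
    this.value = value;
  }
}

// Stand-in for wasm `ref.eq`: compares the references themselves.
function refEq(a, b) {
  return a === b;
}

// Stand-in for the imported JavaScript function (x,y)=>x===y.
const jsStrictEq = (x, y) => x === y;

// ref.eq by default; fall back to the JS comparison only when both
// operands are boxed JavaScript values.
function physicalEqual(a, b) {
  if (refEq(a, b)) return true;
  if (a instanceof JsBox && b instanceof JsBox) {
    return jsStrictEq(a.value, b.value);
  }
  return false;
}

// Two separate boxes around the same JavaScript object.
const obj = { name: "shared" };
const box1 = new JsBox(obj);
const box2 = new JsBox(obj);
```

With these definitions, `refEq(box1, box2)` is false because the boxes are distinct, while `physicalEqual(box1, box2)` is true because the boxed objects are the same. That gap is the reason for the imported call.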

I'm a bit surprised that this function is the first imported item encountered, though. Where would the JavaScript objects come from?

kripken commented 1 month ago

Can you not ensure a 1:1 mapping of boxes to JS objects? One way is to keep a reference on the JS object to the wasm box, so that you never create another box for it (that is, a "make box" function would check if there is already a box for that JS object, and use it if so).

Once you have a 1:1 mapping then I don't see how this would be a problem:

> the physical equality ref.eq will typically return false when comparing them even when the JavaScript strict equality would return true.

With 1:1 mapping, ref.eq would return true if and only if the two objects are the same.

vouillon commented 1 month ago

I'm not sure how we could implement this without leaking memory, since JavaScript weak maps do not work with primitive objects such as strings and numbers. Also, I don't know what the performance impact of using such a map would be. Boxing JavaScript is quite cheap. And in the common case, to implement the equality operator, we just add two type checks, which are fast.
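The weak-map limitation is easy to reproduce. The sketch below assumes a hypothetical `makeBox` helper that caches one box per value in a `WeakMap`, with a plain object standing in for the wasm box; `tryMakeBox` exists only to capture the error.

```javascript
const boxes = new WeakMap();

// Hypothetical "make box" helper: reuse the box for `value` if one exists.
function makeBox(value) {
  let box = boxes.get(value);
  if (box === undefined) {
    box = { value };       // stand-in for a wasm struct
    boxes.set(value, box); // throws TypeError for strings and numbers
  }
  return box;
}

// Returns either the box or the error thrown while creating it.
function tryMakeBox(value) {
  try {
    return makeBox(value);
  } catch (e) {
    return e;
  }
}

const obj = {};
const sameBox = makeBox(obj) === makeBox(obj); // objects get a stable box
const stringResult = tryMakeBox("hello");      // a TypeError
const numberResult = tryMakeBox(42);           // a TypeError
```

Objects get a stable 1:1 box, but strings and numbers are rejected as `WeakMap` keys. Caching those would need a regular `Map`, which holds its keys strongly and so never releases them.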

kripken commented 1 month ago

> I'm not sure how we could implement this without leaking memory, since JavaScript weak maps do not work with primitive objects such as strings and numbers.

To make sure we are on the same page, here is what I am imagining in more detail:

;; Type for a function wrapper.
(type $wrapper (struct (ref func)))

;; A wrapper for a function $foo.
(global $foo-wrapper (ref $wrapper) (struct.new $wrapper (ref.func $foo)))

;; Every place the compiler would normally emit `(ref.func $foo)` it instead emits this:
(global.get $foo-wrapper)

;; Every place the compiler would normally emit `(ref func)` it emits `(ref $wrapper)`

;; Comparison is then simple: we compare the 1:1 wrappers.
(func $compare-funcs (param $x (ref $wrapper)) (param $y (ref $wrapper)) (result i32)
  (ref.eq (local.get $x) (local.get $y))
)

;; Calling is slightly slower: `(call_ref ..)` is replaced by
(call_ref (struct.get $wrapper 0 ..))

;; Helper for JS, wrap an arbitrary function
(func $wrap-js-func (export "wrap_js_func") (param $js (ref func)) (result (ref $wrapper))
  (struct.new $wrapper (local.get $js))
)

And for JS,

function makeWrapper(func) {
  if (!func.wrapper || !func.wrapper.deref()) {
    // No existing wrapper: make a new one. By stashing it on the object, we will
    // always use the same wrapper for this JS object, allowing ref.eq in wasm to
    // work properly, as there is a 1:1 mapping of functions to wrappers. We use
    // a WeakRef so that we do not keep the wasm object alive unnecessarily
    // (though this means we may end up freeing it and creating it again later).
    func.wrapper = new WeakRef(wasm.exports.wrap_js_func(func));
  }
  return func.wrapper.deref();
}

I don't think this can leak?
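To try the 1:1 property without a wasm engine, the JS helper can be run against a stand-in export. This is a sketch under assumptions: the `wasm` object below is hypothetical and returns a fresh plain object where the real `wrap_js_func` would return a fresh struct, and `makeWrapper` is restated in a slightly different but equivalent form.

```javascript
// Hypothetical stand-in for the wasm instance: each call returns a fresh
// object, as struct.new would return a fresh struct.
const wasm = {
  exports: {
    wrap_js_func(func) {
      return { func };
    },
  },
};

function makeWrapper(func) {
  // Reuse the stashed wrapper if there is one and it is still alive.
  let wrapper = func.wrapper?.deref();
  if (wrapper === undefined) {
    wrapper = wasm.exports.wrap_js_func(func);
    // Hold the wrapper weakly so the function does not keep it alive.
    func.wrapper = new WeakRef(wrapper);
  }
  return wrapper;
}

const f = () => 1;
const g = () => 2;
const sameFunction = makeWrapper(f) === makeWrapper(f);       // one wrapper per function
const differentFunctions = makeWrapper(f) === makeWrapper(g); // distinct wrappers
```

Here `sameFunction` is true and `differentFunctions` is false, which is the property ref.eq relies on. Stashing a property this way works for functions and other objects, but not for primitives such as strings and numbers.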

WebAssembly / binaryen

What are the limitations of wasm-ctor-eval? #6964