WebAssembly / interface-types

Is it possible to share complicated JavaScript data structures directly with WebAssembly? #18

Closed Becavalier closed 5 years ago

Becavalier commented 5 years ago

Is it possible to share a complicated JavaScript data structure, such as an "Object" or an "Array", directly with the WebAssembly context?

As far as I know, sharing a JavaScript array with WebAssembly involves a lot of overhead. First, we need to encode the array data and pass it to the WebAssembly context through WebAssembly.Memory. Second, the WebAssembly module (the C/C++ side) then has to read that data back out of the WebAssembly linear memory through a pointer to the array.

So is there a way to share an array that has already been created and exists in V8 (or any other JavaScript engine) without reconstructing it in WebAssembly.Memory?
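A minimal sketch of that copying approach, for illustration only. The helper name copyArrayToWasm and the offset 16 are assumptions made up for this example, and no wasm module is actually instantiated; the sketch only shows the copy into linear memory.

```javascript
// Linear memory that a wasm module would import or export (1 page = 64 KiB).
const memory = new WebAssembly.Memory({ initial: 1 });

// Hypothetical helper: copies a JS array of 32-bit integers into linear memory
// at byte offset `ptr` and returns the number of elements written.
function copyArrayToWasm(memory, ptr, array) {
    // `ptr` must be a multiple of 4 to create an Int32Array view at that offset.
    const view = new Int32Array(memory.buffer, ptr, array.length);
    view.set(array); // copies every element out of the JS array
    return array.length;
}

const source = [1, 1, 2, 3, 5];
const ptr = 16; // arbitrary 4-byte-aligned offset, chosen for illustration
const len = copyArrayToWasm(memory, ptr, source);

// The wasm side would now receive (ptr, len) and read the values back out.
// Linear memory holds a copy, so later changes to `source` are not visible there.
source.push(8);
const copied = Array.from(new Int32Array(memory.buffer, ptr, len));
console.log(copied);
```

Every element is written once on the JS side and read again on the wasm side, which is the overhead in question.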

Pauan commented 5 years ago

Yes, with the reference-types and host-bindings proposals, it is possible to pass a JS object reference directly to wasm (and also pass a JS object from wasm back to JS).

However, those proposals aren't fully standardized and implemented yet.

In the meantime, there is a workaround that you can use right now:

  1. When you want to pass a JS object to wasm, you create a new unique integer (this is the id of the object), and you put the JS object into a dictionary which maps from ids to objects.

    This id is basically the same as a pointer (but implemented manually with a dictionary).

  2. Then you pass that id (which is just an integer) into wasm.

  3. Wasm can then do whatever it wants with the id: pass it around, return it from functions, put it into locals, put it into linear memory, etc.

    In other words, from wasm's point of view, it's a proper first-class reference.

  4. When wasm wants to call a function or method on the JS object, it passes the id back into JS. JS then looks up the id in the dictionary and calls the function/method.

  5. When wasm is done using the JS object, it passes the id to JS, asking JS to free it.

    JS then removes the object from the dictionary (which allows the JS engine to garbage collect it, since the object is now unused).

This technique is used in wasm-bindgen and stdweb, and it works with all JS objects (and also with JS primitive types, such as strings).

Here is a concrete example:

main.wat:

(module
  (import "env" "makeNewArray" (func $makeNewArray (result i32)))
  (import "env" "push" (func $push (param i32) (param i32)))
  (import "env" "length" (func $length (param i32) (result i32)))
  (import "env" "free" (func $free (param i32)))
  (import "env" "logInt" (func $logInt (param i32)))
  (import "env" "logRef" (func $logRef (param i32)))
  (func (export "main")
    (local $array i32)
    (local.set $array (call $makeNewArray))
    (call $push (local.get $array) (i32.const 1))
    (call $push (local.get $array) (i32.const 1))
    (call $push (local.get $array) (i32.const 2))
    (call $push (local.get $array) (i32.const 3))
    (call $push (local.get $array) (i32.const 5))
    (call $logInt (call $length (local.get $array)))
    (call $logRef (local.get $array))
    (call $free (local.get $array)))
)

main.js:

// This is a unique integer which is used for the id of the JS objects
var jsRefId = 0;

// This is a dictionary which maps from id to JS object
var jsRefs = {};

// This creates a new id, puts the object into the dictionary, then returns the new id
function mallocJsRef(obj) {
    var id = jsRefId;
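    // Because we always increment the id, it's guaranteed to be unique
    // (until it exceeds 4294967295, where the i32 wraps around). Note that ids
    // above 2147483647 come back from wasm as negative numbers, so this simple
    // lookup only works for the first 2147483648 ids.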
    ++jsRefId;
    jsRefs[id] = obj;
    return id;
}

// This looks up the JS object based upon its id
function lookupJsRef(id) {
    return jsRefs[id];
}

// This cleans up the memory for the JS object (by allowing it to be garbage collected)
function freeJsRef(id) {
    delete jsRefs[id];
}

var imports = {
    env: {
        makeNewArray: function () {
            return mallocJsRef([]);
        },
        push: function (id, value) {
            lookupJsRef(id).push(value);
        },
        length: function (id) {
            return lookupJsRef(id).length;
        },
        logInt: function (value) {
            console.log(value);
        },
        logRef: function (id) {
            console.log(lookupJsRef(id));
        },
        free: function (id) {
            freeJsRef(id);
        }
    }
};

WebAssembly.instantiateStreaming(fetch('../out/main.wasm'), imports).then(function (result) {
    result.instance.exports.main();
}).catch(function (e) {
    console.error(e);
});

The wasm code basically does the equivalent of this JS code:

var array = [];
array.push(1);
array.push(1);
array.push(2);
array.push(3);
array.push(5);
console.log(array.length);
console.log(array);

You can run the main.wat and main.js files in the WebAssembly Studio (create an "Empty Wat Project", copy-paste the code in, use Ctrl+S to save it, then click "Build & Run"). You should see this output in the bottom panel:

5
1,1,2,3,5

mallocJsRef, lookupJsRef, and freeJsRef are general purpose and can be used with anything, so they only need to be defined once (they could even be put into a library).

So the only boilerplate is that you have to create small shims (such as imports.env.makeNewArray, imports.env.push, etc.) for the various objects/functions/methods that you wish to use.
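As a rough sketch of what such a library could look like (the names JsRefTable and methodShim are hypothetical, not taken from wasm-bindgen or stdweb), the three helpers can be wrapped in a class, and the method shims can be generated instead of written by hand:

```javascript
// Hypothetical library wrapper around the same id -> object dictionary.
class JsRefTable {
    constructor() {
        this.nextId = 0;       // next unused id
        this.refs = new Map(); // maps id -> JS object
    }
    // Stores the object and returns its new id.
    alloc(obj) {
        const id = this.nextId++;
        this.refs.set(id, obj);
        return id;
    }
    // Looks up the object for an id, failing loudly on unknown or freed ids.
    get(id) {
        if (!this.refs.has(id)) {
            throw new Error("invalid or freed JS ref id: " + id);
        }
        return this.refs.get(id);
    }
    // Removes the object from the table so it can be garbage collected.
    free(id) {
        this.refs.delete(id);
    }
}

// Hypothetical shim generator: the returned function looks up the receiver
// by id and forwards the remaining arguments to the named method.
function methodShim(table, name) {
    return function (id, ...args) {
        return table.get(id)[name](...args);
    };
}

const table = new JsRefTable();
const env = {
    makeNewArray: () => table.alloc([]),
    push: methodShim(table, "push"),
    length: (id) => table.get(id).length,
    free: (id) => table.free(id)
};

// Calling the shims directly from JS, the same way wasm would call them:
const arr = env.makeNewArray();
env.push(arr, 1);
env.push(arr, 2);
const lengthBeforeFree = env.length(arr);
console.log(lengthBeforeFree);
env.free(arr);
```

Throwing on an unknown id is a design choice of this sketch; the original lookupJsRef simply returns undefined in that case.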

Also, the performance should be quite good: calls between JS and wasm are very fast (thanks to recent browser optimizations), and it's just passing an integer back and forth between JS and wasm (no copying of data!)

So the only performance costs are creating a new id (which is very fast), inserting the object into the dictionary, looking up the id in the dictionary, and deleting the id from the dictionary.

This should generally be fast enough. But if you need more performance, there are various optimizations you can try: rather than using a hash table for the dictionary, you can use an array instead (the array indexes serve as the ids). Or you could even create your own custom heap implementation in JS.
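Here is a minimal sketch of the array-based variant, assuming a free list is used to recycle ids. The class name JsRefSlab is made up for this example, and the code assumes that each id is freed exactly once.

```javascript
// Hypothetical array-backed replacement for the id -> object dictionary.
class JsRefSlab {
    constructor() {
        this.slots = [];   // slots[id] holds the JS object for that id
        this.freeIds = []; // ids released by free(), available for reuse
    }
    // Reuses a released id if there is one, otherwise appends a new slot.
    alloc(obj) {
        const id = this.freeIds.length > 0 ? this.freeIds.pop() : this.slots.length;
        this.slots[id] = obj;
        return id;
    }
    // Plain array indexing, no hashing involved.
    get(id) {
        return this.slots[id];
    }
    // Drops the reference so the object can be garbage collected,
    // then marks the id as reusable. Assumes the id is not freed twice.
    free(id) {
        this.slots[id] = undefined;
        this.freeIds.push(id);
    }
}

const slab = new JsRefSlab();
const a = slab.alloc(["a"]);
const b = slab.alloc({ name: "b" });
slab.free(a);
const c = slab.alloc("c"); // reuses the slot that `a` occupied
console.log(a, b, c);
```

Because ids are recycled, the array stays small and the id counter never overflows. The trade-off is that a stale id held by wasm after a free can silently refer to a different object, just like a dangling pointer.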

pchickey commented 5 years ago

As this proposal has evolved, details have changed from the response above, but the approach ought to work with the correct adapter expressions. I'll close this issue as resolved, but feel free to re-open if it requires more discussion.