WebAssembly / WASI

WebAssembly System Interface

WASI proposal for dynamic typing (WASI-dyntype) #552

Open xujuntwt95329 opened 1 year ago

xujuntwt95329 commented 1 year ago

This issue is to introduce the idea of proposing WASI-dyntype APIs.

Introduction

The type system of WebAssembly is entirely static in nature, precluding the direct compilation of dynamic programming languages into WebAssembly. Currently, the primary method of supporting dynamic languages involves compiling the language runtime into WebAssembly. This approach involves running virtual machines of individual languages within the WebAssembly environment (VM inside VM), which results in notable performance overhead and significant memory consumption.

Modern dynamic languages tend to incorporate more type annotations (such as TypeScript or type hints in Python). If these type annotations could be leveraged for static compilation, they would contribute to enhancing application performance. However, the aforementioned VM-inside-VM approach falls short of achieving this goal.

The objective of this proposal is to furnish a standardized set of public APIs designed for handling dynamically-typed objects. These APIs would facilitate the management and access of dynamic objects hosted by the external environment, thereby affording opportunities for statically compiling other objects with sufficient type information.

Strategy

image

Goal

Non-Goal

API walk-through

object creation and property accessing

let obj : any = {};
obj.f1 = 1;
obj.f2 = 'Hello';

This can be generated as:

obj_ref = new_object();
set_property(obj_ref, "f1", new_number(1));
set_property(obj_ref, "f2", new_string("Hello"));
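As a rough illustration of what the host side of these calls could look like, here is a minimal TypeScript sketch. It assumes the host is a JavaScript engine and models each reference as a plain JavaScript value, whereas a real embedding would hand out opaque externref handles. `DynRef` and `get_property` are hypothetical names added for the example and are not part of the walkthrough above.

```typescript
// Hypothetical stand-in for an opaque externref handle.
type DynRef = unknown;

function new_object(): DynRef {
  return {};
}

function new_number(value: number): DynRef {
  return value;
}

function new_string(value: string): DynRef {
  return value;
}

function set_property(obj: DynRef, name: string, value: DynRef): void {
  if (typeof obj !== "object" || obj === null) {
    throw new TypeError("set_property: target is not an object");
  }
  (obj as Record<string, unknown>)[name] = value;
}

// Assumed counterpart to set_property, added only so the result can be read back.
function get_property(obj: DynRef, name: string): DynRef {
  if (typeof obj !== "object" || obj === null) {
    throw new TypeError("get_property: target is not an object");
  }
  return (obj as Record<string, unknown>)[name];
}

// Same call sequence as the generated code above.
const obj_ref = new_object();
set_property(obj_ref, "f1", new_number(1));
set_property(obj_ref, "f2", new_string("Hello"));
```

Because the guest only ever sees handles, the host is free to back them with something else entirely (for example a small interpreter on an embedded device) without changing the compiled module.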

runtime type checking

let obj : any = 100;
let num : number = obj as number;

This can be generated as:

obj_ref = new_number(100);
if (is_number(obj_ref)) {
    num = to_number(obj_ref);
}
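A TypeScript `as number` cast is erased at compile time, so the generated code has to carry the check itself. The sketch below shows one way the host helpers could behave, again assuming references are plain JavaScript values. `unbox_number_or` and its fallback value are hypothetical: the generated code above simply leaves `num` unassigned when the check fails, and what should happen in that case is not specified here.

```typescript
// Hypothetical stand-in for an opaque externref handle.
type DynRef = unknown;

function new_number(value: number): DynRef {
  return value;
}

function is_number(ref: DynRef): boolean {
  return typeof ref === "number";
}

function to_number(ref: DynRef): number {
  // Defensive check: callers are expected to test is_number first.
  if (typeof ref !== "number") {
    throw new TypeError("to_number: value is not a number");
  }
  return ref;
}

// Assumed helper: guarded unboxing with a caller-chosen fallback.
function unbox_number_or(ref: DynRef, fallback: number): number {
  return is_number(ref) ? to_number(ref) : fallback;
}

const obj_ref = new_number(100);
const num = unbox_number_or(obj_ref, Number.NaN);
```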

subtyping

a instanceof b

This can be generated as:

instance_of(a_ref, b_ref);

exception

let a: any = ...;
throw a;

This can be generated as:

throw(a_ref);

re-use existing built-in objects and methods

let m: any = new Map();
m.set('a', 1);

This can be generated as:

m_ref = new_object_with_class("Map");
invoke(m_ref, "set", "a", 1);
                     ^ va-args
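The variadic `invoke` call could be sketched on a JavaScript host as follows. This is only an illustration: the `builtins` registry, the choice of classes in it, and passing arguments as raw JavaScript values are all assumptions, and a real ABI would need a way to marshal a variable number of references across the wasm boundary.

```typescript
// Hypothetical stand-in for an opaque externref handle.
type DynRef = unknown;

// Assumed registry of built-in classes the host is willing to expose.
const builtins: Record<string, () => object> = {
  Map: () => new Map<unknown, unknown>(),
  Set: () => new Set<unknown>(),
};

function new_object_with_class(name: string): DynRef {
  // hasOwnProperty guards against inherited keys such as "toString".
  if (!Object.prototype.hasOwnProperty.call(builtins, name)) {
    throw new ReferenceError(`new_object_with_class: unknown class ${name}`);
  }
  return builtins[name]();
}

// The rest parameter models the va-args in the walkthrough.
function invoke(target: DynRef, method: string, ...args: DynRef[]): DynRef {
  const fn = (target as Record<string, unknown> | null | undefined)?.[method];
  if (typeof fn !== "function") {
    throw new TypeError(`invoke: ${method} is not callable`);
  }
  return fn.apply(target, args);
}

const m_ref = new_object_with_class("Map");
invoke(m_ref, "set", "a", 1);
```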

Detailed design discussion

Given this code as an example:

class A {
    x: number = 1;
}
let num: number = 100;
let a_inst = new A();
let obj : any = {};

This can be generated as the following WebAssembly code:

(local $num f64)
(local $a_inst (ref $A))
(local $obj (ref extern))
(local.set $num (f64.const 100))
(local.set $a_inst (struct.new $A (f64.const 1)))  ;; field x is initialized to 1
(local.set $obj (call $create_object))             ;; host API call

Accessing the dynamic object $obj will be slow because it involves invoking host APIs, but accessing $num and $a_inst would be much faster than with the VM-inside-VM approach.

lukewagner commented 1 year ago

I think this sort of feature belongs in Core WebAssembly, not WASI, since what you're describing deeply influences how the code is compiled and executed (including what types and instructions are available in wasm function bodies). The wasm-gc proposal just reached Stage 4 (coincidentally, 10 minutes ago 🎉 ) and I think forms the basis of what you want (engine-provided GC). I expect wasm-gc is more statically-typed and low-level than you're looking for, but having seen a great deal of discussion of this over the years, I believe baking in high-level dynamic language semantics is a path fraught with peril as there are so many mutually-conflicting requirements. Instead, I think an interesting question to ask is: given now-standard wasm-gc, are there any other GC primitives that would enable dynamic languages to be compiled more-efficiently on top of wasm-gc than is possible today? To pursue this question, I'd suggest scanning through open issues in the wasm-gc and design repos for existing discussions on this topic and filing new issues for remaining questions/ideas.

xujuntwt95329 commented 1 year ago

@lukewagner Thanks for the reply and suggestions, and it's really exciting to hear that wasm-gc has reached Stage 4 🎉

I think this sort of feature belongs in Core WebAssembly, not WASI, since what you're describing deeply influences how the code is compiled and executed (including what types and instructions are available in wasm function bodies).

These APIs do influence how the code is compiled, but it seems difficult to get them into Core WebAssembly, since supporting high level semantics such as dynamic typing is not a goal of WebAssembly.

We are actually developing a compiler that compiles TypeScript to WebAssembly. We use wasm-gc to represent the statically typed code (e.g. number, boolean, class), but it's not possible to represent dynamic types such as any. We studied the principles and goals of wasm-gc and also tried to propose an opcode, but we realized that wasm-gc doesn't aim to provide high-level or dynamic semantics.

So that's why we are here: we abstract the APIs for accessing dynamic objects managed by the external environment. They work as an escape hatch for dynamic typing, so we can still use wasm-gc to represent the statically typed objects in TypeScript and use these APIs to access dynamic objects. This avoids compiling a whole language runtime into WebAssembly.

image

Previously, most approaches to supporting dynamic languages on WebAssembly have been VM-inside-VM (e.g. compiling QuickJS or CPython to WebAssembly). They work on linear memory and can't benefit from wasm-gc. Since TypeScript has both a static and a dynamic part, the proposed APIs give us the opportunity to compile the static part of TypeScript to wasm-gc, as shown in the picture above.

sbc100 commented 1 year ago

I think I agree with Luke (assuming I understand his suggestion) that the best way to achieve this kind of thing would be to look at extensions to wasm-gc in order to support the kind of dynamic behavior you need.

Where did you read that "supporting high level semantics such as dynamic typing is not a goal of WebAssembly"? My understanding is that the goal of WebAssembly is to support all types of languages as efficiently as possible, but that we just started out targeting certain types of languages with the initial version.

xujuntwt95329 commented 1 year ago

I think I agree with Luke (assuming I understand his suggestion) that the best way to achieve this kind of thing would be to look at extensions to wasm-gc in order to support the kind of dynamic behavior you need.

Where did you read that "supporting high level semantics such as dynamic typing is not a goal of WebAssembly"? My understanding is that the goal of WebAssembly is to support all types of languages as efficiently as possible, but that we just started out targeting certain types of languages with the initial version.

Well, we think it would be better if this could be a wasm-gc extension, but according to some previous discussion, wasm-gc opcodes are currently "as low level as possible", so bringing such dynamic typing into wasm-gc does not seem compatible with that principle.

Where did you read that "supporting high level semantics such as dynamic typing is not a goal of WebAssembly"?

My previous description may not have been very accurate. My understanding is that WebAssembly will not provide opcodes to support dynamic types directly. Dynamically typed languages already work well by compiling their runtime into WebAssembly, but this introduces some footprint overhead.

These APIs separate the processing of dynamic typing, so we can compile the static part to wasm-gc directly, without another garbage collector inside wasm module.

sbc100 commented 1 year ago

Well, we think it would be better if this could be a wasm-gc extension, but according to some previous discussion, wasm-gc opcodes are currently "as low level as possible", so bringing such dynamic typing into wasm-gc does not seem compatible with that principle.

Ah I see. My interpretation of that would be that, for an addition to wasm-gc in support of dynamic languages to gain traction, we would need to show that it would be much more efficient than building the same dynamic features on top of wasm-gc primitives.

Presumably it is technically possible to build a dynamic object model on top of the wasm-gc object model? (e.g. represent the dynamic fields in some kind of map data structure?).

kripken commented 1 year ago

@sbc100

Presumably it is technically possible to build a dynamic object model on top of the wasm-gc object model? (e.g. represent the dynamic fields in some kind of map data structure?).

I believe that's why @xujuntwt95329 proposed a new WasmGC instruction to allow dynamic field access as in the link in the previous discussion. It's hard to do without a new instruction, really (an array can't mix different types, and a struct has fixed access only, and there are no maps).

@xujuntwt95329 I sympathize with your position, since basically you went to the WasmGC people and got an unenthusiastic response, and so you came here but you got the same thing basically.

With that said, I do agree with the concerns mentioned both here and there, even though I am very much in favor of good support for dynamic languages in wasm. My own position is still what I wrote at the end of one of my comments there:

you can store data in linear memory and [..] use WasmGC objects only for references.

That is, data in linear memory will easily allow dynamic field access using the normal tricks, and reference access using a WasmGC array also allows dynamic access. You will have overhead (separate storage for data and references, and casts from the WasmGC array) but it might still be fast enough for dynamic objects (especially since your compiler doesn't implement all objects dynamically). I'd recommend experimenting with that first, as the other options appear to be more radical and would require some changing of minds.

(This will need weak reference support, as mentioned before, but that is at least already planned, and can be polyfilled today on the Web using JS.)
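To make the split layout concrete, here is a small TypeScript model of the idea. It is a sketch under several assumptions: an `ArrayBuffer` stands in for wasm linear memory, a plain array stands in for the WasmGC array of references, and the fixed per-object slot budget, the bump allocator, and the `SplitObject` class are all invented for illustration. Weak references and freeing are not modelled.

```typescript
const SLOT_BYTES = 8; // one f64 per numeric field
const MAX_SLOTS = 8;  // assumed fixed budget of numeric fields per object

// Stand-in for wasm linear memory, with a trivial bump allocator.
const memory = new DataView(new ArrayBuffer(4096));
let heapTop = 0;

type Slot = { kind: "number" | "ref"; index: number };

class SplitObject {
  private readonly base: number;
  private readonly refs: unknown[] = []; // stand-in for a WasmGC array
  private readonly layout = new Map<string, Slot>();
  private numberCount = 0;

  constructor() {
    if (heapTop + SLOT_BYTES * MAX_SLOTS > memory.byteLength) {
      throw new RangeError("out of linear memory");
    }
    this.base = heapTop;
    heapTop += SLOT_BYTES * MAX_SLOTS;
  }

  set(name: string, value: unknown): void {
    const kind = typeof value === "number" ? "number" : "ref";
    let slot = this.layout.get(name);
    if (slot === undefined || slot.kind !== kind) {
      // New field, or a field whose kind changed. The old slot is simply
      // abandoned, which a real implementation would want to reclaim.
      if (kind === "number" && this.numberCount >= MAX_SLOTS) {
        throw new RangeError("too many numeric fields");
      }
      const index = kind === "number" ? this.numberCount++ : this.refs.length;
      slot = { kind, index };
      this.layout.set(name, slot);
    }
    if (typeof value === "number") {
      memory.setFloat64(this.base + slot.index * SLOT_BYTES, value, true);
    } else {
      this.refs[slot.index] = value;
    }
  }

  get(name: string): unknown {
    const slot = this.layout.get(name);
    if (slot === undefined) return undefined;
    return slot.kind === "number"
      ? memory.getFloat64(this.base + slot.index * SLOT_BYTES, true)
      : this.refs[slot.index];
  }
}

const o = new SplitObject();
o.set("f1", 1.5);
o.set("f2", "Hello");
```

The overhead mentioned above shows up here as the two separate stores and the per-access branch on the slot kind.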

programmerjake commented 1 year ago

i think you can probably end up with a pretty efficient JS implementation using wasm GC by using the hidden classes and inline caching technique that V8 uses -- that doesn't usually use an arbitrary hashmap to represent objects.
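For readers unfamiliar with the technique, the following TypeScript sketch shows the general shape of hidden classes with a monomorphic inline cache. It is a simplified model and not V8's actual implementation; `Shape`, `DynObject`, `setProperty`, and `PropertyCache` are hypothetical names. In a WasmGC setting the `slots` array would presumably be a GC array of references.

```typescript
// A Shape maps property names to slot indexes. Objects that add the same
// properties in the same order end up sharing one Shape.
class Shape {
  readonly offsets: ReadonlyMap<string, number>;
  private readonly transitions = new Map<string, Shape>();

  constructor(offsets: ReadonlyMap<string, number>) {
    this.offsets = offsets;
  }

  // Returns the shape reached by adding `name`, creating it at most once.
  withProperty(name: string): Shape {
    let next = this.transitions.get(name);
    if (next === undefined) {
      const offsets = new Map(this.offsets);
      offsets.set(name, offsets.size);
      next = new Shape(offsets);
      this.transitions.set(name, next);
    }
    return next;
  }
}

const EMPTY_SHAPE = new Shape(new Map<string, number>());

class DynObject {
  shape: Shape = EMPTY_SHAPE;
  readonly slots: unknown[] = [];
}

function setProperty(obj: DynObject, name: string, value: unknown): void {
  let offset = obj.shape.offsets.get(name);
  if (offset === undefined) {
    obj.shape = obj.shape.withProperty(name);
    offset = obj.slots.length; // new properties are appended
  }
  obj.slots[offset] = value;
}

// One cache per property-access site: remembers the last shape seen and
// the slot index for that shape, so repeated lookups skip the map.
class PropertyCache {
  private readonly name: string;
  private cachedShape: Shape | null = null;
  private cachedOffset = -1;
  hits = 0;

  constructor(name: string) {
    this.name = name;
  }

  load(obj: DynObject): unknown {
    if (obj.shape === this.cachedShape) {
      this.hits++;
      return obj.slots[this.cachedOffset];
    }
    const offset = obj.shape.offsets.get(this.name);
    if (offset === undefined) return undefined;
    this.cachedShape = obj.shape;
    this.cachedOffset = offset;
    return obj.slots[offset];
  }
}

const a = new DynObject();
setProperty(a, "x", 1);
setProperty(a, "y", 2);
const b = new DynObject();
setProperty(b, "x", 10);
setProperty(b, "y", 20);
const yCache = new PropertyCache("y");
```

The fast path is a single identity comparison on the shape followed by an indexed load, which is why this layout avoids a hash lookup on most accesses.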

xujuntwt95329 commented 1 year ago

@kripken Thanks for your understanding 😀. The opcode we previously proposed to wasm-gc is much more low-level than these APIs, so I believe that if we proposed these dynamic features to wasm-gc, the rejection would be even more direct.

I personally understand the concerns from both sides. Because there are already solutions for supporting dynamically typed languages (implementing the object management on top of WasmGC, as suggested by @sbc100 and @programmerjake, or the current solution of compiling a whole VM into WebAssembly), there doesn't seem to be a strong need to introduce a new concept at the standard level.

However, we are trying to avoid, or at least provide the opportunity to avoid, a runtime inside the wasm module, because many resource-constrained devices have very limited RAM and flash. Removing the runtime from the wasm module may allow these devices to install more applications. That's why we want to use these APIs to separate out the processing of dynamic types. It will have these benefits:

This gives us the flexibility to use different implementations in different environments, without needing to introduce too many new concepts into WebAssembly.

@kripken I like your idea that you can store data in linear memory and [..] use WasmGC objects only for references. Actually, I think this can work together with the proposed APIs: we store data in linear memory, use an externref to reference that memory, and use the proposed APIs to access it.

image