WebAssembly / wabt

The WebAssembly Binary Toolkit
Apache License 2.0

wasm2c roadmap ideas #2019

Open keithw opened 2 years ago

keithw commented 2 years ago

Now that wasm2c has almost caught up to the current Wasm spec, maybe it's a good time to brainstorm about the roadmap from here and see what everyone else thinks is useful/worth prioritizing. Here are some possible items and thoughts to get the discussion going:

sbc100 commented 2 years ago

Thanks Keith! Those all sound like reasonable things (except maybe the last one :)

Oh, I didn't know about w2c2. Link for reference: https://github.com/turbolent/w2c2

Regarding splitting output into multiple C files. I agree that could be a good idea, but I think one file per function might be a little too much. I imagine the compilation time of each source file is very much linear in the number of lines (wasm2c output is fairly uncomplicated). Perhaps we could have some kind of splitting threshold such as: start a new file after N lines?
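A minimal sketch of the "start a new file after N lines" idea, assuming each function's emitted line count is known up front. The function names and line counts here are made up for illustration; this is not how wasm2c is implemented.

```python
def split_by_line_budget(functions, max_lines):
    """Group (name, line_count) pairs into files, in order.

    A new file is started whenever adding the next function would push
    the current file past max_lines. A single function larger than the
    budget still gets a file of its own.
    """
    files, current, used = [], [], 0
    for name, line_count in functions:
        if current and used + line_count > max_lines:
            files.append(current)
            current, used = [], 0
        current.append(name)
        used += line_count
    if current:
        files.append(current)
    return files


# Hypothetical functions with hypothetical emitted line counts.
funcs = [("f0", 400), ("f1", 700), ("f2", 300), ("f3", 1500), ("f4", 100)]
chunks = split_by_line_budget(funcs, max_lines=1000)
print(chunks)  # [['f0'], ['f1', 'f2'], ['f3'], ['f4']]
```

Note that with this greedy scheme a change in one function's size can shift every later function into a different file.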

kripken commented 2 years ago

Great ideas!

Regarding fuzzing, I fuzzed wasm2c a while ago using the binaryen fuzzer,

https://github.com/WebAssembly/binaryen/blob/5449744d79ec996c7334681ac1b85e5461194dc8/scripts/fuzz_opt.py#L714-L756

The general idea is that the fuzzer emits random valid wasm files, along with instructions for how to run them in various modes. The linked code runs the wasm2c output with the proper shim (emitted by --emit-wasm2c-wrapper) and captures output that it can then compare to running the wasm in other ways (like in a wasm VM normally). Then it just diffs the output and checks whether any return values or logged output differ.

This found a few bugs back then (edit: all of which have long been fixed), but I haven't kept it up to date recently. That would be great to do though.
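The comparison step described above can be sketched roughly as follows. The runners are hypothetical stand-ins written for this example; a real setup would invoke a wasm VM and the compiled wasm2c output and capture their printed results instead.

```python
def differential_check(module, runners):
    """Run `module` through every runner and return the names of the
    runners whose output differs from the first (reference) runner."""
    names = list(runners)
    reference = runners[names[0]](module)
    return [n for n in names[1:] if runners[n](module) != reference]


# Stand-in runners (assumptions for illustration only).
def run_in_vm(module):
    return f"result: {sum(module)}\n"


def run_wasm2c_ok(module):
    return f"result: {sum(module)}\n"


def run_wasm2c_buggy(module):
    # Simulates a miscompilation that truncates the result to 8 bits.
    return f"result: {sum(module) & 0xFF}\n"


runners = {
    "vm": run_in_vm,
    "wasm2c_ok": run_wasm2c_ok,
    "wasm2c_buggy": run_wasm2c_buggy,
}
mismatches = differential_check(bytes([200, 100]), runners)
print(mismatches)  # ['wasm2c_buggy']
```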

sbc100 commented 2 years ago

There are also existing fuzzers for parts of wabt at least (e.g. the parsers and validation). See https://github.com/google/oss-fuzz/tree/master/projects/wabt

keithw commented 2 years ago

> Regarding splitting output into multiple C files. I agree that could be a good idea, but I think one file per function might be a little too much. I imagine the compilation time of each source file is very much linear in the number of lines (wasm2c output is fairly uncomplicated). Perhaps we could have some kind of splitting threshold such as: start a new file after N lines?

Agreed this would be better. I'm trying to think of a good way to do the partition that lets most files stay unchanged when only some functions are inserted/removed/modified. (To allow a memoized build to use its cache for 99% of the files.) You wouldn't want the act of adding one function to end up repacking all the .c files and therefore needing to recompile every one...
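One possible way to get that property, sketched here as an assumption rather than anything proposed in this thread, is to assign each function to a file by hashing its name. Adding or removing a function then touches exactly one file, at the cost of uneven file sizes and a full reshuffle if the number of files changes.

```python
import hashlib


def bucket_for(name, num_files):
    """Pick a file index from a stable hash of the function name."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_files


def partition(names, num_files):
    """Assign every function to a file; sort within a file for stable output."""
    files = [[] for _ in range(num_files)]
    for name in sorted(names):
        files[bucket_for(name, num_files)].append(name)
    return files


# Hypothetical function names.
names = [f"f{i}" for i in range(100)]
before = partition(names, 8)
after = partition(names + ["new_func"], 8)

# Only the file that received new_func differs between the two runs.
changed = [i for i in range(8) if before[i] != after[i]]
print(changed == [bucket_for("new_func", 8)])  # True
```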

sbc100 commented 2 years ago

> > Regarding splitting output into multiple C files. I agree that could be a good idea, but I think one file per function might be a little too much. I imagine the compilation time of each source file is very much linear in the number of lines (wasm2c output is fairly uncomplicated). Perhaps we could have some kind of splitting threshold such as: start a new file after N lines?
>
> Agreed this would be better. I'm trying to think of a good way to do the partition that lets most files stay unchanged when only some functions are inserted/removed/modified. (To allow a memoized build to use its cache for 99% of the files.) You wouldn't want the act of adding one function to end up repacking all the .c files and therefore needing to recompile every one...

Would a good-enough solution be to just pack them alphabetically into N buckets (ignoring size)?

Then if you change any one function, only that one file would change; adding or removing a function would affect about N / 2 files on average. One downside is that several large functions could end up in the same bucket, but it seems like a reasonable first step. We could make it an option, with N == -1 meaning one function per bucket, so folks could experiment.
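A rough sketch of alphabetical packing, under the assumption that "N buckets" means equal-count slices of the sorted function list. It shows how an insertion shifts every bucket from the insertion point onward, which is where the N / 2 average comes from. The function names are made up.

```python
def pack_alphabetically(names, num_buckets):
    """Sort names and cut the sorted list into num_buckets equal-count slices."""
    ordered = sorted(names)
    size = -(-len(ordered) // num_buckets)  # ceiling division
    return [ordered[i * size:(i + 1) * size] for i in range(num_buckets)]


# Hypothetical function names f00..f13, packed into 4 buckets.
names = [f"f{i:02d}" for i in range(14)]
before = pack_alphabetically(names, 4)
after = pack_alphabetically(names + ["f05a"], 4)

# Bucket 0 is untouched; the bucket receiving f05a and all later ones shift.
changed = [i for i in range(4) if before[i] != after[i]]
print(changed)  # [1, 2, 3]
```

If the insertion also changes the per-bucket count (for example, going from 16 to 17 functions with 4 buckets), every bucket can change, so fixed alphabetical boundaries may be worth considering instead.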

deian commented 1 year ago

+1 This is a great list!

> • start work on wasm2rust. Not totally serious, but it would be cool if this existed

It exists (and we used it for some researchy things)! https://github.com/secure-foundations/rWasm

kripken commented 1 year ago

There is also wasm-to-rust though it seems inactive now.

I've been thinking that a higher-level target might also make sense as wasm itself goes in that direction, specifically regarding GC. Wasm to Go/Kotlin/C# etc. could use native objects in the host GC which could have several benefits.

sbc100 commented 1 year ago

> There is also wasm-to-rust though it seems inactive now.
>
> I've been thinking that a higher-level target might also make sense as wasm itself goes in that direction, specifically regarding GC. Wasm to Go/Kotlin/C# etc. could use native objects in the host GC which could have several benefits.

And of course the existing wasm2js could use native JS objects (with fixed/frozen prototypes) https://github.com/WebAssembly/binaryen/blob/main/src/tools/wasm2js.cpp

vshymanskyy commented 2 months ago

Currently, the Wasm page size is fixed at 64 KiB, which is rather expensive in some scenarios, since even the smallest memory occupies at least one full page.

The WebAssembly WG has proposed a new feature to handle this nicely: https://github.com/WebAssembly/custom-page-sizes/blob/main/proposals/custom-page-sizes/Overview.md

Please consider implementing this for wasm2c. This would allow running really tiny wasm modules :raised_hands:

The implementation for wasm3 was really simple
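To illustrate the cost, here is a small arithmetic sketch of the smallest memory a module can declare for a given data footprint. It assumes a 1-byte page size is among those the proposal permits; check the linked overview for the exact set of allowed sizes.

```python
WASM_DEFAULT_PAGE_SIZE = 64 * 1024  # 65536 bytes


def min_memory_bytes(needed_bytes, page_size=WASM_DEFAULT_PAGE_SIZE):
    """Smallest memory size (in bytes) that covers needed_bytes,
    given that memory is sized in whole pages."""
    pages = -(-needed_bytes // page_size)  # ceiling division
    return pages * page_size


# A module that only needs 1000 bytes of linear memory:
print(min_memory_bytes(1000))               # 65536
print(min_memory_bytes(1000, page_size=1))  # 1000
```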