Open keithw opened 2 years ago
Thanks Keith! Those all sound like reasonable things (except maybe the last one :)
Oh, I didn't know about w2c2. Link for reference: https://github.com/turbolent/w2c2
Regarding splitting output into multiple C files. I agree that could be a good idea, but I think one file per function might be a little too much. I imagine the compilation time of each source file is very much linear in the number of lines (wasm2c output is fairly uncomplicated). Perhaps we could have some kind of splitting threshold such as: start a new file after N lines?
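To make the threshold idea concrete, here is a minimal sketch, not anything wasm2c does today. It assumes we already know each function's emitted line count; the names `split_by_line_threshold`, `functions`, and `max_lines` are made up for illustration. It starts a new file whenever adding the next function would push the current file past the threshold, which is a small variation on "start a new file after N lines".

```python
def split_by_line_threshold(functions, max_lines):
    """Greedily pack (name, line_count) pairs into files of at most max_lines.

    A single function longer than max_lines still gets a file to itself.
    """
    files, current, current_lines = [], [], 0
    for name, line_count in functions:
        # Start a new file when adding this function would exceed the threshold.
        if current and current_lines + line_count > max_lines:
            files.append(current)
            current, current_lines = [], 0
        current.append(name)
        current_lines += line_count
    if current:
        files.append(current)
    return files


# Hypothetical functions with their emitted line counts.
funcs = [("f0", 400), ("f1", 300), ("f2", 500), ("f3", 1500), ("f4", 100)]
print(split_by_line_threshold(funcs, 1000))
# [['f0', 'f1'], ['f2'], ['f3'], ['f4']]
```

Note that with this kind of packing, a size change in an early function can shift every later file boundary, so it is not friendly to build caches.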
Great ideas!
Regarding fuzzing, I fuzzed wasm2c a while ago using the binaryen fuzzer.
The general idea is that the fuzzer emits random valid wasm files, along with instructions for how to run them in various modes. The fuzzer code runs wasm2c with the proper shim (emitted by --emit-wasm2c-wrapper) and gets output that it can then compare to running the wasm in other ways (like in a wasm VM normally). Then it just diffs the output and checks whether any return values or logging differ.
This found a few bugs back then (edit: all of which have long been fixed), but I haven't kept it up to date recently. That would be great to do though.
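The comparison step described above could look roughly like this. It is only a sketch of the "diff the outputs" idea, not the actual binaryen fuzzer code; `first_difference` and the sample output strings are hypothetical, and it assumes each run's output has already been captured as text.

```python
def first_difference(reference_output, candidate_output):
    """Return (line_number, reference_line, candidate_line) for the first
    mismatching line, or None if the outputs match line by line."""
    ref_lines = reference_output.splitlines()
    cand_lines = candidate_output.splitlines()
    for i in range(max(len(ref_lines), len(cand_lines))):
        # A missing line on either side is reported as None.
        ref = ref_lines[i] if i < len(ref_lines) else None
        cand = cand_lines[i] if i < len(cand_lines) else None
        if ref != cand:
            return (i + 1, ref, cand)
    return None


# Hypothetical captured output from a wasm VM run and from a wasm2c build.
vm_output = "[log] 1\nresult: 42\n"
wasm2c_output = "[log] 1\nresult: 43\n"
print(first_difference(vm_output, wasm2c_output))
# (2, 'result: 42', 'result: 43')
```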
There are also existing fuzzers for parts of wabt at least (e.g. the parsers and validation). See https://github.com/google/oss-fuzz/tree/master/projects/wabt
> Regarding splitting output into multiple C files. I agree that could be a good idea, but I think one file per function might be a little too much. I imagine the compilation time of each source file is very much linear in the number of lines (wasm2c output is fairly uncomplicated). Perhaps we could have some kind of splitting threshold such as: start a new file after N lines?
Agreed this would be better. I'm trying to think of a good way to do the partition that lets most files stay unchanged when only some functions are inserted/removed/modified. (To allow a memoized build to use its cache for 99% of the files.) You wouldn't want the act of adding one function to end up repacking all the .c files and therefore needing to recompile every one...
Would a good-enough solution be to just pack them alphabetically into N buckets (ignoring size)?
Then if you change any one function, only that one file would change; adding or removing a function would affect N / 2 files. One downside is that several large functions could end up in the same bucket, but it seems like a reasonable first step. We could make it an option with N == -1 meaning one file per bucket, so folks could experiment.
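Here is a small sketch for experimenting with the alphabetical-bucket idea. It assumes "N buckets" means cutting the sorted names into N contiguous runs of near-equal count; `pack_alphabetically`, `changed_buckets`, and the function names are all made up for illustration. How many buckets change after an insertion depends on where the new name lands in the sort order.

```python
def pack_alphabetically(names, n_buckets):
    """Sort names and cut them into n_buckets contiguous, near-equal runs."""
    ordered = sorted(names)
    base, extra = divmod(len(ordered), n_buckets)
    buckets, start = [], 0
    for i in range(n_buckets):
        # The first `extra` buckets take one additional name each.
        size = base + (1 if i < extra else 0)
        buckets.append(ordered[start:start + size])
        start += size
    return buckets


def changed_buckets(before, after):
    """Count bucket positions whose contents differ between two packings."""
    return sum(1 for old, new in zip(before, after) if old != new)


names = ["f%02d" % i for i in range(12)]          # hypothetical function names
before = pack_alphabetically(names, 4)
after = pack_alphabetically(names + ["f05a"], 4)  # one function inserted
print(changed_buckets(before, after))
# 2
```

In this particular run two of the four files change, but an insertion elsewhere in the order can disturb more or fewer of them.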
+1 This is a great list!
> - start work on wasm2rust. Not totally serious, but it would be cool if this existed
It exists (and we used it for some researchy things)! https://github.com/secure-foundations/rWasm
There is also wasm-to-rust though it seems inactive now.
I've been thinking that a higher-level target might also make sense as wasm itself goes in that direction, specifically regarding GC. Wasm to Go/Kotlin/C# etc. could use native objects in the host GC which could have several benefits.
And of course the existing wasm2js could use native JS objects (with fixed/frozen prototypes) https://github.com/WebAssembly/binaryen/blob/main/src/tools/wasm2js.cpp
Currently, the WASM page size is fixed at 64KiB, which is rather expensive in some scenarios.
WebAssembly WG proposed a new feature to handle it nicely: https://github.com/WebAssembly/custom-page-sizes/blob/main/proposals/custom-page-sizes/Overview.md
Please consider implementing this for wasm2c. This would allow running really tiny wasm modules :raised_hands:
The implementation for wasm3 was really simple
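To put a number on "rather expensive", here is a quick back-of-the-envelope sketch. It assumes, from my reading of the linked overview, that the proposal permits a page size of 1 byte alongside the default 64 KiB; `memory_bytes_needed` is a made-up helper, not part of any tool.

```python
def memory_bytes_needed(data_bytes, page_size):
    """Smallest whole number of pages covering data_bytes, as (pages, bytes)."""
    pages = -(-data_bytes // page_size)  # ceiling division
    return pages, pages * page_size


# A module that only needs 100 bytes of linear memory.
print(memory_bytes_needed(100, 65536))  # default 64 KiB pages -> (1, 65536)
print(memory_bytes_needed(100, 1))      # 1-byte pages (assumed allowed) -> (100, 100)
```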
Now that wasm2c has almost caught up to the current Wasm spec, maybe it's a good time to brainstorm about the roadmap from here and see what everyone else thinks is useful/worth prioritizing. Here are some possible items and thoughts to get the discussion going: