dtolnay / watt

Runtime for executing procedural macros as WebAssembly
Apache License 2.0

Fix the WATT_JIT feature #48

Closed alexcrichton closed 2 years ago

alexcrichton commented 2 years ago

I was curious to see the impact of Wasmtime's recent development, since quite a lot has changed about Wasmtime since I added the WATT_JIT env var feature to watt a few years ago. The changes in this PR account for some ABI changes that have happened in the C API, none of which amount to anything major.

Taking my old benchmark of #[derive(Serialize)] on struct S(f32, ... /* 1000 times */) the timings I get for the latest version of serde_derive are:

| | native | watt | watt (cached) |
|---|---|---|---|
| debug | 156ms | 280ms | 125ms |
| release | 70ms | 257ms | 100ms |
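
For anyone wanting to reproduce the large input, here is a minimal sketch that generates the 1000-field struct instead of typing it out. The function name `benchmark_source` is made up for illustration and is not part of watt or serde; the output still needs to be placed in a crate that depends on serde with the derive feature.

```rust
/// Builds the source text of a tuple struct `S` with `fields` fields of type
/// `f32`, annotated with `#[derive(Serialize)]`.
/// (Hypothetical helper, not part of watt or serde.)
fn benchmark_source(fields: usize) -> String {
    // Repeat "f32" once per field and separate the entries with commas.
    let body = vec!["f32"; fields].join(", ");
    format!("#[derive(Serialize)]\nstruct S({});\n", body)
}

fn main() {
    // Emit the 1000-field variant on stdout; redirect it into a source file
    // of the crate being timed.
    print!("{}", benchmark_source(1000));
}
```

Passing `1` instead of `1000` yields the small `struct S(f32)` input.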

Using instead #[derive(Serialize)] struct S(f32) the timings I get are:

| | native | watt | watt (cached) |
|---|---|---|---|
| debug | 1ms | 241ms | 41ms |
| release | 387us | 205ms | 46ms |

So for large inputs, JIT-compiled WebAssembly can be faster than the native serde_derive when serde is itself compiled in debug mode, at least once the compiled module is cached (125ms versus 156ms above). Note that debug mode is almost always the default nowadays, since cargo build --release will currently build build-dependencies with no optimizations. Only through explicit profile configuration can serde_derive be built in optimized mode (as I did to collect the above numbers).

The watt (cached) column is where I enabled Wasmtime's global compilation cache to avoid recompiling the module every time the proc-macro is loaded, which is why the timings are much lower. The difference between watt and watt (cached) is the compile time of the module itself. The 40ms or so in watt (cached) is almost entirely the overhead of loading the module from the cache, which involves decompressing the module from disk and additionally sloshing bytes around.

More efficient storage mediums exist for Wasmtime modules, so it would actually be pretty easy to shave a good chunk of time off that. Additionally, Wasmtime has a custom C API that differs significantly from the one used in this repository and would be significantly faster for calling into the host from wasm. The current ~3ms of runtime in wasm itself could probably be reduced further with more optimized calls.
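
As a rough illustration of that subtraction, the sketch below derives the implied module compile time from each row of the timings above. The `Row` type and the function names are made up for this example, and the results are only as precise as the single measurements they come from.

```rust
/// One row of the reported timings, in milliseconds.
/// (Hypothetical type used only for this illustration.)
struct Row {
    label: &'static str,
    watt_ms: u32,
    watt_cached_ms: u32,
}

/// Implied compile time of the module: uncached run minus cached run.
fn compile_ms(row: &Row) -> u32 {
    row.watt_ms - row.watt_cached_ms
}

/// The four watt / watt (cached) pairs from the two tables.
fn rows() -> Vec<Row> {
    vec![
        Row { label: "1000 fields, debug", watt_ms: 280, watt_cached_ms: 125 },
        Row { label: "1000 fields, release", watt_ms: 257, watt_cached_ms: 100 },
        Row { label: "1 field, debug", watt_ms: 241, watt_cached_ms: 41 },
        Row { label: "1 field, release", watt_ms: 205, watt_cached_ms: 46 },
    ]
}

fn main() {
    for row in rows() {
        println!("{}: ~{}ms spent compiling", row.label, compile_ms(&row));
    }
}
```

By this estimate, compilation accounts for roughly 155ms to 200ms of every uncached run, regardless of input size.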

Overall this seems like pretty good progress on Wasmtime in the time since my initial work in #2. In any case, I wanted to post this to get the WATT_JIT feature at least working again, since it's segfaulting right now, and perhaps in the future more perf work can be done if necessary!