I was curious to see the impact of Wasmtime's recent development, since
quite a lot has changed about Wasmtime since I added the WATT_JIT env
var feature to watt a few years ago. The changes in this PR account for
some ABI changes that have happened in the C API, none of which amounts
to anything major.
Taking my old benchmark of #[derive(Serialize)] on
struct S(f32, ... /* 1000 times */) the timings I get for the latest
version of serde_derive are:
|         | native | watt  | watt (cached) |
|---------|--------|-------|---------------|
| debug   | 156ms  | 280ms | 125ms         |
| release | 70ms   | 257ms | 100ms         |
Using instead #[derive(Serialize)] struct S(f32) the timings I get are:
|         | native | watt  | watt (cached) |
|---------|--------|-------|---------------|
| debug   | 1ms    | 241ms | 41ms          |
| release | 387us  | 205ms | 46ms          |
So for large inputs jit-compiled WebAssembly can be faster than the
native serde_derive when serde is itself compiled in debug mode. Note
that this is almost always the default nowadays since cargo build --release will currently build build-dependencies with no
optimizations. Only through explicit profile configuration can
serde_derive be built in optimized mode (as I did to collect the
above numbers).
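For reference, one way to get an optimized serde_derive is Cargo's `build-override` profile table, which applies to build scripts, proc macros, and their dependencies. This is only a sketch of that kind of profile configuration; the exact settings used to collect the numbers above may have differed.

```toml
# Cargo.toml of the crate or workspace being built (illustrative values).
# Optimize build scripts, proc macros, and their dependencies in release builds.
[profile.release.build-override]
opt-level = 3
```

The same table under `[profile.dev.build-override]` would apply to debug builds.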
The watt (cached) column is where I enabled Wasmtime's global
compilation cache to avoid recompiling the module every time the
proc-macro is loaded which is why the timings are much lower. The
difference between watt and watt (cached) is the compile time of the
module itself. The 40ms or so in watt (cached) is almost entirely
overhead of loading the module from cache which involves decompressing
the module from disk and additionally sloshing bytes around. More
efficient storage mediums exist for Wasmtime modules which means that it
would actually be pretty easy to shave off a good chunk of time from
that. Additionally, Wasmtime has its own custom C API, which differs
significantly from the one used in this repository and would also be
significantly faster for calling into the host from wasm. The current
~3ms of runtime in wasm itself could probably be reduced further with
more optimized calls.
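To make the "difference is the compile time" point concrete, here is a small sketch that subtracts the cached timing from the uncached one for each row of the tables above. The `Row` struct and the function names are hypothetical helpers for illustration; only the millisecond figures come from the measurements in this PR.

```rust
/// One measurement from the tables above, in milliseconds.
/// (Hypothetical helper type, not part of watt.)
struct Row {
    label: &'static str,
    watt_ms: u32,
    watt_cached_ms: u32,
}

/// Estimated time spent compiling the module: uncached minus cached.
fn compile_time_ms(row: &Row) -> u32 {
    row.watt_ms.saturating_sub(row.watt_cached_ms)
}

/// The four watt measurements reported in this PR.
fn rows() -> Vec<Row> {
    vec![
        Row { label: "1000 fields, debug", watt_ms: 280, watt_cached_ms: 125 },
        Row { label: "1000 fields, release", watt_ms: 257, watt_cached_ms: 100 },
        Row { label: "1 field, debug", watt_ms: 241, watt_cached_ms: 41 },
        Row { label: "1 field, release", watt_ms: 205, watt_cached_ms: 46 },
    ]
}

fn main() {
    for row in rows() {
        println!("{}: ~{}ms compiling the module", row.label, compile_time_ms(&row));
    }
}
```

By this estimate, compiling the module costs roughly 155ms to 200ms per load when the cache is disabled.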
Overall this seems like pretty good progress made on Wasmtime in the
interim since all my initial work in #2. In any case I wanted to post
this to get the WATT_JIT feature at least working again since
otherwise it's segfaulting right now, and perhaps in the future if
necessary more perf work can be done!