Open jameysharp opened 1 year ago
Nice ideas! When I included #224 I had some (but definitely not all!) of the same thoughts but then I said to myself, "what are users likely to do?" I would assume they take the default features and the perf
feature is enabled by default? (Checking... yup, it is!). Another thought: ff we did expect WebAssembly users to use different features, where would we document this? (Or is there a Cargo way to set the default features differently per target?)
You're absolutely right that people are likely to use default features, and for a lot of people that will be fine. So I like your philosophy for the initial regex benchmark.
I'd just like to be able to offer well-justified advice for the folks who need to get every cycle they can out of state-machine style workloads, and regular expressions as finite automata are a particularly widespread and representative use case. So I'm hoping we end up with more benchmarks in this space. Ideally we'd have a good way to compare different ways that a wasm producer can choose to express the same state machine, but I don't actually know how we might do that with Sightglass.
"Where would we document that" is a good question too! We should probably have some place we can write down what control-flow patterns are expensive in Wasmtime under different configurations. (For example, when fuel or epoch interruptions are enabled, loop back-edges are more expensive; or something about br_table
.) Maybe in the Wasmtime book? I bet @cfallin might have Opinions.
I think there is a Cargo way to set per-target features, although many of the crate features I identified above are specific to what kinds of regexes you're evaluating rather than what target you're building for. Disabling Unicode and perf-literal
saves a lot on any target, if you don't actually need those features, so I feel it'd be reasonable to turn them off in this benchmark. RegexBuilder::unicode
changes what inputs the regex matches on, and I think these particular regexes are actually being interpreted incorrectly by evaluating them with Unicode character classes, so I think disabling that is reasonable in this benchmark too.
But I could imagine that perf-dfa
might have different performance characteristics on Wasmtime than on a native target. So that's a case where I'd like to be able to compare different wasm that computes the same result. And I suspect the output of something like libfsm or proc-macro-regex
will perform substantially differently than the regex
crate due to generating completely different control flow, so I'd really like to be able to compare those.
We have a regex benchmark merged in #224. But there are a lot of variations between regex implementations. Even the
regex
crate alone has several tunable parameters that I'd expect to have a significant impact on this benchmark.Some thoughts if anyone wants to dig into the behavior of this benchmark or add more regex benchmarks:
It looks to me like the regular expressions in #224 are all intended to match only ASCII text. If so,
RegexBuilder::new(pattern).unicode(false)
should produce much smaller automata for the email and URI patterns. (I don't expect any impact on the IP-address pattern.) That shouldn't affect how long we spend compiling the resulting wasm program, but I'd expect a runtime speedup.I'm guessing the reason this wasm is so large is mostly due to including all the Unicode character class tables. Disabling default features for the
regex
crate should make it quite a bit smaller. For a fair comparison we'd want to enable theperf
feature to get all the default performance optimizations back. Also the docs say thestd
feature is currently required in any case.I would also be interested to know what impact that
perf
feature has on both compile time and runtime for this benchmark. In particular,perf-dfa
should significantly change the control-flow patterns in the core matching loop, and that might hit better or worse codegen patterns in Cranelift.I don't think the
perf-literal
feature will actually get used on any of these benchmarks, since the longest literal string is "://", so disabling that should save some compile time without much impact on runtime.As a more extreme experiment it would be interesting to pre-compile the patterns and see if that hits different codegen paths. Using https://crates.io/crates/proc-macro-regex in its mode where the table size is restricted should produce wildly different input to Cranelift, because that's the only case where the state machine is expressed in control flow rather than table lookups.
An alternative way to pre-compile to control flow is to use libfsm to generate C, and compile that into a small benchmark driver.