dtolnay / watt

Runtime for executing procedural macros as WebAssembly
Apache License 2.0

Move token_stream_parse implementation to Wasm #18

Closed: dtolnay closed this issue 4 years ago

dtolnay commented 4 years ago

As noted in https://github.com/dtolnay/watt/issues/2#issuecomment-548018670,

watt::sym::token_stream_parse exists because I was too lazy to implement a Rust syntax tokenizer in wasm (i.e., copy it from proc-macro2), but we could likely optimize that slightly by running the tokenizer in wasm as well.

We'll need to import something like https://github.com/alexcrichton/proc-macro2/blob/1.0.6/src/strnom.rs and use that to implement token stream parsing rather than calling out to a host func.

It's possible we can simplify the implementation from proc-macro2 by omitting the parts that deal with parsing comments, as those won't appear in our use case.
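To make the shape of that concrete, here is a minimal, hedged sketch (not watt's code; the `Coarse` type and `coarse_lex` function are made up for illustration) of the cursor-walking structure a strnom-style tokenizer compiled into the wasm module could take. It deliberately omits string/char literals, float syntax, spans, and nested group construction, and it skips comment handling entirely, per the simplification above.

```rust
#[derive(Debug, PartialEq)]
enum Coarse<'a> {
    Ident(&'a str),
    Literal(&'a str),
    Punct(char),
    OpenDelim(char),
    CloseDelim(char),
}

fn coarse_lex(src: &str) -> Result<Vec<Coarse<'_>>, String> {
    let mut out = Vec::new();
    let mut rest = src;
    while let Some(c) = rest.chars().next() {
        if c.is_whitespace() {
            // Whitespace never becomes a token. A full Rust lexer would also
            // skip comments here, but as noted above they do not appear in a
            // macro's input token stream.
            rest = &rest[c.len_utf8()..];
        } else if c.is_alphabetic() || c == '_' {
            // Identifier or keyword: a run of alphanumerics/underscores.
            let end = rest
                .find(|ch: char| !(ch.is_alphanumeric() || ch == '_'))
                .unwrap_or(rest.len());
            out.push(Coarse::Ident(&rest[..end]));
            rest = &rest[end..];
        } else if c.is_ascii_digit() {
            // Crude numeric literal: a run of alphanumerics/underscores
            // (covers suffixed forms like 42u8, but not floats or exponents).
            let end = rest
                .find(|ch: char| !(ch.is_alphanumeric() || ch == '_'))
                .unwrap_or(rest.len());
            out.push(Coarse::Literal(&rest[..end]));
            rest = &rest[end..];
        } else if "([{".contains(c) {
            out.push(Coarse::OpenDelim(c));
            rest = &rest[1..];
        } else if ")]}".contains(c) {
            out.push(Coarse::CloseDelim(c));
            rest = &rest[1..];
        } else if c.is_ascii_punctuation() {
            out.push(Coarse::Punct(c));
            rest = &rest[1..];
        } else {
            return Err(format!("unexpected character {:?}", c));
        }
    }
    Ok(out)
}
```

Wiring something along these lines into the wasm-side TokenStream's FromStr impl would remove the call out to the host for parsing, which is the change this issue proposes.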

alexcrichton commented 4 years ago

One thing I should note on this is that I was surprised that #[derive(Serialize)] called this function at all, but then I remembered it's part of quote!, which also suggests this path may be relatively hot if quote! is used heavily. In that sense I think we could perhaps speed things up, but only by a mild amount; as the linked post shows, it's only 1-2ms even for a massive macro.
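For context, the operation in question is TokenStream's FromStr impl: quote!'s generated code (at least in the quote versions of that era) parses fragments of the quoted source from strings at runtime, and in a watt macro that parse is what currently crosses into the host as watt::sym::token_stream_parse. A tiny illustration using proc-macro2's public API (the `parse_fragment` function is hypothetical):

```rust
use std::str::FromStr;

// The parse step under discussion: turn source text into a TokenStream.
// Inside a wasm-compiled macro, watt currently forwards this to the host.
fn parse_fragment() -> proc_macro2::TokenStream {
    proc_macro2::TokenStream::from_str("impl Serialize for MyType {}")
        .expect("fragment should tokenize")
}
```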

mystor commented 4 years ago

Does that include time spent running watt::sym::token_stream_serialize to get the parsed data into wasm? Running parsing in wasm should eliminate a bunch of calls to these other methods, which might increase the impact a bit.

In addition, given the higher minimum rustc version for watt, it may be reasonable to use rustc_lexer as the lexer within wasm, probably based on the work in https://github.com/alexcrichton/proc-macro2/pull/202. I'm guessing it's been optimized a bit more than strnom.
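For reference, a rough sketch of what driving rustc_lexer inside the wasm module could look like, assuming the rustc_lexer API as vendored in that proc-macro2 PR (tokenize yielding Token values carrying a kind and a byte length); `coarse_tokens` is a hypothetical name, and a real implementation would still need a second pass to build nested Group/Punct/Literal token trees and spans:

```rust
use rustc_lexer::{tokenize, TokenKind};

fn coarse_tokens(src: &str) -> Vec<(TokenKind, &str)> {
    let mut out = Vec::new();
    let mut pos = 0;
    for token in tokenize(src) {
        let len = token.len as usize; // token lengths are in bytes
        let text = &src[pos..pos + len];
        pos += len;
        match token.kind {
            // Whitespace never becomes part of a token stream; a fuller pass
            // would filter comments here as well.
            TokenKind::Whitespace => {}
            _ => out.push((token.kind, text)),
        }
    }
    out
}
```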

alexcrichton commented 4 years ago

Oh right, that's an excellent point! Yes, the timing of watt::sym::token_stream_parse doesn't include the time spent in watt::sym::token_stream_serialize, nor the time to deserialize inside of wasm itself. Those costs would all largely be avoided if we moved this into wasm. While the gain is still likely to be modest, I think it'd be a clear win speed-wise.

mystor commented 4 years ago

@dtolnay benchmarked an implementation of this in https://github.com/dtolnay/watt/pull/26#pullrequestreview-310856210 and found it had equivalent (or worse) performance, even on large inputs. I'm inclined to think we should leave token_stream_parse implemented by the host runtime.