WebAssembly / interface-types

Performance concerns about UTF-8 strings #38

Open Pauan opened 5 years ago

Pauan commented 5 years ago

I've written some Rust web apps which are compiled to WebAssembly and run in the browser. I'm using wasm-bindgen for this.

Generally the code runs really fast (since both Rust and WebAssembly are quite fast), but there's one area in particular which is an order of magnitude slower: strings.

Rust uses UTF-8 strings, and so whenever I send a string to/from JS it has to encode/decode it to UTF-16. This is done using the browser's TextEncoder.encodeInto and TextDecoder.decode APIs.
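For context, the glue on the JS side boils down to something like the following (a simplified sketch, not wasm-bindgen's exact internals; the function name is illustrative): every Rust string crosses the boundary as a (pointer, length) pair into linear memory, and JS has to decode it into a brand-new UTF-16 string.

```ts
// Simplified sketch of the JS-side glue (illustrative, not wasm-bindgen's actual code).
const decoder = new TextDecoder("utf-8");

function getStringFromWasm(memory: WebAssembly.Memory, ptr: number, len: number): string {
  // A view over the UTF-8 bytes in linear memory costs nothing...
  const bytes = new Uint8Array(memory.buffer, ptr, len);
  // ...but decode() must allocate and fill a brand-new UTF-16 JS string on every call.
  return decoder.decode(bytes);
}
```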

This is as optimal as we can currently get in terms of speed, but it's not good enough, because encoding/decoding is way slower than everything else:

DevTools Screenshot

The total script execution time is 2413ms. Out of that, 494ms is from the browser's decoding, and a further 434ms from garbage collecting the JS strings. So that means just passing strings from Rust to JS is taking up ~40% of the total execution time!

This encoding/decoding is so slow that it means that a pure JavaScript app is faster than a Rust app (the JS app only takes 1236ms for all scripting + gc)!

These are not big strings, they are all small strings like "row", "col-sm-6", "div", etc. The biggest string is "glyphicon-remove".

Unfortunately, I cannot avoid this string passing, because I'm calling native web APIs (like document.createElement, Node.prototype.textContent, etc.), so the usual solution of "move everything into Rust" doesn't work.

This is clearly a known concern for WebIDL bindings, which is why the proposal includes the utf8-str and utf8-cstr types.

However, I'm concerned that WebIDL won't actually fix this performance problem, because the browsers (at least Firefox) internally use UTF-16, so they'll still have to do the encoding/decoding, so the performance will be the same.

I know this is an implementation concern and not a spec concern, but is there any practical plan for how the browsers can implement the utf8-str and utf8-cstr types in a fast zero-copy O(1) way without needing to change their engines to internally use UTF-8?

fgmccabe commented 5 years ago

Thanks for the thoughtful note. You should realize that so long as Rust is based on linear memory, it's unlikely that there will be a zero-cost mapping. Also, asking browsers to support UTF-8 seems like a big ask.

annevk commented 5 years ago

I know we are looking at using UTF-8 more internally and also want to avoid string conversions such as those you mention. I'm not sure how that's currently prioritized relative to other work.

cc @hsivonen @lukewagner

fgmccabe commented 5 years ago

I do have an additional thought wrt strings. If the strings the Rust app is sending to the API are 'enum' values, then the web/host/wasm-IDL bindings story will have a good answer for eliminating the string copy. If the strings are 'data' strings then, because linear memory cannot be trusted, a copy is almost inevitable. (There is a scenario where the copy might be avoidable, but it is pretty 'fragile'.)

Pauan commented 5 years ago

(P.S. It seems in the recent slides that utf8-str was renamed to utf8-mem-str, but of course all the same problems still apply)

kmiller68 commented 5 years ago

For what it's worth, in the WebKit world, many of the strings passed into WebIDL will end up being "Atomed" (canonicalized so comparison is just a pointer check). Thus, even if we had a utf8 string implementation, there would still be a copy problem.

As far as implementing utf8 support in WebKit, much like @annevk said, there's definitely interest in having a utf8 implementation. Unfortunately, my guess is implementing it would be a many year process. IIUC, implementing Latin1 strings was a two year process. Since that was ~8 years ago, I would imagine a utf8 project would not be easier.

Pauan commented 5 years ago

@kmiller68 Yeah, I don't expect a solution anytime soon. I just wanted to make sure that this problem is known about and is being worked on (even if slowly).

I was hoping there would be a clever way to avoid needing to re-implement support for UTF-8, but I guess not.

Pauan commented 5 years ago

I had the idea to try caching the strings, so that if the same string is used multiple times it only needs to be encoded once.

I implemented a simple cache in Rust, which only caches a small number of strings which are known to never change. The results are dramatic:

DevTools 005

Now the scripting time is 1680ms (733ms faster), and decoding + GC now only takes 429ms (499ms faster). That means decoding + GC now takes only 46% of the time it used to!

I think this technique could be used in the browser engine: keep an LRU cache mapping from UTF-8 strings to UTF-16 strings.

When converting from the utf8-mem-str type, it would look up the string in the cache, and if it's found it can avoid the encoding cost.

This does add a bit of performance unpredictability, and the caches will need to be carefully tuned for maximum performance, but it seems like a relatively easy way to get huge performance gains without needing to refactor the engine to use UTF-8.
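As a rough illustration of the glue-side version of this idea (a hypothetical sketch, not my actual Rust cache or wasm-bindgen's code), assuming the cached strings are immutable and never move in linear memory (e.g. Rust 'static literals), so that the (ptr, len) pair can serve as a stable key:

```ts
// Hypothetical glue-side string cache (illustrative only).
// Assumes the cached strings never change and never move in linear memory.
const cachedStrings = new Map<string, string>();
const decoder = new TextDecoder("utf-8");

function getCachedStringFromWasm(memory: WebAssembly.Memory, ptr: number, len: number): string {
  const key = ptr + ":" + len;
  let js = cachedStrings.get(key);
  if (js === undefined) {
    // Pay the UTF-8 -> UTF-16 decode (and allocation) cost only on first use.
    js = decoder.decode(new Uint8Array(memory.buffer, ptr, len));
    cachedStrings.set(key, js);
  }
  return js;
}
```

An engine-internal version would instead key on the UTF-8 bytes themselves and evict with an LRU policy, as described above.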

Horcrux7 commented 5 years ago

Why not convert these strings to UTF-16 at compile time, when creating the wasm file?

Pauan commented 5 years ago

@Horcrux7 When the Rust program wants to call a native API, it must first send the UTF-8 string from Rust to JS, and then convert that UTF-8 string into a JS string (which is UTF-16).

This encoding from UTF-8 to UTF-16 cannot happen in Rust, because the end result needs to be a JS string, not a Rust string.

Wasm cannot (currently) store or create JS strings, and so the JS string must be created on the JS side. So any sort of compile-time encoding would have to happen on the JS side, not the wasm side.
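The closest thing to "compile-time" conversion, then, is to build the JS strings once on the JS side at start-up and have the wasm module refer to them afterwards. A hypothetical sketch of what that could look like (the function names and the index-passing scheme are just an illustration, not an existing API):

```ts
// Hypothetical sketch: decode known string constants once at module start-up,
// then let the wasm side refer to them by index so no decode happens per call.
const constants: string[] = [];
const decoder = new TextDecoder("utf-8");

// Called once per constant during initialization; returns an index to the wasm side.
function registerConstant(memory: WebAssembly.Memory, ptr: number, len: number): number {
  constants.push(decoder.decode(new Uint8Array(memory.buffer, ptr, len)));
  return constants.length - 1;
}

// Hot-path import: the wasm module passes an index instead of UTF-8 bytes.
function addClass(element: Element, index: number): void {
  element.classList.add(constants[index]);
}
```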

lukewagner commented 5 years ago

It's also the case in Gecko that we'll have to copy strings, and it would take a lot of work to achieve a truly zero-copy optimization. However, this would still be superior to the current situation (without bindings) in that the copied memory could be eagerly released (no GC), allocated efficiently (e.g., from a stack), and there would be no separate wasm->(JS->)API call to perform the decode step. So there would still be a significant perf win even without the zero-copy impl.

FWIW, everything in the explainer and recent presentation slides should be considered just a strawman to help give a general sense of how things would work; the precise set of binding operators and their semantics isn't yet fleshed out.

hsivonen commented 4 years ago

@Pauan

These are not big strings, they are all small strings like "row", "col-sm-6", "div", etc. The biggest string is "glyphicon-remove".

Are they all ASCII? Do they flow equally in both directions (Wasm to JS and JS to Wasm)? Do I understand correctly that in the JS to Wasm case, these are literals in the JS source and not strings that you've concatenated from pieces at run time?

For small ASCII strings like this, the JS to Wasm path in wasm-bindgen currently bypasses TextEncoder.encodeInto. Even though TextEncoder.encodeInto was optimized for Firefox 71 to avoid copies and to use SIMD, the binding overhead appears to be so large that it continues to make sense for wasm-bindgen to bypass TextEncoder.encodeInto for ASCII, at least when the strings are short.
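Roughly, such a bypass amounts to a charCodeAt loop that writes ASCII code units straight into linear memory and bails out to TextEncoder.encodeInto when it hits anything non-ASCII. A simplified sketch (illustrative only, not wasm-bindgen's actual code):

```ts
// Simplified sketch of an ASCII fast path for the JS -> Wasm direction.
// Assumes the caller has allocated at least s.length bytes at ptr.
function tryPassAsciiToWasm(s: string, memory: WebAssembly.Memory, ptr: number): number | null {
  const out = new Uint8Array(memory.buffer, ptr, s.length);
  for (let i = 0; i < s.length; i++) {
    const code = s.charCodeAt(i);
    if (code > 0x7f) {
      return null; // not ASCII: caller falls back to TextEncoder.encodeInto
    }
    out[i] = code; // an ASCII code unit is already a valid UTF-8 byte
  }
  return s.length; // number of UTF-8 bytes written
}
```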

I think the main take-away from the TextEncoder.encodeInto optimization not outperforming wasm-bindgen's manual JS-only WebIDL bypass is that attributing the performance concern to UTF-8 may be incorrect; the issue is more likely the cost of the WebIDL layer for TextEncoder.encodeInto and TextDecoder.decode. Since this issue is filed on interface-types, the WebIDL cost of TextEncoder.encodeInto and TextDecoder.decode shouldn't be held against interface-types.

(In the Wasm to JS direction, TextDecoder.decode is used by wasm-bindgen without a bypass, so the binding overhead is there even for short ASCII.)

Pauan commented 4 years ago

@hsivonen Are they all ASCII?

Yes.

Do they flow equally in both directions (Wasm to JS and JS to Wasm)?

No, they only flow from Wasm to JS (which means UTF-8 -> UTF-16 conversion using TextDecoder.decode).

Therefore the ASCII optimizations and encodeInto do not factor into it.

Also, as shown in the screenshots I've posted, the primary performance bottleneck is from the browser's internal encoding function (and GCing the strings), not any of the wasm-bindgen glue (which is negligible).

So there are only two ways to fix this problem: 1) the browsers somehow dramatically improve their internal encoding algorithms, or 2) they avoid encoding entirely (which is what this issue is about).

At the time this issue was posted, interface types specified the encoding of the string, so this issue was filed in the correct place.

hsivonen commented 4 years ago

Also, as shown in the screenshots I've posted, the primary performance bottleneck is from the browser's internal encoding function (and GCing the strings), not any of the wasm-bindgen glue (which is negligible).

The screenshot appears to show Chrome. What's the performance like in Firefox Nightly?

So there's only two ways to fix this problem: 1) the browsers somehow dramatically improve their internal encoding algorithms, or 2) they avoid encoding entirely (which is what this issue is about).

Is a build from https://treeherder.mozilla.org/#/jobs?repo=try&revision=81247f866a936a332cba7d5b764537c36a7d8494 better or worse than Firefox Nightly?

(On the right, find the "shippable" row for your platform, click the green "+3" to expand it (no need to expand in the Mac case), click the green B to select the build task, at the bottom, click the "Job details" tab and find target.dmg for Mac, target.zip for Windows, or target.tar.bz2 for Linux--all x86_64. You can browse the changesets on the left to convince yourself that the difference between these builds and the Nightly baseline is legitimate.)

For the cases you describe (Wasm to JS, ASCII, longest string length 16), the Treeherder builds linked above should avoid actual encoding conversion and any buffer malloc in TextDecoder.decode. Your ASCII bytes should get copied as ASCII inline bytes into the SpiderMonkey string objects themselves (as opposed to getting inflated to UTF-16 or getting placed into a buffer that the string object points to). However, 1) there's still an inflating conversion to UTF-16 once you pass them to a DOM API (do you do anything with the strings obtained from TextDecoder.decode other than pass them directly to a DOM API?) and 2) you still pay the binding overhead for the call into TextDecoder.decode.


Your initial comment mentions Node.prototype.textContent but the example strings you give don't look like stuff one would put in text nodes. Are you actually using Node.prototype.textContent with ASCII strings the longest of which has the length 16? (The reason why I'm asking is that DOM text nodes in Gecko have potentially non-UTF-16 storage, but their interface doesn't yet let SpiderMonkey make use of the non-UTF-16 case.)

Pauan commented 4 years ago

The screenshot appears to show Chrome. What's the performance like in Firefox Nightly?

Yes the screenshots were in Chrome (though Firefox had similar issues at the time).

For consistency, here's the latest Chrome (Version 77.0.3865.120 (Official Build) (64-bit)) screenshot:

UTF8 Chrome Uncached

And Firefox Developer Edition (71.0b2 (64-bit)):

UTF8 Firefox Uncached

And Firefox Nightly (72.0a1 (2019-10-27) (64-bit)):

UTF8 Nightly Uncached

And the Treeherder build you linked to:

UTF8 Treeherder Uncached

The first thing that I see is that Firefox Developer Edition includes a lot more information: Nightly and Treeherder don't show performance of individual functions like TextDecoder.decode anymore. I really don't like that.

Secondly, Firefox (in general) has much better decode performance than Chrome. It's no longer the massive bottleneck that it was. That wasn't true at the time I filed this issue, so it seems TextDecoder.decode has improved a lot since then, good job!

Thirdly, Nightly and Treeherder don't show the performance of individual functions, however the GC is clearly better in Nightly.

there's still an inflating conversion to UTF-16 once you pass them to a DOM API (do you do anything other with the strings obtained from TextDecoder.decode than pass them directly to a DOM API?)

No, it doesn't do anything with the strings, it just calls TextDecoder.decode and then immediately passes it to the DOM APIs.

Your initial comment mentions Node.prototype.textContent but the example strings you give don't look like stuff one would put in text nodes.

Those specific example strings are added to the DOM's classList with foo.classList.add.

Are you actually using Node.prototype.textContent with ASCII strings the longest of which has the length 16?

Yes, they're ASCII, though in the case of textContent the biggest string is 28 characters long.

hsivonen commented 4 years ago

@Pauan Thank you.

The first thing that I see is that Firefox Developer Edition includes a lot more information: Nightly and Treeherder don't show performance of individual functions like TextDecoder.decode anymore. I really don't like that.

I don't know if this is a change in the profiler or if Nightly and Treeherder builds just don't get their symbols distributed in the same way as the release and Dev Edition channels. (I suspect the latter.)

Secondly, Firefox (in general) has much better decode performance than Chrome. It's no longer the massive bottleneck that it was. That wasn't true at the time I filed this issue, so it seems TextDecoder.decode has improved a lot since then, good job!

I don't recall TextDecoder.decode internals having changed since June. The WebIDL layer might have. However, it's good to hear that it's faster than Chrome. That's intentional. :-)

Thirdly, Nightly and Treeherder don't show the performance of individual functions, however the GC is clearly better in Nightly.

Compared to Nightly, the Treeherder builds remove one JavaScript string object cache, so the Treeherder builds are expected to generate more string garbage if you re-decode the same string without having decoded four different strings in between.

How tight is your measurement loop? That is, does your workload re-decode the same string without decoding four different strings in between?

No, it doesn't do anything with the strings, it just calls TextDecoder.decode and then immediately passes it to the DOM APIs.

Thanks. This kind of pattern doesn't avoid inflation to UTF-16 and just defers the inflation in the case of the Treeherder builds. However, for short strings (15 or shorter is one kind of short, 23 or shorter is another kind of short), the Treeherder builds avoid an extra inflate/deflate cycle and in the ASCII case avoid a malloc.

Yes, they're ASCII, though in the case of textContent the biggest string is 28 characters long.

Thanks. This could benefit from further optimization.

aminya commented 4 years ago

There is a similar issue with JSOX and JSOX-WASM. The wasm implementation that is written in C++ is slower than the JavaScript code because of the string conversion overhead. https://github.com/d3x0r/jsox-wasm/issues/1