Sure, that could work. I find it hard to estimate whether it would be more efficient than using globals directly, but it would certainly have a higher limit on the number of strings it can support.
Speaking of which: the relevant limit for the status quo is 100K imports, not 1M globals.
I think requiring a bunch of JS code to run before the Wasm module can be instantiated is Not Great. That's the bullet that all imports-based approaches to string contents have to bite.
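To make that concrete, here is roughly what an imports-based setup forces onto the critical path before instantiation (a sketch; `wasmBytes` and the `"str"` import module name are hypothetical):

```js
// Every string must be materialized in JS up front and wired into the
// imports object, one immutable externref global per string.
const strings = ["Hello", "world" /* ...tens of thousands more... */];
const imports = { str: {} };
strings.forEach((s, i) => { imports.str[i] = s; });
// The module declares one global import per string:
//   (import "str" "0" (global externref)) ...
const { instance } = await WebAssembly.instantiate(wasmBytes, imports);
```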
I think making `table.get` (for immutable imported tables) a constant instruction is probably a hard requirement for making this approach viable. If a module doesn't need strings in constant initializers, it can simply use `array.new_data` + `String.fromWtf8Array` (which engines can shortcut to an internal "string from data segment" operation, and which toolchains can support easily because everything is in the Wasm module).
Speaking of which: the relevant limit for the status quo is 100K imports, not 1M globals.
Ah yes, I felt like it was lower than 1M but couldn't find a reason why.
I think making `table.get` (for immutable imported tables) a constant instruction is probably a hard requirement for making this approach viable.
From some discussion with @rossberg, I realized this would require us to have immutable (and probably also non-growable) tables, which would be substantial additions. I'm not sure if they're worth adding to the wasm language.
A different option to avoid any significant core wasm change would be to:

- import a single immutable array of strings: `(global $strtab (ref (array (ref extern))))`
- allow `array.get` for immutable arrays as a constant expression
- add `WebAssembly.Array.fromIterable(heapType, iterable)`, which could be used to take a JS array of strings and turn it into an immutable wasm array (a usage sketch follows at the end of this comment)

If a module doesn't need strings in constant initializers, it can simply use `array.new_data` + `String.fromWtf8Array` (which engines can shortcut to an internal "string from data segment" operation, and which toolchains can support easily because everything is in the Wasm module).
I'd really prefer to avoid adding an optimization like this to our baseline compiler (and I assume an interpreter, if we had one). We'd need to add some way of tracking a deferred `array.new_data` computation in our abstract evaluation stack, and that leaks pretty far into the compiler.
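A hypothetical sketch of how the proposed `WebAssembly.Array.fromIterable` might be used (no such API exists today; `arrayType` stands for some description of `(array (ref extern))`, and the import names are illustrative):

```js
// Build an immutable wasm array of strings in one JS-API call, then hand
// it to the module as a single import instead of one global per string.
const strtab = WebAssembly.Array.fromIterable(arrayType, ["a", "b", "c"]);
const { instance } = await WebAssembly.instantiate(wasmBytes, {
  env: { strtab }, // satisfies (global $strtab (ref (array (ref extern))))
});
```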
I'd really prefer to avoid adding an optimization like this to our baseline compiler
Sure, we don't have it in our baseline compiler either. It's a functionally unobservable and hence strictly optional internal optimization, appropriate mostly for optimizing compilers. The biggest issue with it is that recent toolchain evolution suggests that most or even all modules will need most or even all of their strings to be available in constant initializers, so there doesn't seem to be much problem left for this solution to solve.
I'd really prefer to avoid adding an optimization like this to our baseline compiler
Sure, we don't have it in our baseline compiler either. It's a functionally unobservable and hence strictly optional internal optimization, appropriate mostly for optimizing compilers. The biggest issue with it is that recent toolchain evolution suggests that most or even all modules will need most or even all of their strings to be available in constant initializers, so there doesn't seem to be much problem left for this solution to solve.
Just so I understand correctly, are you saying that your recent toolchain experience indicates we'll need something more than globals? If so, should I push on the immutable array idea or do you have another idea?
I'm saying that toolchains want to be able to use strings in globals. It's less about the character sequences themselves and more about things that transitively depend on them, such as `java.lang.String` wrapper objects, and ultimately class metadata and whatnot that has properties with (wrapped) string values.
Having a constant `string.const` instruction makes that easy and efficient. Globals that refer to strings can be initialized with initializers, so they can be immutable and non-nullable, which is great for optimizing functions that use them. When J2Wasm switched to this scheme (from their previous approach of initializing globals lazily, which sprinkled `(if (global.get $g == null) (global.set $g (...)))` checks all over the module), they saw massive improvements to module size and performance (double-digit percentage, IIRC, but I don't have exact numbers on hand).
Since the actual act of executing constant global initialization is unobservable, this approach also gives engines freedom to potentially find ways to hide some of the work (on another thread, or in an idle time slot, or by being (partially?) lazy). FWIW, we aren't doing any of that in V8 yet, but I think having the option was nice.
In the absence of a `string.const` instruction, one way to get string constants is to use data segments and `array.new_data` and the string-from-array functions. Since the latter are imported, they aren't "constant", and spec'ing constant imported functions is probably infeasible, so global initializers are out with this approach. A possible workaround is to initialize all globals via the `start` function (or, equivalently, some other function that's called before anything else happens). The entire transitive hull of globals that indirectly refer to strings (which probably means: approximately all globals) then needs to be mutable and nullable though. (It's possible that this mutability/nullability could be limited to certain properties; it's unclear to me how feasible that is and to what extent it would even help.)
Replacing an eager `start` function with user-space lazy initialization is of course technically possible, but practically unattractive due to its code size and performance cost (see above).
The other way is to import the strings themselves as immutable globals. That maintains the ability to use global initializers for string-using globals and have them be immutable+nonnullable. The drawbacks are that (1) all strings must be eagerly created in JS before the Wasm module can be instantiated (so, again, no laziness; and to make it even worse, that eager code on the critical path is one-shot JavaScript), (2) the number of imports to process increases from hundreds to tens of thousands (and we may even have to raise the limit on allowed imports when sufficiently large apps start running into it), (3) toolchain support to create/optimize/manipulate these JS+Wasm bundles is yet to be implemented (and affects everything: e.g. merging modules will also have to merge their JS-provided string constants; dead-code elimination has to rewrite both files; etc).
I suspect that we will have to prototype both of these approaches in toolchains in order to determine which one is more viable (or even: whether either of them is viable at all).
I don't see how tables or immutable arrays would help much, as they don't change the fundamental equation; the only clear benefit they bring is reducing the number of individual imports (addressing drawback (2) above); but it sounds to me like the price they pay for that is that they require even more eager work to be done up front (the respective `fromIterable` operation). If your intuition says otherwise, I encourage you to prove it with an A/B benchmarkable pair of prototypes :-)
Having a constant string.const instruction makes that easy and efficient. Globals that refer to strings can be initialized with initializers, so they can be immutable and non-nullable, which is great for optimizing functions that use them
Adding `stringref`, `string.const` and a string-constant-section to core wasm to improve string creation performance is definitely a possibility. However, it's a big addition that impacts systems like embedded that may not even have a host string type. It would be very convenient to the advancement of this proposal to find a way to avoid this. That's why I'm trying to find other smaller tweaks we could pursue. If nothing works here, then we could consider something like this.
In the absence of a string.const instruction, one way to get string constants is to use data segments and array.new_data and the string-from-array functions. Since the latter are imported, they aren't "constant", and spec'ing constant imported functions is probably infeasible, so global initializers are out with this approach. A possible workaround is to initialize all globals via the start function (or, equivalently, some other function that's called before anything else happens). The entire transitive hull of globals that indirectly refer to strings (which probably means: approximately all globals) then needs to be mutable and nullable though. (It's possible that this mutability/nullability could be limited to certain properties; it's unclear to me how feasible that is and to what extent it would even help.)
So if I read this correctly, this option is to make all (or most) globals mutable, initialize them at `start`, and use the 'string from an array from a data segment' initialization path. This definitely has the advantage of reducing the global import cost, but I'm unsure about the cost of the string creation. Like I said above, we really don't want to add that macro optimization to our baseline compiler, and the start function is one-shot Wasm and a great candidate to compile lazily and without the optimizer. The extra copy that would then happen seems unfortunate.
The other way is to import the strings themselves as immutable globals. That maintains the ability to use global initializers for string-using globals and have them be immutable+nonnullable. The drawbacks are that (1) all strings must be eagerly created in JS before the Wasm module can be instantiated (so, again, no laziness; and to make it even worse, that eager code on the critical path is one-shot JavaScript), (2) the number of imports to process increases from hundreds to tens of thousands (and we may even have to raise the limit on allowed imports when sufficiently large apps start running into it), (3) toolchain support to create/optimize/manipulate these JS+Wasm bundles is yet to be implemented (and affects everything: e.g. merging modules will also have to merge their JS-provided string constants; dead-code elimination has to rewrite both files; etc).
Agreed, importing each string as an immutable global seems like the per-global import cost would be prohibitive. That's why I was trying to think through an 'Option 3' where a module only imports a single immutable array which contains all the strings in the module, then the module's other global initializers could refer to that string array using a constant array.get operation.
Regarding 'laziness' of creating strings, what are you imagining? At least in SM, for JS we create the actual string objects when parsing the source (as far as I can tell). Does V8 do something differently?
it's a big addition
It's a tiny addition compared to GC. I don't expect any system that doesn't implement GC to implement GC'ed strings.
The extra copy that would then happen seems unfortunate.
Agreed. All options aside from a `string.const` instruction incur unfortunate extra work.
we create the actual string objects when parsing the source (as far as I can tell). Does V8 do something differently?
V8 does that too: for string constants in JS source, it eagerly creates interned strings. So there's certainly a chance that we may, at least for now, get away with Wasm being like JS in this regard. I generally think it's desirable to design Wasm such that it can perform better than JS; otherwise what's the point of compiling to Wasm instead of compiling to JS?
it's a big addition
It's a tiny addition compared to GC. I don't expect any system that doesn't implement GC to implement GC'ed strings.
Compared to GC, sure; compared to this proposal, it's a big addition to cross into core wasm with a new valtype/section, and it raises the bar of general usefulness quite a bit. I would like to see this proposal move as fast as possible and avoid getting stuck in big core wasm discussions. Again, not ruling it out, but it seems expedient to look at other extensions first.
we create the actual string objects when parsing the source (as far as I can tell). Does V8 do something differently?
V8 does that too: for string constants in JS source, it eagerly creates interned strings. So there's certainly a chance that we may, at least for now, get away with Wasm being like JS in this regard. I generally think it's desirable to design Wasm such that it can perform better than JS; otherwise what's the point of compiling to Wasm instead of compiling to JS?
My goal with this proposal is for wasm to have parity (or very very close) with JS when it comes to JS strings. It'd be nice to have it perform better on JS strings, but that's not a hard requirement.
I finally got around to prototyping and benchmarking this (source here).
I tested out four strategies:

- import_global: import each string as an immutable global
- data_segment: use `array.new_data` and the JS string builtin to create the string and store it in a private global
- array_from_iterable: import a single immutable array containing all the strings
- string_section: store the strings in a custom section and materialize them at instantiation

All of these for a list of 100,000 random (generally short) strings.
Testing on my machines I got the following:
import_global = 45.41 ms (variance = 448.47)
data_segment = 232.70 ms (variance = 9.95)
array_from_iterable = 13.76 ms (variance = 0.02)
string_section = 13.28 ms (variance = 0.07)
These numbers are for the full script to parse and execute and include synchronous wasm compilation and instantiation (using our baseline compiler).
For uncompressed file sizes, I get:
import_global = 2.1 MiB
data_segment = 5.5 MiB
array_from_iterable = 0.8 MiB
string_section = 0.6 MiB
The data segments one is an outlier in runtime, and that's mostly due to the compile time for the big 'init' function, which does an `array.new_data -> stringFromCharCodeArray -> global.set` sequence for each string. That could be mitigated with asynchronous compilation and caching, but it took a long time even with our baseline compiler. The file size is also pretty bad. I could possibly optimize it further by storing the final strings in a private table instead of globals.
When it comes to the string section, it doesn't seem to buy anything for runtime. I took the strategy of allocating the strings and storing them on the instance upon instantiation. It does seem like the string section is more compact, but I haven't measured compressed sizes so I don't know.
As of right now, it seems like array_from_iterable is the sweet spot of minimal spec complexity and performance.
Thanks for the data @eqrion !
I think we may want to add something to the code size and startup time for "array from iterable" to make the comparison fully apples-to-apples. Atm that binary is

```wat
(module
  (type $strings (array externref))
  (import "" "strings" (global $gimport$0 (ref $strings)))
)
```
which means we have an array of strings. To access an individual string we'd need `global.get + array.get + i32.const`. In comparison, "imported globals" has

```wat
(import "str" "0" (global $gimport$0 externref))
(import "str" "1" (global $gimport$1 externref))
(import "str" "2" (global $gimport$2 externref))
...
which means the strings are already separated, and accessing an individual string costs only a `global.get`. That is, some of the extra code size and startup time in "imported globals" is work that "array from iterable" will need to do as well (in order to actually use individual strings), but does not seem to be accounted for here. I'd guess at least the size of `array.get + i32.const` times the number of strings, but perhaps a little more, as if a string is used more than once we'd likely want to cache it in a global.
Apologies if I have misunderstood something in the code or explanation, which is definitely possible!
Thanks for the data @eqrion !
I think we may want to add something to the code size and startup time for "array from iterable" to make the comparison fully apples-to-apples. Atm that binary is
Yeah, this is where comparing things gets hard. I was expecting that a toolchain using the 'array from iterable' approach would skip creating a global per string and do the array access scheme you mentioned. It's at the cost of a bounds check and indirection per use of the constant, but it avoids the cost of creating/initializing the globals.
That is, some of the extra code size and startup time in "imported globals" is work that "array from iterable" will need to do as well (in order to actually use individual strings), but does not seem to be accounted for here
The extra code size for 'imported globals' (compared to 'array from iterable') is from the import statements (otherwise they're pretty similar), and the extra startup time comes from imports not scaling well (requires a full JS property lookup per import). I don't think those are things that 'array from iterable' will necessarily need to do (although I may be wrong, and accessing from a global instead of the array may be required).
I guess I could add a separately timed stage to the benchmarks that accesses each string constant once to capture the cost that each scheme implies? e.g. global.get vs array.get vs string.const. Would that be what you're interested in?
Yeah, this is where comparing things gets hard. I was expecting that a toolchain using the 'array from iterable' approach would skip creating a global per string and do the array access scheme you mentioned. It's at the cost of a bounds check and indirection per use of the constant, but it avoids the cost of creating/initializing the globals.
Definitely, yeah, there are multiple possible tradeoffs here, and comparisons get hard.
Though my concern about "array from iterable" isn't just runtime speed but also size - each use of a string gets larger. Enough appearances of a particular string constant and the binary could be bigger, maybe even big enough to offset any startup benefit to not creating those globals. But fwiw I did a measurement on a large Java testcase I have and most strings appear only once, so at least there "array from iterable" would be smaller.
I guess I could add a separately timed stage to the benchmarks that accesses each string constant once to capture the cost that each scheme implies? e.g. global.get vs array.get vs string.const. Would that be what you're interested in?
Yes, I think that's worth doing. Though maybe be careful not to put it all in a single big function - my worry is that this is one of those costs that gets "spread" around an entire codebase, while in a single big function the VM might do things like store the array in a local after the first load, which would hide the true `global.get` cost on real-world code.
Also an option is to involve Java/Dart/Kotlin people in this discussion to get a more realistic picture of how they use strings atm. But just to try to present the point of view I've seen from the Java side in my work with them, we've worked very hard together (J2Wasm+Binaryen) to avoid runtime overheads that get spread around the codebase, by focusing on tweaking things like how itable accesses are done and trying to avoid every possible load there. So even an extra `array.get` per string access sounds risky to me. As background, in the Sheets use case mentioned in the V8 WasmGC blogpost, startup definitely matters, but I believe performance afterwards matters a lot more, so one global per string is likely the way to go there at least.
Btw, another option might be "struct from iterable", just replacing the array with a struct. We do know the number of fields statically per compilation, so that is possible, and while it would add some type section size, it would then avoid an `i32.const` byte by using an immediate, and also there would be no bounds check.
Btw, another option might be "struct from iterable", just replacing the array with a struct. We do know the number of fields statically per compilation, so that is possible, and while it would add some type section size, it would then avoid an `i32.const` byte by using an immediate, and also there would be no bounds check.
That would be a nice solution, but I think we have an implementation limit of 10k struct fields which from the above discussion seems too small for string constants. Part of the motivation for using arrays instead of globals was the 100k import limit.
I'll try to get some data on the cost of accessing each string constant once for all of the different approaches.
Btw, another option might be "struct from iterable", just replacing the array with a struct. We do know the number of fields statically per compilation, so that is possible, and while it would add some type section size, it would then avoid an `i32.const` byte by using an immediate, and also there would be no bounds check.

That would be a nice solution, but I think we have an implementation limit of 10k struct fields which from the above discussion seems too small for string constants. Part of the motivation for using arrays instead of globals was the 100k import limit.
I suppose if toolchains were open to splitting their string data into multiple structs as needed, it would scale well. It would be some more complexity for toolchains though. We also probably could increase the struct field size limit to 100k with a concrete use-case.
Here's some quick data on adding a single use for every string constant to compare overheads. I did put all of the uses into a single function. However, I'm only testing with our baseline compiler, so there will not be any optimizations that re-use array pointers, bounds checks, dead code elimination, etc. It should be basically equivalent to spreading the uses out across different functions.
import_globals: 60.2ms -> 77.9ms
data_segments: 225ms -> 223.4ms (within noise during 200 iterations)
import_array: 13.7ms -> 33.9ms
The before/after numbers show the effect of compiling and running a function that loads every string for that strategy.
I don't have more data for the string section approach, as I haven't had time to implement a `string.const` instruction as well. I would guess it would be roughly the same cost as a single `global.get`, as it would be implemented almost identically. So somewhere between the import_globals and data_segments seems like a likely cost.
From looking at some profiles, I do believe we could speed up the imported globals approach. Most of the extra time there is spent in parsing the imports in the binary, then doing slow generic JS property lookups on the imports object. I think we could add a special case for using numeric indices on a dense JS array that bypasses a lot of that generic code.
However, that approach is still limited to 100_000 imports as said above.
I'm warming up to the idea of a `structFromIterable` operation because that seems like it could be faster than the imported array approach, and also be easier to use in constant expressions (`array.get` has bounds checks that can trap).
@kripken For the J2CL use-case, how many strings are you seeing right now, and what would be a reasonable maximum limit? Would supporting up to 100k strings in a struct work for you? If you needed more, would splitting into several structs be a large burden?
@bashor Would you have a concern about having a 100k string constant limit?
@eqrion I see 73K strings in a large Java testcase. But I can't say if that is a reasonable limit, other codebases may well need more...
import_globals: 60.2ms -> 77.9ms
data_segments: 225ms -> 223.4ms (within noise during 200 iterations)
import_array: 13.7ms -> 33.9ms
The first and last lines seem to say that the imported array method has less startup overhead but a little more runtime overhead. The middle one though confuses me: how can `data_segments` have ~0 (within noise) runtime overhead for string accesses? Am I misreading that line?
The first and last lines seem to say that the imported array method has less startup overhead but a little more runtime overhead. The middle one though confuses me: how can `data_segments` have ~0 (within noise) runtime overhead for string accesses? Am I misreading that line?
I was confused by that as well. I spent a good amount of time verifying that I didn't mess that one up originally and couldn't find anything. I'll try to swing back and look at that again.
@bashor Would you have a concern about having a 100k string constant limit?
We'll check it on some projects and come back to you.
@eqrion so far, the biggest number of string literals we've seen in real applications is ~14K.
It's hard to predict if 100K is enough, and for how long. I think the current number could be easily doubled within a year.
I'm wondering how strict these limits are, and if they could be increased in the future.
I'm wondering how strict these limits are, and if they could be increased in the future.
It depends on which approach is taken:

1. With imported globals, the relevant limit is the number of imports (currently 100,000).
2. With a struct imported from JS, the relevant limit is the maximum number of struct fields (currently 10,000).

With (2) I think it would be reasonable for implementations to increase their max struct size to 100,000 fields to support this use case. Going above that may get tricky, but 200,000 may be possible. I don't think engines want to dramatically increase the limit on number of imports.

Even if we had a limit of 100,000 fields for structs, it would still be possible for toolchains to split their string constants into multiple structs, and import multiple of them. It just is less convenient.
Yes, I think toolchains can fairly easily emit multiple structs here to stay under the 10,000 field limit. (This does require a whole-program decision, but even emitting a single struct requires that anyhow, so I don't see it as adding much work.)
For Google Sheets, when we switched to the string-builtins spec, our production binaries got 442KB larger for a total of around 14k string constants - that's an extra ~5%. I'm estimating around 80-90KB of the regression is due to less efficient encoding of the constants as JSON. Every single constant also adds around an extra 25 bytes on average to declare the imports in the wasm module.
re. offloading string constants to be part of the JavaScript load: IMO it should be part of the design goal here to make Wasm binaries self-contained. It is not good to add an extra large chunk of code to be downloaded/processed in the initial JS load; it is also not good to try to late-load those JS constants and add an extra roundtrip around Wasm initialization.
re. number of constants: For Google Sheets, I have seen tests with 112K string constants.
@gkdn, can you share how the compressed size changes as well?
When compressed, regression is 38kb / ~3.5%
Thanks for trying this out! Just so I understand, you're defining all of your strings in JSON in JS, then importing them as globals in your wasm module, right?
For Google Sheets, when we switched to the string-builtins spec, our production binaries got 442KB larger for a total of around 14k string constants - that's an extra ~5%. I'm estimating around 80-90KB of the regression is due to less efficient encoding of the constants as JSON. Every single constant also adds around an extra 25 bytes on average to declare the imports in the wasm module.
90KB / 14k = 6.6 bytes of overhead per string for using JSON. I would've expected it to be closer to 3 bytes of overhead per string for the two quote chars and comma char. Where are the other 3.6 bytes coming from? Escaping control characters?
re. offload string constants to be part of JavaScript load: IMO it should be part of the design goal here to make Wasm binaries self contained.
Embedding string constants into a wasm binary (using a string section) is difficult to do without creating a definition of strings for core wasm that affects other non-web platforms. And avoiding impact on non-web platforms is a key design goal here.
I think there might be a way to do it if we defined a new kind of 'web section' in the wasm binary that only affects the web platform. It would wrap the core inner module and modify the imports/exports, acting as a higher layer above the core wasm spec. A similar idea to this was explored for pre-component-model interface types. This would be a lot of work, but possible.
However, I did do some benchmarking in an earlier comment of alternative approaches (https://github.com/WebAssembly/js-string-builtins/issues/13#issuecomment-1972134581) that avoid the standards issue. From that investigation, it seemed like the `structFromIterable`/`arrayFromIterable` approaches could be just as fast as a string constant section (with possibly a slight size regression). The basic idea is to add a JS-API function which takes some JS iterable and converts it to a wasm struct or array that you can import and then access the string constants off of that.
It is not good to add an extra large chunk of code to be downloaded/processed in initial JS load; it is also not good to try to late load those JS constants and add an extra roundtrip around Wasm initialization.
I understand this concern, but I wasn't seeing this when benchmarking. And I was trying to include the total time of script parsing/loading to capture the concern that it would delay startup to have string constants on the hot path. It seems like JS engines already have highly optimized string constant and JSON parsing code that can be just as efficient as a string section.
But this is just from a microbenchmark, if you have data that indicates otherwise, that would be really useful to know.
90KB / 14k = 6.6 bytes of overhead per string for using JSON. I would've expected it to be closer to 3 bytes of overhead per string for the two quote chars and comma char. Where are the other 3.6 bytes coming from? Escaping control characters?
I can't speak to the exact numbers, but right now the JSON encoding escapes everything outside of ASCII range, so there is room for improvement: https://github.com/WebAssembly/binaryen/blob/35df5043764b41d7c23d15db5518cfe8becb3564/src/support/string.cpp#L357-L386
Thanks for trying this out! Just so I understand, you're defining all of your strings in JSON in JS, then importing them as globals in your wasm module, right?
Strings are defined in JSON and encoded into a custom section to self-contain the Wasm binary. Then at runtime the section is text-decoded + JSON.parsed and imported as globals.
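A sketch of that pipeline (the section name `strings` and the import shape are hypothetical, but `customSections` and `TextDecoder` are existing APIs):

```js
// Pull the JSON payload out of a custom section, decode it, and feed the
// strings back in as one immutable externref global import per string.
const module = await WebAssembly.compileStreaming(fetch('app.wasm'));
const [payload] = WebAssembly.Module.customSections(module, 'strings');
const strings = JSON.parse(new TextDecoder().decode(payload));
const imports = { str: Object.fromEntries(strings.map((s, i) => [String(i), s])) };
const instance = await WebAssembly.instantiate(module, imports);
```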
90KB / 14k = 6.6 bytes of overhead per string for using JSON. I would've expect it be closer to 3 bytes overhead per string for the two quote chars and comma char. Where are the other 3.6 bytes coming from? Escaping control characters?
90KB was a quick upper-bound estimate that I did earlier based on the string length. Let me expand with more accurate numbers:

The size of the custom section is ~280KB. The encoded size of the whole payload as a single joined string is ~200KB. That's an 80KB difference, but there still need to be some extra bytes to encode the start/end of the different strings. If that takes around an extra 20KB, the encoding overhead becomes around ~60KB.

For the amount due to escaping: JSON quote/comma is around 50KB vs. 20KB in the assumed non-JSON scenario, so probably around 30KB is due to escaping.
Embedding string constants into a wasm binary (using a string section) is difficult to do without creating a definition of strings for core wasm that affects other non-web platforms. And avoiding impact on non-web platforms is a key design goal here.
To clarify what I mean with "self-contained": the custom section approach we use today fulfills this aspect but has other problems around code size / performance.
I think there might be a way to do it if we defined a new kind of 'web section' in the wasm binary that only affects the web platform. It would wrap the core inner module and modify the imports/exports, acting as a higher layer above the core wasm spec. A similar idea to this was explored for pre-component-model interface types. This would be a lot of work, but possible.
Sounds nice.
However, I did do some benchmarking in an earlier comment of alternative approaches (#13 (comment)) that avoid the standards issue. From that investigation, it seemed like the `structFromIterable`/`arrayFromIterable` approaches could be just as fast as a string constant section (with possibly a slight size regression). The basic idea is to add a JS-API function which takes some JS iterable and converts it to a wasm struct or array that you can import and then access the string constants off of that.
I'm assuming those also require `struct.get`/`array.get` to be constant instructions. If it is fast and the regression is very small, that could work.
I understand this concern, but I wasn't seeing this when benchmarking. And I was trying to include the total time of script parsing/loading to capture the concern that it would delay startup to have string constants on the hot path. It seems like JS engines already have highly optimized string constant and JSON parsing code that can be just as efficient as a string section.
But this is just from a microbenchmark, if you have data that indicates otherwise, that would be really useful to know.
Unfortunately I don't have comparative numbers, but probably around 20-25ms to decode/parse on an average desktop machine.
But note that this concern is not only parsing time on our desktops but it includes download/parsing/execution time for JS on low end mobile devices and under low bandwidth conditions.
To put this into perspective, let's consider an app like Google Docs with a hypothetical Wasm migration: the initial readonly view of the app would be in JS. We can include a small amount of JS to fulfill the Wasm import needs; however, the product teams wouldn't accept a 300KB regression to load Wasm strings in that script, where we count every single byte. The alternative would be shipping another chunk of JavaScript for the strings/imports, but then we would need to download and load it separately from the Wasm module, adding an extra step/roundtrip to make the document editable, and we would regress on those metrics.
Unfortunately I don't have comparative numbers, but probably around 20-25ms to decode/parse on an average desktop machine.
But note that this concern is not only parsing time on our desktops but it includes download/parsing/execution time for JS on low end mobile devices and under low bandwidth conditions.
To put this into perspective, let's consider an app like Google Docs with a hypothetical Wasm migration: the initial readonly view of the app would be in JS. We can include a small amount of JS to fulfill the Wasm import needs; however, the product teams wouldn't accept a 300KB regression to load Wasm strings in that script, where we count every single byte. The alternative would be shipping another chunk of JavaScript for the strings/imports, but then we would need to download and load it separately from the Wasm module, adding an extra step/roundtrip to make the document editable, and we would regress on those metrics.
Hmm, I'm not sure I understand your constraints here. If I understand right, you don't want to have the string constants in your initial JS that renders a read-only view of the page, and that makes sense to me. Does that initial JS then do a dynamic fetch of the wasm module? Would it be possible for that dynamic fetch to also fetch the string constants in parallel? Or does it really just need to be a single fetch for everything else?
I'm imagining something like:

```js
let modulePromise = WebAssembly.compileStreaming(fetch('mod.wasm'));
let stringsPromise = fetch('strings.json').then((x) => x.json());
let [module, strings] = await Promise.all([modulePromise, stringsPromise]);
let instance = ...
```

I'd hope that performing the two fetches in parallel would be just about as fast as doing a single one. But I could be wrong.
Regarding the concerns about JSON size overhead, it could be possible to define a custom section for 'js-strings' that uses a more efficient binary encoding. Modules that contain the custom section would have a `.strings()` property on the Module object that returns a JS array of the parsed strings. Those could then be imported by the real module. Or it could possibly return a wasm struct or array directly and skip the creation of a JS array. This works around the restriction that custom sections must not affect core wasm semantics, by only being accessible from the JS-API.
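A hypothetical sketch of how that could look (neither the `.strings()` accessor nor the 'js-strings' section format exists today):

```js
// Compile the module, read back the engine-parsed string constants, then
// feed them to the real module's imports.
const module = await WebAssembly.compileStreaming(fetch('app.wasm'));
const strings = module.strings(); // hypothetical: parsed from 'js-strings'
// ...then imported by the real module, e.g. one immutable global per
// string, or as a single array via the fromIterable idea discussed above.
```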
re. parallelization: I don't know the implications of parallelization, maybe that is as fast; there are also caching scenarios to think about - but generally these are not my area of expertise. We currently abstract Wasm as a single artifact where apps don't know about "imported string" details. Not super exciting to be honest; I hope it doesn't come to that.
I've been mulling over this. Some observations:
Most of the ideas discussed so far use some form of custom section to store string constants. That's nice because it keeps the Wasm module self-contained, but kind of goes against the original spirit of custom sections being non-load-bearing. Can we do something that doesn't (ab)use custom sections?
Most of the ideas discussed so far involve various forms (and chains) of conversions: `TextDecoder.decode` the custom section, `JSON.parse` the result of that, then `structFromIterable` it into a Wasm object, etc. Perhaps we can find clever ways to let engines optimize away some of these conversions, but when JS is involved with its ubiquitous observability, that quickly becomes hard or impossible. Can we do something that doesn't rely on multi-hop conversions?
Most of the ideas discussed so far involve a bunch of work that must happen eagerly before the Wasm module can be instantiated (which is largely a consequence of requiring chains of conversions). Even if 30ms on a fast desktop CPU don't sound like much, there is a lot of slow hardware in the world, and generally it's unfortunate when specifications demand work to be performed eagerly. Can we do something that minimizes the number of CPU cycles spent on initialization, and ideally allows lazy allocation of the actual on-heap strings?
Most of the ideas discussed so far increase the binary size of the Wasm module. One pattern by which that happens: the custom section has an entry #12345, and an import then needs to indicate that it wants to receive the processed contents of that entry #12345. The array-based approach avoids the need to define many imports, but pays for that by making accesses to string constants larger to encode and more expensive to execute. Can we do something that's as module-size efficient as a string constants section?
Most of the ideas discussed so far require some (minor) additions to core Wasm, such as additional constant instructions (`array.get` and/or `struct.get` and/or `table.get`). If we're lucky, such suggestions will not be controversial, but can we avoid the risk by doing something that doesn't require changes to core Wasm?
Can we do something that checks all of these boxes at once?
So I was pleased to read this sketch by @eqrion:
it could be possible to define a custom section for 'js-strings' that uses a more efficient binary encoding. Modules that contain the custom section would have a .strings() property on the Module object that returns a JS array of strings parsed. Those could then be imported by the real module. Or it could possibly return a wasm struct or array directly and skip the creation of a JS array.
That sounds promising: +1 to finding a mechanism that skips avoidable detours and is reasonably easy to implement efficiently. Let's take it one step further: recall that this proposal's key mechanism is an opt-in to "magic" imports, and we can use that same mechanism for getting efficient constants!
Consider a fourth flag for the list of builtins:

```js
WebAssembly.compile(bytes, { builtins: [..., "string-constants"] });
```
(Bikeshedding opportunity: we could merge that flag into `"js-string"`; I don't think there's much reason to have one without the other, but we can also keep them separate for "cleanliness".)
When this flag is present, then imports of the form:

```wat
(global $string0 (import "\"" "Hello, world!") externref)
```

i.e. having type `externref` and module name `"`, will be populated by the engine with a string constant whose payload is the import's field name (in the example: `"Hello, world!"`).
(Bikeshedding opportunity: we could pick a module name other than `"`. As long as we have to repeat it for every import, it makes sense to keep it as short as possible, and `"` seems appropriate for strings. But any other character, or even the empty string, would technically work just as well.)
Notably, just like the importable functions specified in this proposal, this is fully and easily polyfillable for engines that don't support it directly, with a small snippet of module-independent code. There are several viable approaches for that; one is to use a `Proxy` to satisfy the string constants imports, which creates the required strings on demand (a sketch follows below).

Module size should be just slightly larger than the bar set by the stringref proposal. Compared to that, we need 5 extra bytes for each import (2 for the module name and its length, 3 for "global immutable externref"), whereas an access to a string constant shrinks by 2 bytes (`string.const $idx` takes 3 bytes plus an LEB, `global.get $idx` takes 1 byte plus the same LEB). (Future changes to the binary encoding of imports could reduce this overhead.)
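A sketch of the `Proxy`-based polyfill mentioned above (`wasmBytes` is hypothetical): on an engine without the flag, the magic module is satisfied by returning each imported field name as the string itself.

```js
const imports = {
  // The JS-API looks up each field name on this object; the Proxy simply
  // hands back the name, which becomes the value of the externref global.
  '"': new Proxy({}, { get: (_target, fieldName) => fieldName }),
  // ...the module's other (real) imports...
};
const { instance } = await WebAssembly.instantiate(wasmBytes, imports);
```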
Efficiency is as good as it gets: engines can allocate strings straight from module wire bytes, without having to teach their baseline compilers how to avoid indirections through JavaScript for run-once code, or performing a JS object property lookup for each string constant. They can even choose to make all initialization of all immutable globals lazy, if they want. (I'm not sure how relevant that is, but it's certainly nice to have the option.)
This approach needs zero additions/modifications to core Wasm. It also doesn't introduce any new concepts to this proposal (just applies the idea of engine-satisfied imports to one more use case). It has no risk of backwards incompatibility, because all new behavior is opt-in. And it keeps the Wasm module self-contained, making tools' and toolchains' lives easy.
The drawbacks of the approach are minor, I think; one is that import names must be well-formed UTF-8, so strings that aren't valid Unicode need a fallback mechanism.
WDYT?
I think that sounds promising, and would be pretty straightforward to implement.
Regarding limits, I think we could probably increase the import limit to 200k without controversy. Maybe more if really need be. Looking into the import name approach, I did notice that SM limits the size of a name to 100,000 bytes. This is allowed by the core spec but not specified by the js-api spec. We can probably raise it too if that's an issue. Luke's suggestion of a core wasm string section for commoning up inline strings in the binary format could also be useful here, especially if it could get long strings to go behind the code section in the binary, which would allow streaming compilation to start quicker. That would be a backwards-compatible addition in the future too, not required up front.
As a minor bikeshed, a feature like this may make sense as a different kind of flag than a builtin collection, something like `{ builtins: [..], importedStringConstants: true }`. Just because they act a bit differently and would probably be spec'ed differently too.
I don't have a good sense of whether UTF-8 only is a deal breaker for some toolchains; it does seem like they could fall back to another path for the rare strings that need unpaired surrogates. But curious if anyone disagrees.
I also think this is a promising idea, with two notes:
I would very much like us to adopt a compact import section format (discussion at https://github.com/WebAssembly/design/issues/1514) so we can give a meaningful import module name to the string constants without code size bloat.
I wonder if there would be any appetite to relax the restrictions on names to either allow WTF-8 (i.e. allow unpaired surrogates but not surrogate sequences) or to just allow surrogates without restriction. Either would avoid needing a fallback to other mechanisms for importing strings that are not valid UTF-16.
I also think this is a promising idea, with two notes:
1. I would very much like us to adopt a compact import section format (discussion at [Compact import section format design#1514](https://github.com/WebAssembly/design/issues/1514)) so we can give a meaningful import module name to the string constants without code size bloat.
I would be interested in this. If you write something up, I think it could get added as part of this proposal. I probably won't have time to do it myself in the next month, so any help is welcome.
2. I wonder if there would be any appetite to relax the restrictions on names to either allow WTF-8 (i.e. allow unpaired surrogates but not surrogate sequences) or to just allow surrogates without restriction. Either would avoid needing a fallback to other mechanisms for importing strings that are not valid UTF-16.
I personally would prefer to avoid this unless really necessary. WTF-8, as far as I know, is not a standard in the sense of being adopted by a standards organization/body, which makes it inconvenient to depend on from the core wasm spec.
For dart2wasm's use case of js-strings, @jakobkummerow's suggestion sounds really good to me.
As an example of how much the .js shim grows with strings, currently in this app with a lot of strings, the generated JS shim (with string literals) is 546,872 bytes, of which 518,837 bytes are for the strings (94%).
@jakobkummerow's suggestion solves this problem, and I don't see any downsides for dart2wasm as long as the limits on the number of imports and the name size are large enough.
@osa1 Does Dart disallow invalid UTF-16 (like unmatched surrogate pairs)?
The Scala compiler allows unpaired surrogate chars in string literals. However, from my experience, actual occurrences of such strings are exceedingly rare in practice (some tools surrounding the compiler, such as the code formatter, even choke on them). As long as there is some workaround available, even if not very performant, I believe it would be fine to restrict string constants to valid Unicode scalar value sequences.
(I deleted my original comment as I was wrong)
Dart also allows unpaired surrogates in string literals. Example:

```dart
void main() {
  final s1 = "\u{D83D}";
  final s2 = String.fromCharCode(0xDE00);
  print(s1 + s2); // prints 😄
}
```
I also think these cases should be extremely rare, and we have the workaround of importing such strings in the .mjs file.
Implementation feedback:
I. Design question: `compile` or `instantiate`?

There is some design wiggle room around the time when the constants are dealt with. Specifically, this could conceptually be part of `WebAssembly.compile` (like the functions in this proposal), or part of `WebAssembly.instantiate`.

Reasons in favor of `WebAssembly.compile`:

- Consistency with the builtin functions in this proposal (which are handled in the `compile` step).
- Modules could be validated eagerly ("imports from the `"` module must be globals of type immutable externref"). Not sure how many such modules there are.

Reasons in favor of `WebAssembly.instantiate`:

- Conceptually, the flag is already not part of the `builtins` list but something separate -- it could also be consumed by a different phase.
- It would avoid creating the strings at `compile` time.

Personally I have no strong opinion on this, the question just occurred to me when I started implementing these constants. Both options are implementable.
II. Choice of the magic module name
Having a single-character name for the recognized imported module isn't just binary-size efficient, it is also convenient for implementations, because it's fast and simple to check (something like `module_name.length() == 1 && module_name.chars()[0] == kMagicChar`). If we had a compact import section format, this might be alleviated by engines switching to two-level internal storage as well, which would eliminate the need to check every single import's module name. I'm not sure how likely either of these things are (a new binary format might not find CG agreement, and implementations might find a flat list of imports more convenient).
We could pick `'` instead of `"` to avoid the unsightly backslash in the text format. Personally, I don't think `(import "string-constant" "Hello, world!")` is tangibly better than `(import "'" "Hello, world!")`, so I wouldn't want to pay any non-negligible price for the more verbose form.
III. Implementation effort
Even easier than I realized previously, because import names are already converted to JS strings anyway, in order to perform the property lookups on the imports object, so replacing those lookups is trivially easy. Plumbing the flag from `compile` (see I.) to `instantiate` adds some boilerplate, but overall my patch is still at just ~70 lines.
There's some room for performance improvements now that we expect to have tens of thousands of imports, which leads to more effort, but that would apply to any of the approaches that define one global per string constant.
Implementation feedback:

I. Design question: `compile` or `instantiate`? There is some design wiggle room around the time when the constants are dealt with. Specifically, this could conceptually be part of `WebAssembly.compile` (like the functions in this proposal), or part of `WebAssembly.instantiate`.
I think I have a slight preference for the flag being part of the compile phase. The reason would be that it allows us to know during eager compilation that while the global is 'externref' it will only ever be a non-null string and we could use that to eliminate some casts when we see the global used.
But also, it's not a strong preference. I do have a strong preference for this being a separate kind of flag though.
II. Choice of the magic module name

Having a single-character name for the recognized imported module isn't just binary-size efficient, it is also convenient for implementations, because it's fast and simple to check (something like `module_name.length() == 1 && module_name.chars()[0] == kMagicChar`). If we had a compact import section format, this might be alleviated by engines switching to two-level internal storage as well, which would eliminate the need to check every single import's module name. I'm not sure how likely either of these things are (a new binary format might not find CG agreement, and implementations might find a flat list of imports more convenient). We could pick `'` instead of `"` to avoid the unsightly backslash in the text format. Personally, I don't think `(import "string-constant" "Hello, world!")` is tangibly better than `(import "'" "Hello, world!")`, so I wouldn't want to pay any non-negligible price for the more verbose form.
I think that if we can't get any agreement on a compact import section format that using a single character might be okay, but I think I'd want to measure it first. I think a compact import section could let us optimize string imports very well by switching to a tight loop that just allocates strings and puts them into their final storage.
I'm landing my patch in V8, to make it easier for our toolchain partners to experiment with this approach. Of course it's still part of the off-by-default `--experimental-wasm-imported-strings` feature.
For now, it uses `{builtins: [...], importedStringConstants: true}` as the opt-in flag, and requires the "module name" for string constants to be `'` (i.e. a single-quote character). It's easy to change these details later if necessary.
I'm landing my patch in V8, to make it easier for our toolchain partners to experiment with this approach. Of course it's still part of the off-by-default `--experimental-wasm-imported-strings` feature.

For now, it uses `{builtins: [...], importedStringConstants: true}` as the opt-in flag, and requires the "module name" for string constants to be `'` (i.e. a single-quote character). It's easy to change these details later if necessary.
Cool, I hope to have a quick PR up for this soon too. I'll just use the same module name as well for now.
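For reference, a minimal usage sketch of this prototype opt-in (experimental, details may change; `wasmBytes` is hypothetical, and the module is assumed to have no other imports):

```js
// Compile with the prototype flag; the engine then satisfies imports like
//   (import "'" "Hello, world!" (global externref))
// with the field name itself as the string constant.
const module = await WebAssembly.compile(wasmBytes, {
  builtins: ["js-string"],
  importedStringConstants: true,
});
const instance = await WebAssembly.instantiate(module); // no JS-provided strings
```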
I've now adopted the imported string constants approach described above in the overview. I will leave this issue open for a while in case this approach needs refinement or we want to look at another approach.
Collected some data from Google Sheets:
Custom section: 4326288 (compressed 1137269)
Magic-import: 4048402 (compressed 1113046)
string.const: 3895540 (compressed 1103698)
The regression over string.const is likely coming from the extra bytes needed to declare the imports (around 25 bytes per constant, as mentioned above).
I'm going to close this as I believe we have an acceptable solution now.
I'd like to reopen the discussion a little bit - remember in the in-person meeting, I asked whether a modified scheme like the following would be possible: keep the magic imports, but have each import's field name be an index into a custom section that holds the actual string payloads.

Now I realise that there's a specific advantage of this alternative scheme: the custom section can use UTF16/WTF16 encoding. In the current scheme where strings come from the import names directly, a reencoding step seems needed to go from UTF8 -> WTF16, which would need a copy. Also, since UTF8 can't represent all WTF16 strings, producers will need a fallback. If the custom section in the alternative scheme directly encodes UTF16/WTF16 strings, there's no reencoding step or producer gap (with WTF16), and engines could even in principle directly use the allocated space of the custom section as the backing store for the relevant string.

I realise that custom section based designs have been discussed above, but they seem to involve host code parsing the custom section and feeding it back into the module's imports, rather than combining with the "compile time flag" idea that's been newly introduced.
@conrad-watt nice idea, two comments:
(1) Wire bytes size will be larger. The current approach needs (on top of the import's "module name", whatever that'll end up being) the length of the string and the string payload itself (as utf-8). Your idea needs the length of the index string, the index string, the length of the actual string (assuming a vector-of-vectors encoding for the custom section like existing similar sections), and the string payload itself; so depending on how exactly the indices are encoded, that'll probably add around 3-6 bytes per string constant.
(2) I once tried to implement string constants in wire bytes to be used as backing stores for the strings created from them, but had to give up on the attempt because sections aren't two-byte aligned, so for the time being I consider this an entirely hypothetical benefit that won't actually materialize in practice. (In many common languages, utf-8 is also much shorter than wtf-16, so module producers may well opt against wtf-16 for that reason if given a choice.)
Here's another idea for doing string constants that uses a table instead of globals. This can avoid the implementation limit of 1,000,000 globals we have, and possibly be more efficient.

1. The module defines a table of externref whose minimum size is the number of string constants.
2. JS creates the strings and copies them into the table with Table.copyFrom().
3. Module code uses `table.get` to get the strings it wants. (minimum size can avoid bounds checks)

Table.copyFrom() would be a new JS-API method on tables that takes a JS iterable object and copies its values into itself starting at a specific index (a usage sketch follows below).
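A hypothetical usage sketch of that method (`Table.copyFrom` is the proposed, not-yet-existing API; `wasmBytes` and the import shape are illustrative):

```js
// Build the table in JS, fill it from an iterable of strings, and hand it
// to the module as its single string-constant import.
const strtab = new WebAssembly.Table({ element: "externref", initial: strings.length });
strtab.copyFrom(strings, 0); // proposed: copy the iterable's values, starting at index 0
const { instance } = await WebAssembly.instantiate(wasmBytes, { env: { strtab } });
```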
A variation of this could also be that the Table JS-API constructor takes an iterable and uses that for the initial values of its elements. This could possibly allow the table type to be `(ref extern)`.

One problem is that toolchains also want to refer to string constants in global initializers when constructing static objects. Another extension could be to allow `table.get` as a constant expression (the table section comes before the global section). This might be a little tricky, because `table.get` can trap if the index is out of bounds. At least for SM, we already have to handle instantiation failure due to OOMs in initializer expressions, so this may not be that problematic?

@jakobkummerow WDYT? cc @rossberg if they have any opinions about allowing `table.get` as a constant expression in global initializers.