WebAssembly / stringref


Fix ordering between memory writes and effects #16

Open wingo opened 2 years ago

wingo commented 2 years ago

Consider string.encode_wtf8. If it attempts to write beyond the bounds of the current memory, it should trap. But should any partial memory writes be visible after the trap? For example, if you had encoded the "hello" of "hello, world" but trapped when writing the comma, should "hello" appear in memory?

Similarly, when encoding a string that contains an isolated surrogate as UTF-8 (rather than WTF-8), should the trap occur before any byte is written?

The affected instructions are string.encode_wtf8, string.encode_wtf16, stringview_wtf8.encode, and stringview_wtf16.encode.

I think I am leaning towards "trap before writes", or equivalently, "if any bytes were written, there was no trap".
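A minimal sketch of the two policies in TypeScript (hypothetical helper names, not engine code). Here the source bytes are already encoded, so the length is trivially known; the contentious case discussed below is when it is not:

```ts
// Policy A: "write until failure": copy until the destination runs out,
// then trap; partial output stays visible in memory.
function encodeWriteUntilFailure(src: Uint8Array, dst: Uint8Array, at: number): void {
  for (let i = 0; i < src.length; i++) {
    if (at + i >= dst.length) throw new RangeError("out of bounds"); // trap mid-stream
    dst[at + i] = src[i]; // bytes written so far remain visible
  }
}

// Policy B: "trap before write": decide up front whether everything fits,
// so either all bytes land or none do.
function encodeTrapBeforeWrite(src: Uint8Array, dst: Uint8Array, at: number): void {
  if (at + src.length > dst.length) throw new RangeError("out of bounds"); // trap first
  dst.set(src, at); // only reached when no trap can occur
}
```

With src set to the UTF-8 bytes of "hello, world" and a 5-byte dst, policy A leaves "hello" in memory after the trap, while policy B leaves dst untouched.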

jakobkummerow commented 2 years ago

Does this have performance implications? IOW, does the requirement to trap before writing the first byte necessitate exactly determining, up front, the number of bytes that will be required, a computation that could otherwise be avoided?

Pauan commented 2 years ago

@jakobkummerow Depending on the host's internal string implementation, doing a WTF-8 length check is an O(n) operation. Similarly, checking for lone surrogates could be an O(n) operation.

So the host would have to first loop over the entire string to compute its encoded length, and then loop over it again to do the write. So yes, it could have performance implications.
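A sketch of that pre-pass, assuming the host stores strings as WTF-16 code units (hypothetical helper, not any engine's code):

```ts
// O(n) pre-pass: compute the WTF-8 byte length of a WTF-16 string.
function wtf8ByteLength(units: Uint16Array): number {
  let bytes = 0;
  for (let i = 0; i < units.length; i++) {
    const u = units[i];
    if (u < 0x80) bytes += 1;       // ASCII
    else if (u < 0x800) bytes += 2; // two-byte sequence
    else if ((u & 0xfc00) === 0xd800 && i + 1 < units.length &&
             (units[i + 1] & 0xfc00) === 0xdc00) {
      bytes += 4; // valid surrogate pair: one four-byte sequence
      i++;        // consume the low surrogate too
    } else bytes += 3; // other BMP code point, or a lone surrogate (legal in WTF-8)
  }
  return bytes;
}
```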

There are ways to mitigate those performance problems, but those mitigations have costs as well.

jakobkummerow commented 2 years ago

@Pauan Right, so my question is whether it is true that a "write xor trap" policy requires this O(n) check up front, and a "write until failure" policy would avoid it, or whether that assessment is overlooking something.

Assuming it's correct, the obvious follow-up question would be whether the conceptual cleanliness of "write xor trap" is worth incurring this cost. Presumably, the trapping scenario is rare, and it would be sad to slow down the far more common non-trapping situation.

Pauan commented 2 years ago

> whether it is true that a "write xor trap" policy requires this O(n) check up front, and a "write until failure" policy would avoid it

It depends on the implementation. It is possible for an implementation to store the WTF-8 length in its strings (even if the string isn't stored as WTF-8). It's also possible to store a bit saying whether the string is valid UTF-8 (i.e. whether it contains any lone surrogates).

That has an extra memory cost for every string, and it also adds a lot of complexity: the implementation has to update the length/validity bit when slicing strings, concatenating strings, and so on.
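A sketch of what that caching could look like (hypothetical field and helper names); the seam handling in concatenation is the kind of complexity being referred to:

```ts
interface CachedString {
  units: Uint16Array;   // WTF-16 payload
  wtf8Length: number;   // cached WTF-8 byte length
  isValidUtf8: boolean; // cached "contains no lone surrogates" bit
}

function concat(a: CachedString, b: CachedString): CachedString {
  const units = new Uint16Array(a.units.length + b.units.length);
  units.set(a.units);
  units.set(b.units, a.units.length);
  // A trailing high surrogate in `a` can pair with a leading low surrogate
  // in `b`; the pair encodes as 4 bytes rather than 3 + 3.
  const pairsAtSeam =
    a.units.length > 0 && b.units.length > 0 &&
    (a.units[a.units.length - 1] & 0xfc00) === 0xd800 &&
    (b.units[0] & 0xfc00) === 0xdc00;
  return {
    units,
    wtf8Length: a.wtf8Length + b.wtf8Length - (pairsAtSeam ? 2 : 0),
    // When surrogates pair at the seam, validity cannot be derived from the
    // two cached bits alone; a rescan may be needed.
    isValidUtf8: pairsAtSeam
      ? rescanForLoneSurrogates(units)
      : a.isValidUtf8 && b.isValidUtf8,
  };
}
declare function rescanForLoneSurrogates(units: Uint16Array): boolean; // O(n) fallback
```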

I'm not a browser developer, so I can't speak for them on how easy or hard it would be to implement that. But I assume it's a non-trivial cost.

A "write until failure" approach is guaranteed to be fast, whereas a "fail before write" policy might be fast or might be slow (depending on the implementation).

Personally, I'm fine with "write until failure", but that's just my personal opinion.

wingo commented 2 years ago

From the cost side, there are some easy ways to know that you won't trap. For example, when encoding a string with code points in [0, 0xff] to WTF-8 from a WTF-16 internal representation, if you have at least string.length * 2 bytes left, you won't trap. You might only need to do the WTF-8 length computation in some cases. For the strict UTF-8 case, though, you might need to scan the code points for isolated surrogates when converting from a WTF-16 string that can contain them, unless there is a USV-string flag on the string.
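A sketch of that fast path, generalizing the example above to the worst case of 3 WTF-8 bytes per WTF-16 code unit (hypothetical helpers):

```ts
function encodeWtf8(units: Uint16Array, dst: Uint8Array, at: number): number {
  // A surrogate pair (2 units) encodes to 4 bytes, i.e. 2 bytes per unit,
  // and any single unit encodes to at most 3 bytes, so 3 * length is a safe
  // upper bound on the output size.
  if (dst.length - at >= 3 * units.length) {
    return encodeUnchecked(units, dst, at); // no trap possible: tight loop
  }
  // Slow path: compute the exact size (the O(n) pre-pass), check once, encode.
  const exact = wtf8ByteLength(units);
  if (dst.length - at < exact) throw new RangeError("out of bounds");
  return encodeUnchecked(units, dst, at);
}
declare function encodeUnchecked(units: Uint16Array, dst: Uint8Array, at: number): number;
declare function wtf8ByteLength(units: Uint16Array): number;
```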

From the spec side -- my feeling is that we should specify the order of effects. Write until failure is fine, trap before any write is fine, but we should specify the ordering. Or leave it explicitly unspecified if there is a performance concern! But we should be explicit.

Pauan commented 2 years ago

> You might only need to do the WTF-8 length computation in some cases.

Yes, which means the performance will be inconsistent and unpredictable. That goes against earlier design principles of Wasm. So could you explain more about why you want trap-before-write? Does it offer some significant advantages compared to write-until-trap?

wingo commented 2 years ago

I think that when it comes to externally-managed data such as stringrefs, we are not in a precisely predictable world. A garbage collector can pause your program, or slow it down, in concurrent marking phases. The engine could decide to flatten a rope string, causing allocation and possibly GC. Substrings may hold on to their parents. And so on. In that context I think concern over whether you trap early or late is in the noise.

More generally, I think when it comes to implementing GC and related proposals, the "inlined fast path with callout to the run-time on slow path" pattern is unavoidable in practice.

As far as why you would want to trap beforehand: it lets the engine hoist some conditionals out of loops, which can enable vectorization or open-coding in more cases.
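A sketch of that benefit, restricted to the ASCII-only case for brevity (hypothetical code):

```ts
// With trap-before-write, the single hoisted check leaves a branch-free
// inner loop that a compiler can vectorize.
function encodeAsciiHoisted(units: Uint16Array, dst: Uint8Array, at: number): void {
  if (at + units.length > dst.length) throw new RangeError("out of bounds"); // hoisted
  for (let i = 0; i < units.length; i++) {
    dst[at + i] = units[i]; // no per-byte bounds branch in the loop body
  }
}
```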

wingo commented 2 years ago

More things to fix: the ordering between the alignment constraint for WTF-16 encoding and the out-of-bounds check, and ensuring (or not) that traps happen even if no bytes would be written (i.e. the length to encode is 0).

eqrion commented 2 years ago

I'd lean towards 'trap-before-write' semantics here to align with the behavior of memory.copy/fill.

There was an extensive discussion around this in WebAssembly/bulk-memory-operations#111, which settled on this behavior, changing it from the previous bytewise semantics. The main reason for the change was that it allowed vectorized inline copies while still giving deterministic behavior in the event of a trap. I don't know if that applies here, but it would be nice to keep consistency until a performance argument is made in the other direction.
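A sketch of the memory.copy model being cited: because the length is an operand, one up-front check precedes a copy that can use wide moves (a model of the semantics, not engine code):

```ts
function memoryCopy(mem: Uint8Array, dst: number, src: number, len: number): void {
  if (dst + len > mem.length || src + len > mem.length) {
    throw new RangeError("out of bounds"); // trap before any byte moves
  }
  mem.copyWithin(dst, src, src + len); // all-or-nothing, vectorizable copy
}
```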

Pauan commented 2 years ago

@eqrion Note that there is a big difference between memory.copy/fill and string.encode: with memory.copy/fill, the size is known ahead of time, and so it is cheap to do a bounds check.

But with string.encode the size is not known, which means trap-before-write must do an additional O(n) check to determine what the size is. This extra check is not needed with write-until-trap.

The closest existing API to string.encode is TextEncoder.encodeInto, and encodeInto has write-until-trap behavior. Technically it doesn't actually trap; it just returns the number of code units read and bytes written, leaving it up to the user to handle partial writes (e.g. by reallocating).
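For reference, the encodeInto contract (this is the real API; the grow-and-retry recovery shown is the usual caller-side pattern):

```ts
const encoder = new TextEncoder();
const text = "hello, world";
const buf = new Uint8Array(5);
// No exception on a short buffer: encodeInto reports progress instead.
const { read = 0, written = 0 } = encoder.encodeInto(text, buf);
// Here read === 5 and written === 5: "hello" is in buf, the rest was never encoded.
if (read < text.length) {
  const bigger = new Uint8Array(64);     // grow the buffer...
  bigger.set(buf.subarray(0, written));  // ...keep the partial output...
  encoder.encodeInto(text.slice(read), bigger.subarray(written)); // ...and resume
}
```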

Because existing Wasm code heavily uses TextEncoder.encodeInto for handling strings, it would be unfortunate if the native string.encode API had different (slower) behavior. Strings are already the slowest part of Wasm<->JS communication, so making them even slower with trap-before-write isn't great.