Open wingo opened 2 years ago
IIUC, the complication here in comparison to memory.fill would be the potential for a racing thread to observe the unsanitised surrogate before it's overwritten. We could write the specification for string.encode_lossy_utf8 so that this additional behaviour is permitted without too much trouble if this is a desirable implementation to support (i.e. a racing thread could see arbitrary interleavings of the old data, the unsanitised new data, and the sanitised new data).
> if this is a desirable implementation to support
It's not a big issue either way, but the single-memcpy-plus-fixups implementation is a nice simplification compared to the alternative, so yeah, it would be nice (but not crucial) to support it. You can see the difference here (lines 1274 and following).
If, in the implementation of stringrefs in your wasm VM, you have a managed buffer of WTF-8, and the user requests that you write UTF-8 to memory via string.encode_lossy_utf8, one tactic would be to just memcpy the whole thing, and then go back and change any surrogate to be U+FFFD. (Not saying it's a good strategy, just a possible strategy.) In a single-threaded world, this is fine. Would it be fine with threads? See https://github.com/WebAssembly/threads/issues/189.
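To make the memcpy-plus-fixups tactic concrete, here is a minimal sketch in Python. It is an illustration only, not any VM's actual code. The function name, the use of a `bytearray` as linear memory, and the assumption that the source buffer is well-formed WTF-8 are all hypothetical choices. The in-place fixup works because a lone surrogate in WTF-8 (bytes `ED A0..BF 80..BF`) and U+FFFD in UTF-8 (bytes `EF BF BD`) are both three bytes long.

```python
REPLACEMENT = b"\xef\xbf\xbd"  # U+FFFD encoded as UTF-8 (3 bytes)


def encode_lossy_utf8(memory: bytearray, addr: int, wtf8: bytes) -> int:
    """Hypothetical helper: copy `wtf8` into `memory` at `addr`, then
    overwrite every lone surrogate with U+FFFD. Returns bytes written.

    Assumes `wtf8` is well-formed WTF-8; no validation is performed.
    """
    n = len(wtf8)
    if addr < 0 or addr + n > len(memory):
        raise IndexError("write out of bounds")

    # Step 1: bulk copy, standing in for a single memcpy.
    memory[addr:addr + n] = wtf8

    # Step 2: fixup pass over the destination. Between step 1 and the
    # end of this loop, a racing thread reading `memory` could observe
    # the unsanitised surrogate bytes.
    i, end = addr, addr + n
    while i < end:
        lead = memory[i]
        if lead < 0x80:        # 1-byte sequence (ASCII)
            i += 1
        elif lead < 0xE0:      # 2-byte sequence
            i += 2
        elif lead < 0xF0:      # 3-byte sequence
            # ED followed by A0..BF encodes U+D800..U+DFFF (a surrogate);
            # ED followed by 80..9F is an ordinary code point.
            if lead == 0xED and memory[i + 1] >= 0xA0:
                memory[i:i + 3] = REPLACEMENT
            i += 3
        else:                  # 4-byte sequence
            i += 4
    return n
```

In a single-threaded setting the intermediate state after step 1 is never visible. With shared memory, the question raised here is whether the specification should allow a concurrent reader to see it.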