Open rajsite opened 1 year ago
Hi, great questions!
canonopt
which would, to your first point, reliably be in the minimal form. And for lifting, invalid encodings are precisely defined to produce fatal wasm traps that halt execution (so there's no question of what the lowering side receives in case of invalid input).

spec/test directory. Work is started in #192. There's also a bunch of unit tests in wasmtime that we'll want to merge into this repo.

So given all that, core wasm running in the component shouldn't need to validate incoming strings.
Forgive me if I'm missing it, but is there a discussion of how the Unicode UTR 36: UTF-8 Exploits are addressed by the component model strings?
From what I can tell looking at the CanonicalABI it looks like the string lift operation is responsible for validation and trapping on "Unicode Errors".
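To make the "validate and trap on lift" reading concrete, here is a minimal Python sketch. It is not the Canonical ABI's actual definition: `Trap` and `lift_utf8` are hypothetical names, and the sketch assumes Python's strict UTF-8 decoder is a reasonable stand-in for whatever validation a host performs.

```python
class Trap(Exception):
    """Hypothetical stand-in for a fatal wasm trap."""


def lift_utf8(memory: bytes, ptr: int, byte_length: int) -> str:
    """Decode byte_length bytes at ptr as strict UTF-8, trapping on any error."""
    # Reject ranges that fall outside the (assumed) linear memory.
    if ptr < 0 or byte_length < 0 or ptr + byte_length > len(memory):
        raise Trap("string range out of bounds")
    try:
        # Strict decoding: any ill-formed sequence raises instead of being replaced.
        return memory[ptr:ptr + byte_length].decode("utf-8")
    except UnicodeError as err:
        raise Trap(f"invalid UTF-8: {err}") from err
```

Under this reading, the receiving side only ever sees a string that decoded cleanly, or execution halts before it sees anything.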
I'm wondering what guarantees I have as a component author that lowered strings are valid UTF-8 strings from the security perspective of that report. For example, overlong string encodings in the cited UTR 36 document and in the UTF-8 Wikipedia: Invalid Sequences and Error Handling topic are specifically described as being the cause of security issues in web services (a relevant use-case for WASM components) and potentially overlooked by decoders (WASM component authors).
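For reference, the kinds of ill-formed input I have in mind can be checked with a short Python sketch. `is_valid_utf8` and `ILL_FORMED` are names I made up for illustration; the sketch only shows how one strict decoder (Python's built-in `utf-8` codec) treats these byte sequences, not what any particular wasm runtime does.

```python
# Ill-formed UTF-8 sequences of the kinds discussed in UTR 36.
ILL_FORMED = {
    "overlong 2-byte '/'": b"\xc0\xaf",
    "overlong 3-byte '/'": b"\xe0\x80\xaf",
    "encoded surrogate U+D800": b"\xed\xa0\x80",
    "beyond U+10FFFF": b"\xf4\x90\x80\x80",
    "truncated sequence": b"\xe2\x82",
}


def is_valid_utf8(data: bytes) -> bool:
    """Return True only if data decodes under Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Map each case name to whether the strict decoder accepted it.
results = {name: is_valid_utf8(raw) for name, raw in ILL_FORMED.items()}
```

A decoder that accepted the first two entries as `/` would be the overlong-encoding problem the report describes, which is why I care whether the lift step is guaranteed to be this strict.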
Some concrete questions:
The canon_lower topic has a discussion point on efficient trampolines:
Is there a discussion of the validation expectations of such efficient trampoline optimizations? I'd assume you would still need to run the validation passes associated with a lift on a UTF-8 string to prevent issues like overlong encoding being overlooked.
My goal in the end is to make sure I'm not doing that work twice. If there are strong guarantees clearly described about what validation is done on strings I can skip doing that work or conversely make sure it is done.