WebAssembly / binaryen

Optimizer and compiler/toolchain library for WebAssembly
Apache License 2.0
7.27k stars 715 forks source link

Use ChildTyper in the validator #6613

Open tlively opened 1 month ago

tlively commented 1 month ago

The validator uses a large amount of code to check that child expressions have the correct types. Writing this code is a very manual process, and bugs are hard to catch because they manifest as the validator being overly permissive. We can delete a large amount of validator code and make it easier to get full validator coverage by using the new ChildTyper utility from the validator. We should also add more comprehensive validation tests, ideally as upstream spec tests (see #6612)

tlively commented 2 weeks ago

Improving the validator this way would let us re-enable the unreached-invalid.wast spec test. @kripken, were you interested in taking a look at this?

kripken commented 2 weeks ago

I can look into it, sure.

kripken commented 2 weeks ago

A problem that happens here is that ChildTyper seems to assume valid IR already. For example:

https://github.com/WebAssembly/binaryen/blob/1079a9e34599e65ee25fb5f32caa57bd21737593/src/ir/child-typer.h#L781-L784

getHeapType() will assert if the type is unreachable, and getSignature() will assert if it is a non-signature reference type.

I'm not sure offhand what kind of API change would make the most sense here, but doing that as a first step is necessary I think.

tlively commented 2 weeks ago

Since ChildTyper only checks types based on information that would be included in the binary, my expectation is that other properties, including whether we can even produce such information, would be checked by the validator before it calls into ChildTyper. Basically wherever we have a type annotation in the binary format, the validator will have to check that the type is valid before checking the child types more generally.

kripken commented 2 weeks ago

It seems hard to split things out that way, I'm afraid. CallRef would need to see that the reference child is unreachable and therefore ignore it in ChildTyper. That is, it is valid for that child to be unreachable, so we can't just put a check before calling ChildTyper here. ChildTyper itself would need to do the check or at least I can't see a better way.

tlively commented 2 weeks ago

I would expect that when validating a CallRef, the validator would only call into ChildTyper if the ref is a typed function reference. Why is it hard to split out that way?

kripken commented 2 weeks ago

I guess nothing after it needs to be validated if the ref is not a signature, but I'm not sure that's the case elsewhere - in general we may have one child be unreachable and need to ignore it, but not another.

Also, that would mean that the decision to call the ChildTyper must be done in visitCallRef or such, that is, it is specific to each expression class. That means we can't do it in a more generic location which would have been a lot simpler/shorter.

kripken commented 2 weeks ago

Ah, as an example for the first paragraph in the last comment, see StringEncode:

https://github.com/WebAssembly/binaryen/blob/1079a9e34599e65ee25fb5f32caa57bd21737593/src/ir/child-typer.h#L987-L995

If the array is unreachable we still need to reach the last line that validates the i32 type of another child.

tlively commented 2 weeks ago

Also, that would mean that the decision to call the ChildTyper must be done in visitCallRef or such, that is, it is specific to each expression class. That means we can't do it in a more generic location which would have been a lot simpler/shorter.

Yes, it would be an annoyingly large PR. An alternative might be to have ChildTyper detect when there's not enough information to type some children and fall back to some more general constraint only for those particular children. So in the example above, we could leave array unconstrained by still ensure that str and start have the correct types.