Restricting offsets to match the index type

WebAssembly / memory64

Memory with 64-bit indexes

Other

214 stars 20 forks source link

Restricting offsets to match the index type #76

Closed bvisness closed 1 week ago

bvisness commented 1 month ago

In this proposal, offsets in memargs were updated to always be u64. For i32 memories, this means it is possible to express an offset that is larger than 4GiB, and this is not rejected in validation. This means that 32-bit runtimes (or runtimes in 32-bit mode) actually now have to store offsets differently and make sure that they handle 64-bit offsets in their bounds checks. This is very strange and wholly unnecessary, since an offset 4GiB or greater will always trap.

I think we should restrict the maximum offset in validation so that this is no longer a runtime concern. The syntax definition and binary format can probably continue to use u64, but in validation, we can enforce that the maximum offset for an i32 memory is 2^32 and the maximum offset for an i64 memory is 2^64.

Thoughts? I can make this change easily if others agree.

rossberg commented 3 weeks ago

I agree the offset needs to be restricted to the index type's range. FWIW, in the reference interpreter of this repo, validation appears to already implements that check. So we certainly need to make spec and interpreter consistent one way or the other.

Adding that validation check is the obvious fix. But there is an alternative, which is parameterising the AST of memargs — and therefore their binary format — over the respective index type. The observable difference is that this also disallows "overlong" encodings of u32 values, like we do elsewhere. Whereas with a validation time check, we would allow e.g. a 9-byte encoding of a 32-bit offset (unlike before).

The same logic applies to limits. If we go with the above, then these should also decode the limit values according to the index type flag.

bvisness commented 3 weeks ago

I think parameterizing memargs and limits feels more coherent to me, but for memargs, this would mean that parsing every load and store instruction would require us to parse the alignment, parse the memory index, look up the index type, and only then parse the offset. This data dependency during parsing seems somewhat unfortunate, although I don't know if it's any worse than anything else in the binary format.

If this is indeed a parsing performance problem, maybe we could keep the u64 in the binary format but still parameterize the memargs semantically? It would indeed be nice to force the appropriate range directly in memargs if we can.

rossberg commented 3 weeks ago

Ah yes, that's an excellent point – we don't want parsing to be type-dependent, since that breaks the phase separation. Now that I say that, I remember that we had that discussion before. :)

Okay, so let's just add the validation-time check to the spec.