add explicit bounds checks and do not catch Invalid_argument anymore

WebAssembly / spec

WebAssembly specification, reference interpreter, and test suite.

https://webassembly.github.io/spec/

Other

3.09k stars 438 forks source link

add explicit bounds checks and do not catch Invalid_argument anymore #1659

Closed zapashcanon closed 1 year ago

zapashcanon commented 1 year ago

This allows to compile with -unsafe without getting a segfault or a random value.

There's something I don't understand in interpreter/runtime/table.ml and I left a comment there. I believe we should simply remove the Invalid_argument case.

It's a little bit unrelated but I also replaced all raise (Invalid_argument s); by invalid_arg s; (so that it's easier to grep for Invalid_arg to check if we're catching it).

Also unrelated, but it seems the Array32.make function in interpreter/util/lib.ml is wrong. IIUC it should check for Sys.max_array_length instead of Int64.of_int max_int (and Sys.max_floatarray_length if the array is made of floats, which is Sys.max_array length / 2).

zapashcanon commented 1 year ago

Can you also change the corresponding ops in the elem and data modules?

Is it really needed for data or elem ? I don't see how it is possible to call e.g. Data.load seg i with an invalid i at runtime as it'll always come from an index that has been validated. And indeed, currently we're never catching Invalid_argument for a Data.load (or I missed it). I believe the same is true for elements.

rossberg commented 1 year ago

Validation does not check that data/elem offsets are within bounds, because in general, these offsets may be dynamic (consider e.g. array.new_elem etc).

If you did not encounter that, then that may simply be a gap in the test suite, although I was under the impression that there are at least some tests with out-of-bound segment accesses.

zapashcanon commented 1 year ago

these offsets may be dynamic (consider e.g. array.new_elem etc)

This is true only in the gc proposal, right ? I'll do another PR there later to handle these cases if that's fine for you ? Here I did not encounter any of these, I did a grep on Invalid_argument so I don't see how I could have miss one.

rossberg commented 1 year ago

Not quite:

(module
  (table 2 funcref)
  (func $f (export "f") (table.init $e (i32.const 0) (i32.const 0) (i32.const 2)))
  (elem $e func $f)
)
(invoke "f")

This results in a trap even today, so I think it needs to change here.

zapashcanon commented 1 year ago

Indeed. Actually, explicit checks are already here in eval.ml (mem_oob, data_oob, table_oob and elem_oob).

Then we have always the same pattern in the eval loop:

| TableInit ... ->
  if table_oob ... then Trapping Table.Bounds ...
  else ...

I can either:

leave everything as in the current state of the PR (it allows to run the code with -unsafe, i.e. don't rely on Invalid_argument for bounds checks) ;
add explicit checks to Data.load and Elem.load but they're not going to bring anything more and will make the code slower ;
add explicit checks to Data.load and Elem.load but remove the checks in eval.ml, I'm afraid this is going to make the code a little bit ugly - e.g. in TableInit I'll have to test bounds anyway in case n = 0l otherwise it won't trap as expected...

What do you prefer?

rossberg commented 1 year ago

The extra checks in eval are needed for bulk ops, because they also need to trap on zero-length reads out of bounds.

I'd prefer to still add the extra checks to elem/data.ml anyway, so that they provide a self-contained interface, not dependent on external checks. Performance doesn't matter at all for the reference interpreter.

zapashcanon commented 1 year ago

Done. Let me know if you want me to change something else.

zapashcanon commented 1 year ago

Should be good now ?

EDIT: sorry I missed the "But please don't force-push to PR branches. ;)" and did it again... but, github is now showing the diff between successive force-pushes, is there a reason why it's not as good as commits to follow changes ?

rossberg commented 1 year ago

I can see individual diffs, but the comment history is still garbled in general, and reviewers don't get the "What changed since last review" button for a cumulative diff.

And of course, rewriting history is bad for your git karma. Merge is the git way. ;)