WebAssembly / wabt

The WebAssembly Binary Toolkit
Apache License 2.0
6.74k stars 688 forks source link

wat2wasm segfaults on .wat file with many nested if statements #2377

Open womeier opened 7 months ago

womeier commented 7 months ago

Running wat2wasm on a rather big file with many nested if-statements segfaults.

Version: wabt 1.0.32-1 (debian bookworm) System: Debian (with enough RAM + swap)

Steps to reproduce:

wasm2wat CertiCoq.Benchmarks.tests.color.wasm > color.wat          // color.wat is ~20GB
wat2wasm color.wat -o color.wam                                    // Segmention fault
wasm-tools parse color.wat -o color.wasm                           // works

segfault:

(gdb) run
Starting program: /home/wolfgang/ProjectsUni/Compilerstep/wabt/out/clang/Debug/wat2wasm color.wat -o color.wasm
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00005555555f3db3 in wabt::WastParser::GetToken (this=0x7fffffffd800)
    at /home/wolfgang/ProjectsUni/Compilerstep/wabt/src/wast-parser.cc:585
585       if (tokens_.empty()) {
(gdb)

CertiCoq.Benchmarks.tests.color.zip

keithw commented 7 months ago

It depends... on the size of the .wat, do you get a more concise output from binaryen's wasm-dis or wasm-tools print? If so... we can dive down to compare.

In terms of memory consumption of wat2wasm, wasm-tools parse is probably going to be better (and also better than binaryen's wasm-as) since I think it tries not to assemble an in-memory IR of the whole module.

womeier commented 7 months ago

Thanks for the prompt response, wasm-tools parse worked (with more swap, takes quite some time).

But wat2wasm still segfaults when run on the 20GB .wat file, I edited the issue accordingly.

FYI: both wasm-tools parse and wat2wasm require around 20GB of RAM.

keithw commented 7 months ago

Cool, what's the segfault? (E.g. if you make a debug build and run it in gdb, where does it segfault?)

sbc100 commented 7 months ago

What is the reason for wanting to support such large files? Done't Wasm runtimes have internal limits that make running such files impossible? Perhaps we should instead just impose a limit on the size of any file we process? (see https://webassembly.github.io/spec/core/appendix/implementation.html#binary-format).

womeier commented 7 months ago

Here is the segfault, hope it helps:

(gdb) run
Starting program: /home/wolfgang/ProjectsUni/Compilerstep/wabt/out/clang/Debug/wat2wasm color.wat -o color.wasm
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00005555555f3db3 in wabt::WastParser::GetToken (this=0x7fffffffd800)
    at /home/wolfgang/ProjectsUni/Compilerstep/wabt/src/wast-parser.cc:585
585       if (tokens_.empty()) {
(gdb)

@sbc100 We extended the CertiCoq compiler with a Wasm backend. One of the benchmarks of CertiCoq runs a graph coloring algorithm, the generated binary includes the representation of a graph in a functional language, compiled to Wasm. Didn't have issues running the other (a little smaller) benchmarks, haven't tried this one yet.

And in case you're wandering, the roundtrip with wasm2wat and wat2wasm is not strictly necessary, we'll get rid of that soon.

sbc100 commented 7 months ago

What wasm engine are are you able to run 20GB program on (or even 1GB program for that matter)? I would have assumed that such a program would be exceed the internal limits of any engine.

womeier commented 7 months ago

It's just the color.wat file that's so big (for some reason), the color.wasm file is just ~2MB, I ran the color.wasm file in both Node.js and wasmtime, both went fine. (Node.js takes > 1 min to instantiate the module, but no crash :) )

womeier commented 5 months ago

I think I understand now, why the color.wat file was so big, it has a quite large chain of nested if-statements, like so:

...
some_check
if
  return
else
   ...
   some_check   
   if
     return
   else
      ... ~20,000x this nesting
   end
end

Presumably, this adds quite a lot of whitespace. (gzip reduces the 16GB wat file to ~25MB)

we now generate the semantically equivalent programs, where the else branch is moved to after the if-statement, like so:

some_check
if
  return
end
...
some_check
if
  return
end
...

This change reduced the color.wasm from 1.8 MB to 1.7 MB, but the color.wat from 16GB to 14MB

Also, we generate code in SSA form, so one function has ~20k locals before we run wasm-opt --coalesce-locals :)

rossberg commented 5 months ago
some_check
if
  return
end

You should be able to shorten that if return end even more to just br_if i, where i is the label of the function body – a br out of a function body is equivalent to return.