WebAssembly / gc

Branch of the spec repo scoped to discussion of GC integration in WebAssembly
https://webassembly.github.io/gc/

Real-world Java type graph #212

Closed: tlively closed this issue 2 years ago

tlively commented 3 years ago

real_world_types.wasm.gz

This is a binary module containing a type section and rtt globals created from the J2CL text output for a significant real-world Java program. These types have not been canonicalized.

I verified that Binaryen and V8 can read the module, but I could not get the spec interpreter to read it due to a stack overflow on the master branch and an error about an invalid type on the canon branch.

rossberg commented 3 years ago

Ah, the spec interpreter was decoding the typeidx on an rtt as an s33 instead of a u32 (a sketch of the difference is at the end of this comment). After fixing that, I measure:

Decoding, Time: user 715.46 ms, system 53.95 ms; GC: major 10, minor 296, compactions 0
Validation, Time: user 1864.12 ms, system 42.84 ms; GC: major 7, minor 763, compactions 0
Canonicalization, Time: user 1722.53 ms, system 62.86 ms; GC: major 2, minor 553, compactions 0
  Graph construction, Time: user 168.70 ms, system 0.29 ms; GC: major 0, minor 89, compactions 0
  Minimization, Time: user 1268.72 ms, system 50.68 ms; GC: major 2, minor 362, compactions 0
  SCC computation, Time: user 11.08 ms, system 0.02 ms; GC: major 0, minor 3, compactions 0
  Repo lookup, Time: user 248.85 ms, system 11.82 ms; GC: major 0, minor 97, compactions 0

with whole-module canonicalization. This is still reasonably fast, in fact faster than validating these sections. I assume that if this were a real module with actual code, canonicalization would still be vastly dominated by everything else.

However, with incremental canonicalization it gets much worse:

Decoding, Time: user 716.59 ms, system 50.95 ms; GC: major 10, minor 296, compactions 0
Validation, Time: user 1867.34 ms, system 43.36 ms; GC: major 7, minor 763, compactions 0
Canonicalization, Time: user 18341.69 ms, system 92.23 ms; GC: major 30, minor 103425, compactions 0

So the incremental approach as-is would probably not be good enough; at least some hybrid approach would be necessary. (Though the ~100,000 minor collections during this run look a bit odd, so there might be a performance bug as well.)
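
For readers following along, the incremental approach boils down to hash-consing each definition against a global repo as it is decoded. Here is a minimal sketch of that idea, not the reference interpreter's actual code; the names intern, shape, and refs are invented for illustration, and recursive groups would additionally need the graph-minimization machinery measured above.

// Toy representation: a definition is a `shape` string describing everything
// except which types it references, plus `refs`, the referenced type indices
// (already canonicalized) in order.
const repo = new Map();  // structural key -> canonical type id
const defs = [];         // canonical type id -> definition

function intern(shape, refs) {
  const key = shape + "|" + refs.join(",");
  if (repo.has(key)) return repo.get(key);  // seen before: reuse the id
  const id = defs.length;                   // otherwise register a fresh one
  defs.push({ shape, refs });
  repo.set(key, id);
  return id;
}

The per-definition lookups are what make this approach pay-as-you-go, but as the numbers above suggest, doing the work definition by definition can end up far more expensive overall than a single whole-module pass.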

What would be needed here is to put this all into relation to the validation & compilation times for a complete version of this module that includes the actual code. Do you happen to have numbers for that, e.g., for V8? Or can you share the complete module, so that I can at least measure validation times in the interpreter?
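
As an aside on the decoding fix mentioned at the top of this comment, here is a minimal sketch of the two LEB128 decoders (the function names and the bytes/pos representation are invented for illustration). The encodings differ only in how the final byte is treated: a u32 decoder never sign-extends, while an s33 decoder sign-extends based on bit 6 of the last byte. So a type index like 64, encoded as the single byte 0x40 under u32, comes back as -64 from an s33 decoder, which is presumably the "invalid type" error seen earlier on the canon branch.

// Hypothetical helpers; `bytes` is an array of byte values, `pos.at` an index.
function decodeU32(bytes, pos) {
  let result = 0, shift = 0, byte;
  do {
    byte = bytes[pos.at++];
    result += (byte & 0x7f) * 2 ** shift;  // accumulate 7 payload bits per byte
    shift += 7;
  } while (byte & 0x80);                   // continuation bit set: keep reading
  return result;
}

function decodeS33(bytes, pos) {
  let result = 0, shift = 0, byte;
  do {
    byte = bytes[pos.at++];
    result += (byte & 0x7f) * 2 ** shift;
    shift += 7;
  } while (byte & 0x80);
  if (shift < 33 && (byte & 0x40)) result -= 2 ** shift;  // sign-extend
  return result;
}

decodeU32([0x40], { at: 0 });  // 64
decodeS33([0x40], { at: 0 });  // -64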

tlively commented 3 years ago

Thanks for those results, @rossberg!

What would be needed here is to put this all into relation to the validation & compilation times for a complete version of this module that includes the actual code. Do you happen to have numbers for that, e.g., for V8? Or can you share the complete module, so that I can at least measure validation times in the interpreter?

Unfortunately I can't share the full module right now, but I could measure V8's baseline compilation time on the full module. V8 does not do type canonicalization yet, but I agree that getting a sense of the relative durations would be useful.

FWIW, Binaryen currently takes 40 minutes to canonicalize this type graph. I will work on improving its algorithm today and report back on the improved results.

tlively commented 3 years ago

I ran V8 on the full binary (with canonicalized types but unoptimized code) to measure baseline validation and compilation performance. The binary is 52 MB.

// harness.js
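// readbuffer is a V8 shell builtin that returns the file contents as an ArrayBuffer.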
let buffer = readbuffer('wasm_engine_dev_05_10.wasm');
console.log("Validating!");

let beginValidation = performance.now();
WebAssembly.validate(buffer);
let validationTime = performance.now() - beginValidation;

console.log("Compiling!\n");
let beginCompile = performance.now();
let module = new WebAssembly.Module(buffer);
let compilationTime = performance.now() - beginCompile;

console.log("Validation time: ", validationTime, "ms");
console.log("Compilation time: ", compilationTime, "ms");
$ v8 --experimental-wasm-gc --experimental-wasm-eh --liftoff-only harness.js
Validating!
Compiling!

Validation time:  835.407 ms
Compilation time:  1259.8000000000002 ms

Edit: Note that as far as I can tell, the 1260ms baseline compilation time above also includes validating the module again. If I remove the initial validation, the compilation time does not change.

rossberg commented 3 years ago

Yes, with the JS API, compile should do validation (though the engine could easily memoize the result for a module object).

Can you measure the validation time in the reference interpreter? With a build from the canon branch, do

./wasm -c wasm_engine_dev_05_10.wasm -d

(hoping that this doesn't overflow anything).

And I would be curious about the compile times for a non-baseline compile as well.

tlively commented 3 years ago

Running v8 --experimental-wasm-gc --experimental-wasm-eh --no-liftoff harness.js gives a compilation time of just under 7 minutes.

Trying to run the canon interpreter on the full module gives an error about (presumably) the event section being unrecognized. Would you be able to rebase the canon interpreter on top of the EH interpreter or otherwise work around that?

rossberg commented 3 years ago

Ah, okay. Do you know if it is only using the event section, or also EH instructions?

jakobkummerow commented 3 years ago

compilation time of just under 7 minutes

Quick side note: optimized compilation is unexpectedly slow there; we are looking into it.

tlively commented 3 years ago

Ah, okay. Do you know if it is only using the event section, or also EH instructions?

Actually, it looks like it does not use any EH instructions. I can produce and use a version that doesn't have the event section.

tlively commented 3 years ago

Oops, it turns out it contains throw, but no other EH instructions. I hacked the reference interpreter to decode the throw as a drop (since the only event type has a single parameter), but ended up getting this stack overflow:

Decoding...
./wasm: uncaught exception Stack overflow
Raised by primitive operation at Source.(@@) in file "util/source.ml" (inlined), line 5, characters 20-41
Called from Decode.module_.(fun) in file "binary/decode.ml", line 937, characters 33-62
Called from Stdlib__list.map2 in file "list.ml", line 131, characters 48-60

I increased the stack size and tried again, but got a validation failure after multiple minutes of validation:

$ (ulimit -s 65536; time ./wasm -c ~/code/binaryen/wasm_engine_dev_05_10.no-event.wasm -d)
Decoding...
Time: user 15508.20 ms, system 1460.12 ms; GC: major 18, minor 5350, compactions 0
Validating...
/usr/local/google/home/tlively/code/binaryen/wasm_engine_dev_05_10.no-event.wasm:0x248d4f: invalid module: type mismatch: instruction requires [(ref null 8) (ref null 10) i32 (ref null 0) i32 i32 (ref (rtt 5))] but stack has [(ref null 8) (ref null 10) i32 (ref null 0) i32 i32 (ref null (rtt 2 5))]

real    3m49.271s
user    3m46.461s
sys     0m2.780s

That (ref (rtt 5)) looks fishy; the prototypes have rtts as value types, not heap types.

rossberg commented 3 years ago

Thanks for looking into this!

/usr/local/google/home/tlively/code/binaryen/wasm_engine_dev_05_10.no-event.wasm:0x248d4f: invalid module: type mismatch: instruction requires [(ref null 8) (ref null 10) i32 (ref null 0) i32 i32 (ref (rtt 5))] but stack has [(ref null 8) (ref null 10) i32 (ref null 0) i32 i32 (ref null (rtt 2 5))]

That (ref (rtt 5)) looks fishy; the prototypes have rtts as value types, not heap types.

The ref itself actually is fine: the MVP specifies (rtt x) as short for (ref (rtt x)), similar to how i31ref is (ref i31) (both in text and binary format). The interpreter just normalises this in its output. But the bug was that the decoder was erroneously decoding it as (ref null (rtt x)), thus the nullability mismatch. Should be fixed by #214.

tlively commented 3 years ago

I also inquired about the corresponding JS size when this code base is compiled to JS instead of Wasm. The unoptimized JS is about 59 MB due to its pervasive use of long, fully qualified names; after Closure optimization it is about 5 MB. This comparison is not quite fair or exact because the JS compilation target includes more source code than the Wasm compilation target, estimated to increase code size by about 25%.

We don't have the optimized code size for the Wasm module yet, but I'll post another update when we do.

tlively commented 3 years ago

I just finished implementing Valmari and Lehtinen's DFA minimization algorithm in Binaryen, and it brought the graph minimization time for real_world_types.wasm down to 1545 ms, which is in the same ballpark as @rossberg's implementation.
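
For context, this kind of minimization treats the type graph as a DFA (one state per definition, transitions given by the types each definition references) and computes the coarsest partition in which structurally equivalent definitions share a block. The sketch below is a plain Moore-style refinement over the same toy shape/refs representation as the hash-consing sketch earlier in this thread; it is not the Valmari-Lehtinen algorithm itself (which gets its speed from a refinable-partition data structure) and not Binaryen's actual code.

// types: Array<{ shape: string, refs: number[] }>, where `shape` describes a
// definition ignoring which types it references and `refs` lists the
// referenced type indices in order.
function minimize(types) {
  // Start by grouping definitions that look identical ignoring their refs.
  let blockOf = types.map(t => t.shape);
  for (;;) {
    // Refine: two definitions stay together only if their corresponding
    // refs currently point into the same blocks.
    const sigs = types.map((t, i) =>
      blockOf[i] + "|" + t.refs.map(r => blockOf[r]).join(","));
    const ids = new Map();
    const next = sigs.map(s => {
      if (!ids.has(s)) ids.set(s, ids.size);
      return String(ids.get(s));
    });
    // Stable once refining produces no new blocks; equal ids then mark
    // structurally equivalent definitions.
    if (new Set(next).size === new Set(blockOf).size) return next;
    blockOf = next;
  }
}

The naive refinement above is quadratic in the worst case; Valmari-Lehtinen (like Hopcroft-style algorithms) is asymptotically much faster, which is what makes it attractive for type graphs of this size.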

tlively commented 2 years ago

Closing this because we have finished benchmarking type systems and have chosen one for the MVP.