WebAssembly / binaryen

Optimizer and compiler/toolchain library for WebAssembly
Apache License 2.0

Some obvious optimizations seem to be missing #7029

Open zapashcanon opened 1 month ago

zapashcanon commented 1 month ago

Hi,

I'm compiling the following OCaml code with Wasocaml:

type 'a list =
  | Nil
  | Cons of 'a * 'a list

let rec aux p = function
  | Nil -> Cons (1, Nil)
  | Cons (hd, tl) -> Cons (p + hd, aux hd tl)

let next = function Cons (1, tl) -> Cons (1, aux 1 tl) | _ -> assert false

let rec pascal n = if n = 0 then Cons (1, Nil) else next (pascal (n - 1))

let rec print = function
  | Nil -> print_string "\n"
  | Cons (hd, tl) ->
    print_int hd;
    print_string " ;";
    print tl

let () =
  for i = 0 to 400000 do
    print (pascal 32)
  done

This is producing a file that I'm merging with many others (runtime, stdlib etc.) through wasm-merge.

Then I'm removing all the exports with cat a.out.wat | grep -v "^ (export" > a.out.noexport.wat in order to be able to run Binaryen in closed-world mode later.
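As a minimal sketch of that export-stripping step, here is the same filter applied to a tiny made-up module (the file names and module contents are hypothetical stand-ins for the real a.out.wat). The filter is line-based, so it only removes exports printed on a single line indented by exactly one space.

```shell
# Hypothetical sample module, standing in for the real a.out.wat.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.wat" <<'EOF'
(module
 (func $f (result i32)
  (i32.const 1)
 )
 (export "f" (func $f))
)
EOF

# Keep every line except top-level, single-line exports.
grep -v '^ (export' "$tmpdir/sample.wat" > "$tmpdir/sample.noexport.wat"

# Show what is left of the module.
cat "$tmpdir/sample.noexport.wat"
```

An export spread over several lines, or indented differently, would slip through or leave a broken module, so it is worth checking the result before optimizing.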

The generated file (before being processed by Binaryen) is here: a.out.noexport.wat.zip

Then I'm running Binaryen this way:

$ wasm-opt --enable-gc --enable-reference-types --enable-multivalue --enable-tail-call --enable-nontrapping-float-to-int --traps-never-happen -O3 --abstract-type-refining --cfp --coalesce-locals --closed-world --type-ssa --gufa-optimizing --type-merging --strip-debug --strip-dwarf --strip-producers --strip-target-features --type-refining --unsubtyping --vacuum --tuple-optimization --ssa --simplify-locals --simplify-globals-optimizing --signature-refining --signature-pruning --roundtrip --reorder-locals --reorder-globals --reorder-functions --remove-unused-types --remove-unused-names --remove-unused-module-elements --remove-unused-brs --remove-memory --precompute --optimize-instructions --optimize-casts --once-reduction --monomorphize --minimize-rec-groups --merge-similar-functions --merge-locals --merge-blocks --local-subtyping --local-cse --licm --intrinsic-lowering --inlining-optimizing --heap-store-optimization --gto --gsi --global-refining --duplicate-import-elimination --duplicate-function-elimination --directize --dce --dae-optimizing -o a.out.wasm a.out.noexport.wat

It produces the following file a.out.wasm.zip.

When I run the same command with -S -o a.out.optimized.wat instead, I'm getting this file a.out.optimized.wat.zip.

It contains a few things that I find surprising. For instance, this pattern is present many times:

  (drop
   (local.get $5)
  )

And this one too:

 (func $valid_float_lexem_566 (type $Func_1) (param $0 (ref eq)) (param $1 (ref $Env)) (result i32 (ref eq))
  (unreachable)
 )

For the first one, I would expect the drop (local.get) to simply be removed. In the second case, I would expect the function to be inlined (and its parameters then dropped, which would enable further optimizations).

Am I doing something wrong? I'm using a lot of Binaryen options, so I wouldn't be surprised if I overlooked something.

Thanks!

kripken commented 1 month ago

That output does not look fully optimized, yeah. The problem here is that WasmGC input like this can require multiple passes of the optimizer, see

https://github.com/WebAssembly/binaryen/wiki/GC-Optimization-Guidebook#multiple-optimization-passes

Try adding -O3 -O3 -O3 -O3 -O3 at the end of that wasm-opt command (I verified the last of those finds nothing left to optimize). Another option is to add --converge if you don't want to pick a fixed number of cycles.
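The idea of repeating optimization cycles until nothing is left to gain can be sketched outside of wasm-opt as a small loop. This is only an illustration of the fixed-point idea, not how Binaryen implements --converge; the helper optimize_until_stable and the stand-in fake_opt are hypothetical names, and fake_opt is used so the sketch runs without Binaryen installed.

```shell
# Hypothetical helper: rerun an optimizer until its output stops shrinking.
# $1 is a command or shell function invoked as: "$1" <input> <output>
optimize_until_stable() {
  opt_cmd=$1; input=$2; output=$3; max_rounds=${4:-10}
  cp "$input" "$output" || return 1
  prev_size=$(( $(wc -c < "$output") ))
  rounds=0
  while [ "$rounds" -lt "$max_rounds" ]; do
    "$opt_cmd" "$output" "$output.next" || return 1
    next_size=$(( $(wc -c < "$output.next") ))
    if [ "$next_size" -ge "$prev_size" ]; then
      rm -f "$output.next"   # no progress: keep the previous result
      break
    fi
    mv "$output.next" "$output"
    prev_size=$next_size
    rounds=$((rounds + 1))
  done
  # Print the number of rounds that made the file smaller.
  echo "$rounds"
}

# Stand-in "optimizer" (an assumption for this sketch): drops the last line.
# A real wrapper would run something like: wasm-opt <feature flags> -O3 "$1" -o "$2"
fake_opt() { sed '$d' "$1" > "$2"; }

work=$(mktemp -d)
printf 'a\nb\nc\n' > "$work/in.txt"
rounds=$(optimize_until_stable fake_opt "$work/in.txt" "$work/out.txt")
echo "productive rounds: $rounds"
```

Using output size as the stopping criterion is a simplification; a cycle can improve code without shrinking it, which is one reason a fixed number of -O3 repetitions may still be worth trying.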

zapashcanon commented 1 month ago

Oh, I didn't know about this page. It's quite useful! I only read the man page and it was much less complete. I'll play with this now.

Thanks a lot!

zapashcanon commented 1 month ago

OK so, it looks like something is still missing.

I'm starting from this file a.out.noexport.wat.zip.

Then I run:

$ wasm-opt --enable-gc --enable-reference-types --enable-multivalue --enable-tail-call --enable-nontrapping-float-to-int --traps-never-happen -O3 --abstract-type-refining --cfp --coalesce-locals --closed-world --type-ssa --gufa-optimizing --type-merging --strip-debug --strip-dwarf --strip-producers --strip-target-features --type-refining --unsubtyping --vacuum --tuple-optimization --ssa --simplify-locals --simplify-globals-optimizing --signature-refining --signature-pruning --roundtrip --reorder-locals --reorder-globals --reorder-functions --remove-unused-types --remove-unused-names --remove-unused-module-elements --remove-unused-brs --remove-memory --precompute --optimize-instructions --optimize-casts --once-reduction --monomorphize --minimize-rec-groups --merge-similar-functions --merge-locals --merge-blocks --local-subtyping --local-cse --licm --intrinsic-lowering --inlining-optimizing --heap-store-optimization --gto --gsi --global-refining --duplicate-import-elimination --duplicate-function-elimination --directize --dce --dae-optimizing -O3 -O3 --gufa -O3 -O3 --gufa -O3 -O3 --converge --abstract-type-refining --cfp --coalesce-locals --closed-world --type-ssa --gufa-optimizing --type-merging --strip-debug --strip-dwarf --strip-producers --strip-target-features --type-refining --unsubtyping --vacuum --tuple-optimization --ssa --simplify-locals --simplify-globals-optimizing --signature-refining --signature-pruning --roundtrip --reorder-locals --reorder-globals --reorder-functions --remove-unused-types --remove-unused-names --remove-unused-module-elements --remove-unused-brs --remove-memory --precompute --optimize-instructions --optimize-casts --once-reduction --monomorphize --minimize-rec-groups --merge-similar-functions --merge-locals --merge-blocks --local-subtyping --local-cse --licm --intrinsic-lowering --inlining-optimizing --heap-store-optimization --gto --gsi --global-refining --duplicate-import-elimination --duplicate-function-elimination --directize --dce --dae-optimizing -O3 -O3 --gufa -O3 -O3 -O3 -O3 -O3 -O3 -O3 -S -o a.out.optimized.wat a.out.noexport.wat

And the resulting file is here: a.out.optimized.wat.zip.

I added a bunch of -O3, --gufa and also --converge, and tried different combinations, but that doesn't seem to solve the problem.

kripken commented 1 month ago

What problem specifically do you see? I don't see any drops of local.gets in the output.
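One way to check this claim on an output file is to count the pattern mechanically rather than by eye. The following is a rough sketch, assuming the text layout shown in the first comment (a "(drop" line directly followed by a "(local.get" line); count_dropped_gets is a hypothetical helper name and the sample input is made up.

```shell
# Rough heuristic: count "(drop" lines immediately followed by a "(local.get" line.
count_dropped_gets() {
  awk '
    /\(drop[ \t]*$/ { after_drop = 1; next }
    after_drop && /^[ \t]*\(local\.get / { n++ }
    { after_drop = 0 }
    END { print n + 0 }
  ' "$1"
}

# Made-up sample: one dropped local.get and one dropped call.
sample=$(mktemp)
cat > "$sample" <<'EOF'
  (drop
   (local.get $5)
  )
  (drop
   (call $f)
  )
EOF

count_dropped_gets "$sample"
```

Running the same function on a.out.optimized.wat before and after adding the extra -O3 cycles would show whether the pattern is really gone.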

kripken commented 1 month ago

As for the second problem mentioned in the first comment, about $valid_float_lexem_566, you say you expect it to be inlined, but no direct calls exist to it (only ref.func).
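To see how a function is referenced in a .wat file, the two kinds of use can be counted separately. This is a minimal sketch on a made-up module; count_lines is a hypothetical helper, and only the function name is taken from the thread.

```shell
# Hypothetical helper: count lines of file $1 that contain the fixed string $2.
# -F matters here: the "$" in wat names must not be treated as a regex anchor.
count_lines() {
  grep -cF -- "$2" "$1" || true   # grep exits with 1 when the count is 0
}

# Made-up module in which the function is only referenced, never called directly.
wat=$(mktemp)
cat > "$wat" <<'EOF'
(module
 (func $valid_float_lexem_566
  (unreachable)
 )
 (func $g (result funcref)
  (ref.func $valid_float_lexem_566)
 )
)
EOF

name='$valid_float_lexem_566'
calls=$(count_lines "$wat" "(call $name")
tail_calls=$(count_lines "$wat" "(return_call $name")
ref_funcs=$(count_lines "$wat" "(ref.func $name")
echo "direct calls: $calls, tail calls: $tail_calls, ref.func: $ref_funcs"
```

The counts are per line and match by prefix, so a name that is a prefix of another name would be over-counted; for a precise answer the module would need to be parsed rather than grepped.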