dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License

[dart2wasm] Optimize `math.min()` and `math.max()` #55173

Open mkustermann opened 8 months ago

mkustermann commented 8 months ago

Right now min() and max() are never inlined, so they are very slow. Even when we force inlining, we generate very bad code. This shows up in the material3 demo profile (e.g. in the _computeSizes() method).

For example this simple program that only uses constants:

import 'dart:math';

void main() {
  print(max(1, 2));
  print(max(2, 1));
  print(max(1.1, 2.1));
  print(max(2.1, 1.1));
}

turns into this monster when using -O4 and adding @pragma('wasm:prefer-inline') to the max function:

 (func $main tear-off trampoline (;67;) (param $var0 (ref struct)) (param $var1 (ref null $#Top)) (result (ref null $#Top))
   block $label0 (result (ref $_BoxedInt))
     global.get $global36
     global.get $global36
     global.get $global35
     call $_BoxedInt.>
     br_if $label0
     drop
     global.get $global35
     global.get $global36
     global.get $global35
     call $_BoxedInt.<
     br_if $label0
     drop
     global.get $global4
     global.get $global35
     call $_InterfaceType._checkInstance
     drop
     block $label1
       global.get $global35
       global.get $global24
       call $_BoxedInt.==
       i32.eqz
       br_if $label1
       global.get $global36
       call $_BoxedInt.isNegative
       i32.eqz
       br_if $label1
       global.get $global35
       br $label0
     end $label1
     global.get $global36
   end $label0
   call $print
   block $label2 (result (ref $_BoxedInt))
     global.get $global35
     global.get $global35
     global.get $global36
     call $_BoxedInt.>
     br_if $label2
     drop
     global.get $global36
     global.get $global35
     global.get $global36
     call $_BoxedInt.<
     br_if $label2
     drop
     global.get $global4
     global.get $global36
     call $_InterfaceType._checkInstance
     drop
     block $label3
       global.get $global36
       global.get $global24
       call $_BoxedInt.==
       i32.eqz
       br_if $label3
       global.get $global35
       call $_BoxedInt.isNegative
       i32.eqz
       br_if $label3
       global.get $global36
       br $label2
     end $label3
     global.get $global35
   end $label2
   call $print
   block $label4 (result (ref $_BoxedDouble))
     global.get $global30
     global.get $global30
     global.get $global29
     call $_BoxedDouble.>
     br_if $label4
     drop
     global.get $global29
     global.get $global30
     global.get $global29
     call $_BoxedDouble.<
     br_if $label4
     drop
     global.get $global4
     global.get $global29
     call $_InterfaceType._checkInstance
     if
       global.get $global102
       call $<obj> is Class(double)
       i32.eqz
       if
         global.get $global102
         call $<obj> is Class(int)
         drop
       end
       global.get $global30
       br $label4
     end
     block $label5
       global.get $global29
       global.get $global24
       call $_BoxedDouble.==
       i32.eqz
       br_if $label5
       global.get $global30
       call $_BoxedDouble.isNegative
       i32.eqz
       br_if $label5
       global.get $global29
       br $label4
     end $label5
     global.get $global30
   end $label4
   call $print
   block $label6 (result (ref $_BoxedDouble))
     global.get $global29
     global.get $global29
     global.get $global30
     call $_BoxedDouble.>
     br_if $label6
     drop
     global.get $global30
     global.get $global29
     global.get $global30
     call $_BoxedDouble.<
     br_if $label6
     drop
     global.get $global4
     global.get $global30
     call $_InterfaceType._checkInstance
     if
       global.get $global102
       call $<obj> is Class(double)
       i32.eqz
       if
         global.get $global102
         call $<obj> is Class(int)
         drop
       end
       global.get $global29
       br $label6
     end
     block $label7
       global.get $global30
       global.get $global24
       call $_BoxedDouble.==
       i32.eqz
       br_if $label7
       global.get $global29
       call $_BoxedDouble.isNegative
       i32.eqz
       br_if $label7
       global.get $global30
       br $label6
     end $label7
     global.get $global29
   end $label6
   call $print
   ref.null none
 )

Almost all call sites will know that values are integers or doubles, so we really should take advantage of that. We should also try to do int64 comparisons and double comparisons inline as much as possible instead of method calls.
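As a minimal illustration of that point, the sketch below shows three call sites where the static argument types already determine the type argument of max. The helper names (largerCount, largerWidth, largerAny) are hypothetical and exist only for this example.

```dart
import 'dart:math';

// Hypothetical helpers, named only for this illustration.

// T is inferred as int: both arguments are statically int.
int largerCount(int a, int b) => max(a, b);

// T is inferred as double: both arguments are statically double.
double largerWidth(double a, double b) => max(a, b);

// Only num is known statically, so a generic fallback is still needed here.
num largerAny(num a, num b) => max(a, b);

void main() {
  print(largerCount(1, 2));
  print(largerWidth(1.1, 2.1));
  print(largerAny(1, 2.5));
}
```

In the first two cases a compiler could, in principle, pick an unboxed int or double comparison; only the third needs the fully generic path.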

osa1 commented 8 months ago

Existing kernel-level transformations don't help us here. Perhaps we could generalize the partial instantiator we added in fd954d426fa, maybe with a pragma like:

@pragma('wasm:instantiate')
T max<T extends num>(T a, T b) { ... }

which would make dart2wasm generate different copies of the function based on the type argument and, at call sites where the type argument is known, call the right instantiation. @mkustermann wdyt?

mkustermann commented 8 months ago

> which would make dart2wasm generate different copies of the function based on the type argument and, at call sites where the type argument is known, call the right instantiation. @mkustermann wdyt?

Yes, I think we should have versions specialized to the static types of the arguments at the call site (e.g. a max<int>, max<double>, and max<num> whose signature types are strengthened to (int, int) -> int, (double, double) -> double, and (num, num) -> num).

We would still let Binaryen decide whether, from a code-size perspective, it makes sense to inline those copies of max / min / ...
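A hand-written sketch of what such strengthened copies might look like at the source level is shown below. This is not what dart2wasm generates: maxInt and maxDouble are hypothetical names, and the bodies only assume the documented behaviour of dart:math max (NaN is returned if either argument is NaN, and the larger of -0.0 and 0.0 is 0.0).

```dart
// Hypothetical specializations; the names are illustrative, not SDK APIs.

/// (int, int) -> int: one comparison suffices, since ints have
/// no NaN and no signed zero.
int maxInt(int a, int b) => a > b ? a : b;

/// (double, double) -> double: must propagate NaN and prefer 0.0 over -0.0.
double maxDouble(double a, double b) {
  if (a > b) return a;
  if (a < b) return b;
  // Reaching here means a == b (including 0.0 vs -0.0),
  // or at least one argument is NaN.
  if (a.isNaN) return a;
  if (b.isNaN) return b;
  // a == b, so only signed zeros can differ: return the non-negative one.
  return a.isNegative ? b : a;
}

void main() {
  print(maxInt(1, 2));
  print(maxDouble(1.1, 2.1));
  print(maxDouble(-0.0, 0.0));
  print(maxDouble(double.nan, 1.0));
}
```

With signatures like these, the int case needs no type checks at all, and the double case needs only primitive comparisons, which is the kind of code an optimizer can inline cheaply.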