dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License
10.1k stars 1.56k forks source link

dart2wasm `List<int>` is using fully boxed integer representation #52714

Open mraleph opened 1 year ago

mraleph commented 1 year ago

Both VM and dart2js benefit from tagging when operating with <int>[...], dart2wasm however boxes integers stored in the list - which makes it slow compared to native and JS targets.

This impacts a number of microbenchmarks like Fannkuch or Mandelbrot, which use List<int> on hot paths.

mraleph commented 1 year ago

/cc @osa1

osa1 commented 1 year ago

We discussed a few options to fix this.

Wasm GC does not support tagged integers so we can't have Wasm ref array and use it to store integers unboxed.

To be able to use different storage in List<int> and other List<T> types we need a subclass implementing List<int> (say _UnboxedIntList). We can then have a List factory that constructs _UnboxedIntList when T is int.

We also need to avoid calling List<T>.[]=, List<T>.add etc. as they have generic arguments and require boxing the ints.

Two ideas came up:

  1. Add special private methods to List for unboxed types, like List._addInt(int i), which concrete implementations override. This method will accept unboxed int argument, which then _UnboxedIntList can write directly to the backing storage.

    (+): Reliable: _addInt will never take boxed argument and will never check type of the argument, even if for whatever reason List<int> is not _UnboxedIntList. (-): Requires changing the standard library. (can we do this with just patch files to avoid changing the libs?)

  2. We only add _addInt to _UnboxedIntList (so it won't be overloaded), and implement add as void add(int i) => _addInt(i), then make sure _UnboxedIntList.add is inlined whenever possible. After inlining myIntList.add(i) calls will turn into myIntList._addInt(i), which takes unboxed argument and writes it directly to the array.

    (+): Does not require changing the standard library List. (-): Fragile, relies on TFA and inlining.

osa1 commented 1 year ago

I started implementing this. Option (1) above won't work as the Dart code will be using List.[]= which we want to turn into _IntList._setInt. So just adding List._setInt won't do anything without inlining the public interface methods.

In addition, we should be inlining the _IntList.[]= method, not the generic List.[]=, because for the generic one to call _IntList.[]= we need type tests which can be expensive in the worst case:

abstract class _ModifiableList<E> extends _ListBase<E> {
  void operator []=(int index, E value) {
    if (this is _ModifiableIntList && E == int) {
      this._setInt(index, unsafeCast(value));
    } else {
      _data.write(index, value);
    }
  }
}

So we'll have to go with (2):

Lastly, to be able to inline the methods we need to be able to see if a List<int> value is actually _UnboxedIntList. I don't know if we need any changes in the TFA for this.

alexmarkov commented 1 year ago
  • Somehow allocate _UnboxedIntList (and the immutable version) instead of the generic lists whenever possible.

    • Since E == int should be fast (just a class ID comparison), can we modify the factories with this check to redirect to the unboxed list classes?

Also consider replacing calls to List<int> factories with specialized versions of _UnboxedIntList factories early in the pipeline so TFA would be able to infer that List<int> factory always returns _UnboxedIntList. Example of such transformation: https://github.com/dart-lang/sdk/blob/main/pkg/vm/lib/transformations/specializer/list_factory_specializer.dart.

osa1 commented 1 year ago

I did some more surveying.

Current class hierarchy for lists:

- abstract class ListBase (abstract mixin class)              -- defined in dart.collection
| - abstract class _ListBase                                  -- defines length and data, used by code gen to read length and entries
| | - abstract class _ModifiableList
| | | - class _List                                           -- used by code gen to allocate lists
| | | - class _GrowableList
| | - class _ImmutableList

Uses of _ListBase._length and _ListBase._data:

I can see two ways of refactoring the class hierarchy, depending on how we want to use the types:

osa1 commented 1 year ago

I just realized while working on #52710 that we don't need any new classes for this. Int64List already implements List<int>, so with #52710 we can update the list factories to return _I64List when the type argument is int. Typed list accessors ([] and []=) already rely on inlining to avoid boxing the elements so _I64List (and _F64List for double) should just work.

osa1 commented 1 year ago

I'm implementing this in https://dart-review.googlesource.com/c/sdk/+/318680. So far I only tried the first class hierarchy above, and it resulted in more virtual calls (and as a result less inlining) and much worse performance in Fannkuch. With the patch it takes twice as long to run.

Since we rely on inlining to avoid boxing, with virtual calls we can't avoid passing ints boxed even if the backing array stores ints unboxed. We really need direct calls in TFA, and inlining in code generator.

The factory method used in this benchmark looks like this in the CL:

@patch
class List<E> {
  @patch
  factory List.filled(int length, E fill, {bool growable = false}) {
    if (E == int) {
      return unsafeCast<List<E>>(growable
          ? GrowableUnboxedIntList.filled(length, unsafeCast<int>(fill))
          : FixedLengthUnboxedIntList.filled(length, unsafeCast<int>(fill)));
    } else {
      return growable
          ? GrowableList<E>.filled(length, fill)
          : FixedLengthList<E>.filled(length, fill);
    }
  }
}

Somehow in the call sites for this TFA doesn't seem to figure out that if E is int then the returned class is a ModifiableUnboxedIntList, so we don't generate direct calls (and can't inline) ModifiableUnboxedIntList.operator []= and operator [].

The current class hierarchy is as shown in my comment above:

- abstract class ListBase (from standard library

| - abstract class _ListBase                      
| | - abstract class _ModifiableList
| | | - class _List                              
| | | - class _GrowableList
| | - class _ImmutableList

| - abstract class _UnboxedIntListBase           
| | - abstract class _ModifiableUnboxedIntListList // defines [] and []=, which we need to inline
| | | - class _UnboxedIntList                     
| | | - class _GrowableUnboxedIntList
| | - class _ImmutableUnboxedIntList

The list allocations in kernel look like this:

List<int> Function(int)
Block({
  final List<int> p = new List<int>.filled(n, 0);
  final List<int> q = new List<int>.filled(n, 0);
  final List<int> s = new List<int>.filled(n, 0);
  ...
})

@alexmarkov is there anything I can do to make p, q, s above have the type ModifiableUnboxedIntList (most specific superclass of GrowableUnboxedIntList and FixedLengthUnboxedIntList), so that the TFA will make method calls to ModifiableUnboxedIntList members direct calls and I can inline them in code generator?

alexmarkov commented 1 year ago

TFA infers that E is int while analyzing List<int>.filled invocation, but it is not able to evaluate E == int expression (as it is actually a call (E).operator==(int), and _Type.operator== has an implementation in dart2wasm which goes down to calling external methods.

I think the easiest way to infer type of List<int>.filled result would be to specialize calls to this factory constructor before running TFA - as suggested above in https://github.com/dart-lang/sdk/issues/52714#issuecomment-1598983731.

osa1 commented 1 year ago

Thanks -- transforming the factory calls before TFA worked great. Current results:

Before:     13,783,500 us.
After:       6,068,000 us.
JIT:         3,080,470 us.
dart2js -O4: 2,812,000 us.

With the unboxing the benchmark is 2x faster, but still takes twice as long compared to VM (JIT).

Note: dart2wasm results above are with --omit-type-checks, as with type checks we can't benchmark anything -- the type checks just overwhelm everything else.