Open mraleph opened 1 year ago
/cc @osa1
We discussed a few options to fix this.
Wasm GC does not support tagged integers so we can't have Wasm ref
array and use it to store integers unboxed.
To be able to use different storage in List<int>
and other List<T>
types we need a subclass implementing List<int>
(say _UnboxedIntList
). We can then have a List
factory that constructs _UnboxedIntList
when T
is int
.
We also need to avoid calling List<T>.[]=
, List<T>.add
etc. as they have generic arguments and require boxing the int
s.
Two ideas came up:
Add special private methods to List
for unboxed types, like List._addInt(int i)
, which concrete implementations override. This method will accept unboxed int
argument, which then _UnboxedIntList
can write directly to the backing storage.
(+): Reliable: _addInt
will never take boxed argument and will never check type of the argument, even if for whatever reason List<int>
is not _UnboxedIntList
.
(-): Requires changing the standard library. (can we do this with just patch files to avoid changing the libs?)
We only add _addInt
to _UnboxedIntList
(so it won't be overloaded), and implement add
as void add(int i) => _addInt(i)
, then make sure _UnboxedIntList.add
is inlined whenever possible. After inlining myIntList.add(i)
calls will turn into myIntList._addInt(i)
, which takes unboxed argument and writes it directly to the array.
(+): Does not require changing the standard library List
.
(-): Fragile, relies on TFA and inlining.
I started implementing this. Option (1) above won't work as the Dart code will be using List.[]=
which we want to turn into _IntList._setInt
. So just adding List._setInt
won't do anything without inlining the public interface methods.
In addition, we should be inlining the _IntList.[]=
method, not the generic List.[]=
, because for the generic one to call _IntList.[]=
we need type tests which can be expensive in the worst case:
abstract class _ModifiableList<E> extends _ListBase<E> {
void operator []=(int index, E value) {
if (this is _ModifiableIntList && E == int) {
this._setInt(index, unsafeCast(value));
} else {
_data.write(index, value);
}
}
}
So we'll have to go with (2):
_UnboxedIntList
and _ImmutableUnboxedIntList
.@pragma('wasm:prefer-inline')
to add
, []=
etc. methods of these classes to call _addInt(int a)
, _setInt(int index, int value)
etc. directly in the call sites._UnboxedIntList
(and the immutable version) instead of the generic lists whenever possible.
E == int
should be fast (just a class ID comparison), can we modify the factories with this check to redirect to the unboxed list classes?Lastly, to be able to inline the methods we need to be able to see if a List<int>
value is actually _UnboxedIntList
. I don't know if we need any changes in the TFA for this.
Somehow allocate
_UnboxedIntList
(and the immutable version) instead of the generic lists whenever possible.
- Since
E == int
should be fast (just a class ID comparison), can we modify the factories with this check to redirect to the unboxed list classes?
Also consider replacing calls to List<int>
factories with specialized versions of _UnboxedIntList
factories early in the pipeline so TFA would be able to infer that List<int>
factory always returns _UnboxedIntList
. Example of such transformation: https://github.com/dart-lang/sdk/blob/main/pkg/vm/lib/transformations/specializer/list_factory_specializer.dart.
I did some more surveying.
Current class hierarchy for lists:
- abstract class ListBase (abstract mixin class) -- defined in dart.collection
| - abstract class _ListBase -- defines length and data, used by code gen to read length and entries
| | - abstract class _ModifiableList
| | | - class _List -- used by code gen to allocate lists
| | | - class _GrowableList
| | - class _ImmutableList
Uses of _ListBase._length
and _ListBase._data
:
List.[]
when the receiver is a list constant (as intrinsic)makeListFixedLength
and makeFixedListUnmodifiable
intrinsicsFunction.apply
to pack arguments (intrinsic)Translator.makeList
, Translator.indexList
, Translator.getListLength
I can see two ways of refactoring the class hierarchy, depending on how we want to use the types:
If we want to be able to read the data directly (i.e. access _length
and _data
) in the generated code, we will need two base classes for unboxed int lists and boxed lists:
- abstract class ListBase (abstract mixin class) -- defined in dart.collection
| - abstract class _ListBase -- defines length and data, used by code gen to read length and entries
| | - abstract class _ModifiableList
| | | - class _List -- used by code gen to allocate lists
| | | - class _GrowableList
| | - class _ImmutableList
| - abstract class _UnboxedIntListBase -- defines length and data
| | - abstract class _ModifiableUnboxedIntListList
| | | - class _UnboxedIntList
| | | - class _GrowableUnboxedIntList
| | - class _ImmutableUnboxedIntList
This adds a lot of classes, and difficult to scale with more unboxed list types (e.g. if we want to add _UnboxedDoubleList
).
It should be OK to access the elements via a method (instead of directly reading the array in generated code) as we only do this for dynamic invocations (which are already extremely slow). So it should be OK to remove the _data
field from the base class and do something like:
- abstract class ListBase (abstract mixin class) -- defined in dart.collection
| - abstract class _ListBase -- defines length field
| | - abstract class _ModifiableList
| | | - class _List
| | | - class _GrowableList
| | | - class _UnboxedIntList -- new
| | | - class _GrowableUnboxedIntList -- new
| | - class _ImmutableList
| | - class _ImmutableUnboxedIntList -- new
This also requires fair amount of copy-paste-modify in add
, insert
etc. methods, but adds less number of classes.
I just realized while working on #52710 that we don't need any new classes for this. Int64List
already implements List<int>
, so with #52710 we can update the list factories to return _I64List
when the type argument is int
. Typed list accessors ([]
and []=
) already rely on inlining to avoid boxing the elements so _I64List
(and _F64List
for double
) should just work.
I'm implementing this in https://dart-review.googlesource.com/c/sdk/+/318680. So far I only tried the first class hierarchy above, and it resulted in more virtual calls (and as a result less inlining) and much worse performance in Fannkuch. With the patch it takes twice as long to run.
Since we rely on inlining to avoid boxing, with virtual calls we can't avoid passing int
s boxed even if the backing array stores int
s unboxed. We really need direct calls in TFA, and inlining in code generator.
The factory method used in this benchmark looks like this in the CL:
@patch
class List<E> {
@patch
factory List.filled(int length, E fill, {bool growable = false}) {
if (E == int) {
return unsafeCast<List<E>>(growable
? GrowableUnboxedIntList.filled(length, unsafeCast<int>(fill))
: FixedLengthUnboxedIntList.filled(length, unsafeCast<int>(fill)));
} else {
return growable
? GrowableList<E>.filled(length, fill)
: FixedLengthList<E>.filled(length, fill);
}
}
}
Somehow in the call sites for this TFA doesn't seem to figure out that if E
is int
then the returned class is a ModifiableUnboxedIntList
, so we don't generate direct calls (and can't inline) ModifiableUnboxedIntList.operator []=
and operator []
.
The current class hierarchy is as shown in my comment above:
- abstract class ListBase (from standard library
| - abstract class _ListBase
| | - abstract class _ModifiableList
| | | - class _List
| | | - class _GrowableList
| | - class _ImmutableList
| - abstract class _UnboxedIntListBase
| | - abstract class _ModifiableUnboxedIntListList // defines [] and []=, which we need to inline
| | | - class _UnboxedIntList
| | | - class _GrowableUnboxedIntList
| | - class _ImmutableUnboxedIntList
The list allocations in kernel look like this:
List<int> Function(int)
Block({
final List<int> p = new List<int>.filled(n, 0);
final List<int> q = new List<int>.filled(n, 0);
final List<int> s = new List<int>.filled(n, 0);
...
})
@alexmarkov is there anything I can do to make p
, q
, s
above have the type ModifiableUnboxedIntList
(most specific superclass of GrowableUnboxedIntList
and FixedLengthUnboxedIntList
), so that the TFA will make method calls to ModifiableUnboxedIntList
members direct calls and I can inline them in code generator?
TFA infers that E
is int
while analyzing List<int>.filled
invocation, but it is not able to evaluate E == int
expression (as it is actually a call (E).operator==(int)
, and _Type.operator==
has an implementation in dart2wasm which goes down to calling external methods.
I think the easiest way to infer type of List<int>.filled
result would be to specialize calls to this factory constructor before running TFA - as suggested above in https://github.com/dart-lang/sdk/issues/52714#issuecomment-1598983731.
Thanks -- transforming the factory calls before TFA worked great. Current results:
Before: 13,783,500 us.
After: 6,068,000 us.
JIT: 3,080,470 us.
dart2js -O4: 2,812,000 us.
With the unboxing the benchmark is 2x faster, but still takes twice as long compared to VM (JIT).
Note: dart2wasm results above are with --omit-type-checks
, as with type checks we can't benchmark anything -- the type checks just overwhelm everything else.
Both VM and dart2js benefit from tagging when operating with
<int>[...]
, dart2wasm however boxes integers stored in the list - which makes it slow compared to native and JS targets.This impacts a number of microbenchmarks like Fannkuch or Mandelbrot, which use
List<int>
on hot paths.