Open ds84182 opened 1 year ago
can i work on this issue ?
/cc @alexmarkov
There are multiple optimizations missing here:
int
/double
boxing for record fields when they are passed or returned separately)._Record
) could be less informative. That should allow the compiler to replace call to _Double.+
.Currently AOT compiler can unbox record instances (of 2 fields) in return values, as returning multiple values would be probably the most common use of records. At this point we are not sure how records would be used in practice and what other scenarios would be performance critical. Do you have a real use case where performance of passing record instances as arguments would be critical?
(1) and (2) is not that hard to implement. However, we should try to find a motivating example / real use case before adding these optimizations, as they would increase complexity of the implementation.
(3) may severely affect AOT compilation time, so I'd like to avoid adding it unless absolutely necessary.
(4) looks like a bug and I'd like to fix it.
My primary use-case for records is to support vector value types via records + inline classes. It's much nicer to deal with compared to vector_math
since the underlying storage is immutable and const.
inline class Vec4 {
final (double, double, double, double) xyzw;
const Vec4.raw(this.xyzw);
const Vec4(double x, double y, double z, double w) : super((x, y, z, w));
double get x => xyzw.$1;
double get y => xyzw.$2;
double get z => xyzw.$3;
double get w => xyzw.$4;
Vec4 operator+(Vec4 rhs) {
var (lx, ly, lz, lw) = xyzw;
var (rx, ry, rz, rw) = rhs.xyzw;
return Vec4(lx + rx, ly + ry, lz + rz, lw + rw);
}
// ...
}
Another example is 128-bit ints via inline class:
inline class Int128 {
final (int, int) _lohi;
Int128.from(int value) : _lohi = (value, 0);
// ... implementations here
}
I'd still expect something as large as a 4x4 matrix to be heap allocated though. But these small records are the perfect candidate for unboxing.
There's some weird interactions with inlining.
These two programs are not equivalent:
class Box2<T> { final T x, y; const Box2(this.x, this.y); }
void main(List<String> args) {
final record = Box2(1.0, 1.0);
inline(record);
}
@pragma('vm:prefer-inline')
void inline(Box2<double> xy) {
var Box2(:x, :y) = xy;
print(x + y);
}
*** BEGIN CFG
After AllocateRegisters
==== file://<source>_::_main (RegularFunction)
0: B0[graph]:0 {
v0 <- Constant(#null) T{Null?}
v26 <- UnboxedConstant(#2.0 double) T{_Double}
}
2: B1[function entry]:2
ParallelMove xmm0 <- C {
v2 <- Parameter(0) T{List<String>}
}
4: CheckStackOverflow:8(stack=0, loop=0)
6: MoveArgument(v26, SP+0)
8: StaticCall:48( print<0> v26)
9: ParallelMove rax <- C
10: Return:16(v0)
*** END CFG
void main(List<String> args) {
final record = (1.0, 1.0);
inline(record);
}
@pragma('vm:prefer-inline')
void inline((double, double) xy) {
var (x, y) = xy;
print(x + y);
}
*** BEGIN CFG
After AllocateRegisters
==== file://<source>_::_main (RegularFunction)
0: B0[graph]:0 {
v0 <- Constant(#null) T{Null?}
v3 <- Constant(#1.0) T{_Double}
}
2: B1[function entry]:2 {
v2 <- Parameter(0) T{List<String>}
}
4: CheckStackOverflow:8(stack=0, loop=0)
6: MoveArgument(v3, SP+1)
8: MoveArgument(v3, SP+0)
10: v14 <- StaticCall:42( +<0> v3, v3, using unchecked entrypoint, recognized_kind = Double_add, result_type = T{_Double}) T{_Double}
12: v16 <- Unbox(v14) T{_Double}
14: MoveArgument(v16, SP+0)
16: StaticCall:44( print<0> v16)
17: ParallelMove rax <- C
18: Return:14(v0)
*** END CFG
Without inlining, it unboxes x and y. But with inlining + load-store forwarding, it loses type information. Guessing it trusts the static type of the parameter when not inlined, but when inlined it only has AllocateSmallRecord to go off of?
Here is what was done so far:
BinaryDoubleOp
instead of a call to Double.+
.Thanks! I accidentally stumbled upon another place this comes up, looks like type propagation does not work for records that are allocated then immediately destructured: Godbolt
This causes the compiler to switch to doing table calls for Uint8List.[]
which increases code size by a lot. Maybe an earlier CSE pass could help here, maybe directly after inlining?
Godbolt link
When passing a record of primitives between functions, the values are boxed. Since these are value types without identity, they could exist as an unboxed record shape until they escape via covariance, similar to how int and double primitives work today (int -> num or double -> num require boxing).
Records also lose static type information, resulting in a runtime call to
_Double operator+
instead of unboxing.