Open dcharkes opened 1 year ago
Could use the SIMD registers as the temps for the larger sizes. I see that gcc compiles a 16-byte copy with SIMD loads and stores:
```c
#include <stdint.h>

// Copies 16 bytes through two 64-bit temporaries.
void copy(uint64_t* src, uint64_t* dst) {
  uint64_t a = src[0];
  uint64_t b = src[1];
  dst[0] = a;
  dst[1] = b;
}
```
```
vld1.64 {d16-d17}, [r0:64]
vst1.64 {d16-d17}, [r1:64]
bx lr
```
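For comparison, here is a minimal sketch of the same 16-byte copy written with one wide temporary instead of two scalar ones. `Chunk16` and `copy16` are hypothetical names, not anything from the Dart SDK, and whether a compiler lowers this to a `vld1.64`/`vst1.64` pair depends on the target and flags.

```c
#include <stdint.h>
#include <string.h>

// Hypothetical 16-byte value type standing in for one SIMD-sized temp.
typedef struct {
  uint64_t lo;
  uint64_t hi;
} Chunk16;

// Copies exactly 16 bytes from src to dst through a single temporary.
// memcpy with a constant size lets the compiler pick the widest load and
// store it has available; no core-register temps are named explicitly.
void copy16(const void* src, void* dst) {
  Chunk16 tmp;
  memcpy(&tmp, src, sizeof tmp);
  memcpy(dst, &tmp, sizeof tmp);
}
```

The point of the sketch is only that the temporary is a single 16-byte value, which is what a SIMD temp would give the mem-copy instruction.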
While trying to add a unit test that uses the mem-copy instruction with element size 16, the register allocator for arm32 ran out of registers. (It tries to spill from the same location as the first use, i.e. a zero-length spill.)
We only have 16 registers in total, and 8 of them are pinned in the Dart calling convention.
https://github.com/dart-lang/sdk/blob/bc31fe490308aca7a22b221af75d453b3f4ef29d/runtime/vm/constants_arm.h#L81-L97
With element size 16, this instruction requires 9 registers (4 temps and 5 parameters), one more than the 8 that are not pinned.
https://github.com/dart-lang/sdk/blob/bc31fe490308aca7a22b221af75d453b3f4ef29d/runtime/vm/compiler/backend/il_arm.cc#L158-L175
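The register budget above can be written out as a small sketch. The figures come from this issue; the names are hypothetical and do not exist in the VM, and the sketch assumes every non-pinned register is available to the allocator.

```c
#include <stdbool.h>

// Figures from the issue text; names are hypothetical.
enum {
  kTotalCpuRegisters = 16,         // arm32 core registers
  kPinnedCpuRegisters = 8,         // reserved by the Dart calling convention
  kMemCopyInputs = 5,              // parameters of the mem-copy instruction
  kMemCopyTempsElementSize16 = 4,  // core-register temps at element size 16
};

// Core registers left for the allocator, assuming all non-pinned ones are usable.
int allocatable_registers(void) {
  return kTotalCpuRegisters - kPinnedCpuRegisters;
}

// True when the inputs plus core-register temps fit in the allocatable set.
bool fits(int inputs, int cpu_temps) {
  return inputs + cpu_temps <= allocatable_registers();
}
```

With these numbers, `fits(kMemCopyInputs, kMemCopyTempsElementSize16)` is false (9 > 8), while `fits(kMemCopyInputs, 0)` is true. The second case assumes the temps could live in SIMD registers instead of core registers.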
Since we're currently not exercising anything other than element size 1, we won't hit this in Dart code right now, but we should fix it.
Possible fix: