dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License

[vm] register pressure `MemoryCopyInstr` on arm32 with element size 16 #51229

Open dcharkes opened 1 year ago

dcharkes commented 1 year ago
```
../../runtime/vm/compiler/backend/linearscan.cc: 2492: error: expected: unallocated->Start() < ToInstructionStart(register_use_pos)
```

While trying to add a unit test that uses the memory-copy instruction with an element size of 16, the arm32 register allocator ran out of registers. (It tries to spill at the same position as the first use, i.e. a zero-length spill.)

We only have 16 registers in total, and 8 of them are pinned by the Dart calling convention.

https://github.com/dart-lang/sdk/blob/bc31fe490308aca7a22b221af75d453b3f4ef29d/runtime/vm/constants_arm.h#L81-L97

With element size 16, this instruction requires 9 registers (4 temps and 5 inputs), one more than the 8 that remain allocatable.

https://github.com/dart-lang/sdk/blob/bc31fe490308aca7a22b221af75d453b3f4ef29d/runtime/vm/compiler/backend/il_arm.cc#L158-L175
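A simplified way to see the shortfall, using only the numbers quoted in this issue. The sketch assumes every input and temp of the instruction must occupy a distinct core register at the same time; the helper names (`AllocatableRegisters`, `RequiredRegisters`, `FitsInRegisters`) are hypothetical and are not VM APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Numbers quoted in this issue for arm32. */
enum {
  kArm32CpuRegisters = 16,   /* R0..R15 */
  kDartPinnedRegisters = 8,  /* pinned by the Dart calling convention */
};

/* Core registers left for the register allocator to hand out. */
int AllocatableRegisters(void) {
  return kArm32CpuRegisters - kDartPinnedRegisters;
}

/* Simplified model: every input and every temp needs its own core
 * register at the instruction, all at the same time. */
int RequiredRegisters(int inputs, int temps) {
  assert(inputs >= 0 && temps >= 0);
  return inputs + temps;
}

/* True if the instruction's operands fit in the allocatable registers. */
bool FitsInRegisters(int inputs, int temps) {
  return RequiredRegisters(inputs, temps) <= AllocatableRegisters();
}
```

Under this model `FitsInRegisters(5, 4)` is false (9 > 8), while `FitsInRegisters(5, 3)` is true, so removing a single core-register temp would be enough.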

Since we currently don't exercise any element size other than 1, Dart code won't hit this right now, but we should still fix it.

Possible fix:

rmacnak-google commented 1 year ago

We could use the SIMD registers as the temps for the larger element sizes. For example, gcc compiles a 16-byte copy with SIMD loads and stores:

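```c
#include <stdint.h>

// Copy one 16-byte element as two 64-bit words.
void copy(uint64_t* src, uint64_t* dst) {
    uint64_t a = src[0];
    uint64_t b = src[1];
    dst[0] = a;
    dst[1] = b;
}
```

compiles to:

```asm
vld1.64 {d16-d17}, [r0:64]  @ load 16 bytes from src into d16/d17
vst1.64 {d16-d17}, [r1:64]  @ store d16/d17 to dst
bx      lr                  @ return
```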
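One way to picture the SIMD-temp idea in C is to move each 16-byte element through a single 128-bit temporary instead of four 32-bit core registers. This is a sketch under stated assumptions: `Vec128` and `CopyElements16` are hypothetical names, it relies on the GCC/Clang `vector_size` extension, it only handles non-overlapping buffers with a forward copy, and whether the compiler actually keeps `tmp` in a NEON register (and emits `vld1.64`/`vst1.64`) depends on the target and compiler flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A 16-byte vector value (GCC/Clang vector extension). With NEON enabled
 * on arm32, the compiler may hold it in a d-register pair rather than in
 * four core registers. */
typedef uint64_t Vec128 __attribute__((vector_size(16)));

/* Copies `count` 16-byte elements from src to dst.
 * Assumption: the buffers do not overlap (forward copy only). */
void CopyElements16(void* dst, const void* src, size_t count) {
  assert(count == 0 || (dst != NULL && src != NULL));
  unsigned char* d = dst;
  const unsigned char* s = src;
  for (size_t i = 0; i < count; i++) {
    Vec128 tmp;                            /* the single 128-bit temp */
    memcpy(&tmp, s + 16 * i, sizeof tmp);  /* one 16-byte load */
    memcpy(d + 16 * i, &tmp, sizeof tmp);  /* one 16-byte store */
  }
}
```

In the VM, the corresponding change would presumably request FPU/SIMD temps instead of core-register temps in `MemoryCopyInstr`'s location summary in il_arm.cc for element size 16; the C version only illustrates the data flow.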