eukarpov closed 8 months ago
The stack probing loop is emitted only for chkstk_test. The other functions, f1 and f2, do not need more than 4 KB of stack: f1 has no probe and f2 has a single probe store.
f1:
0000000000000000: D10043FF sub sp,sp,#0x10
0000000000000004: F90007E0 str x0,[sp,#8]
0000000000000008: F94007E0 ldr x0,[sp,#8]
000000000000000C: 3900001F strb wzr,[x0]
0000000000000010: D503201F nop
0000000000000014: 910043FF add sp,sp,#0x10
0000000000000018: D65F03C0 ret
f2:
000000000000001C: D1400BEA sub x10,sp,#2,lsl #0xC
0000000000000020: F907F15F str xzr,[x10,#0xFE0]
0000000000000024: A9BE7BFD stp fp,lr,[sp,#-0x20]!
0000000000000028: 910003FD mov fp,sp
000000000000002C: F9000FE0 str x0,[sp,#0x18]
0000000000000030: F9400FE0 ldr x0,[sp,#0x18]
0000000000000034: 94000000 bl f1
0000000000000038: D503201F nop
000000000000003C: A8C27BFD ldp fp,lr,[sp],#0x20
0000000000000040: D65F03C0 ret
chkstk_test:
0000000000000044: D14007EA sub x10,sp,#1,lsl #0xC
0000000000000048: 9293FFEB mov x11,#-0xA000
000000000000004C: F2BFFF6B movk x11,#0xFFFB,lsl #0x10
0000000000000050: 8B2B63EB add x11,sp,x11
> LPSRL0:
> 0000000000000054: D140054A sub x10,x10,#1,lsl #0xC
> 0000000000000058: F900015F str xzr,[x10]
> 000000000000005C: EB0B015F cmp x10,x11
> 0000000000000060: 54FFFFA1 bne LPSRL0
0000000000000064: D140056B sub x11,x11,#1,lsl #0xC
0000000000000068: F906097F str xzr,[x11,#0xC10]
000000000000006C: D10FC3FF sub sp,sp,#0x3F0
0000000000000070: D14127FF sub sp,sp,#0x49,lsl #0xC
0000000000000074: A9007BFD stp fp,lr,[sp]
0000000000000078: 910003FD mov fp,sp
000000000000007C: 910043E0 add x0,sp,#0x10
0000000000000080: 94000000 bl f2
0000000000000084: D503201F nop
0000000000000088: A9407BFD ldp fp,lr,[sp]
000000000000008C: 910FC3FF add sp,sp,#0x3F0
0000000000000090: 914127FF add sp,sp,#0x49,lsl #0xC
0000000000000094: D65F03C0 ret
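The probe addresses in the listing can be checked by hand. The sketch below is only a model read off the disassembly above, not GCC's code: it assumes a 4 KB probe interval, assumes the first 4 KB below the incoming sp are skipped, and uses a hypothetical helper name, `probe_offsets`. It covers the f2 and chkstk_test prologues; f1 gets no probe in the listing and is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROBE_INTERVAL 0x1000u /* 4 KB step, the "#1,lsl #0xC" in the listing */
#define PROTECT_OFFSET 0x1000u /* assumption: first 4 KB below sp are not probed */

/* Hypothetical helper: writes the probed offsets (bytes below the incoming
   sp) into out[] and returns how many probes are made in total. */
size_t probe_offsets(uint64_t frame_size, uint64_t *out, size_t cap)
{
    assert(out != NULL || cap == 0);

    size_t n = 0;
    uint64_t last = PROTECT_OFFSET + frame_size; /* deepest probed offset */

    /* One store per 4 KB step (the LPSRL0 loop in chkstk_test). */
    for (uint64_t off = PROTECT_OFFSET + PROBE_INTERVAL; off < last;
         off += PROBE_INTERVAL) {
        if (n < cap)
            out[n] = off;
        n++;
    }

    /* Final store at the end of the probed range. */
    if (n < cap)
        out[n] = last;
    n++;

    return n;
}
```

For chkstk_test (frame 0x493F0) this gives 73 loop probes from sp-0x2000 down to sp-0x4A000 plus a final store at sp-0x4A3F0, which matches `str xzr,[x11,#0xC10]` with x11 = sp-0x4B000. For f2 (frame 0x20) it gives the single store at sp-0x1020, matching `str xzr,[x10,#0xFE0]` with x10 = sp-0x2000.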
Based on these findings, the stack in aarch64-mingw is limited to 8 KB. The limit is related to stack probing, which is what __chkstk does. The good news is that probing is already implemented in gcc/config/aarch64/aarch64.cc and can be reused. This PR enables it; with that change, our test and the OpenBLAS tests pass without any workaround.
Stack probing tests pass: https://github.com/Windows-on-ARM-Experiments/mingw-woarm64-build-2/actions/runs/7087372070/job/19289039464
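For reference, a C source along the following lines would produce a listing shaped like the one above. This is a reconstruction from the disassembly, not the actual test file: the function names come from the listing, while the signatures and the 300000-byte buffer size are assumptions (chkstk_test reserves 0x49000 + 0x3F0 = 300016 bytes, and the buffer starts at sp+0x10 after the saved fp/lr pair).

```c
#include <stddef.h>

/* Assumed size: 300016-byte frame minus 16 bytes for the saved fp/lr. */
#define BUF_SIZE 300000

/* Leaf function: stores one zero byte through the pointer
   (strb wzr,[x0] in the listing). */
void f1(char *p)
{
    *p = 0;
}

/* Small non-leaf function: only forwards the pointer to f1. */
void f2(char *p)
{
    f1(p);
}

/* Large frame: the local buffer pushes the frame far beyond one 4 KB page,
   so the prologue has to touch every page before moving sp. */
void chkstk_test(void)
{
    char buf[BUF_SIZE];
    f2(buf);
}
```

The listing stores x0 to the stack and reloads it right away, which suggests the test was built without optimization; an optimizing build could drop the buffer entirely.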