Windows-on-ARM-Experiments / gcc-woarm64

Fork of gcc containing fixes for Windows on ARM64.
GNU General Public License v2.0

Enable stack probing on Aarch64 mingw to prevent crash on stack growth #11

Closed. eukarpov closed this 8 months ago

eukarpov commented 8 months ago

Based on the findings, the stack in aarch64-mingw is limited to 8k. This is related to stack probing, which is what __chkstk does. The good news is that it is already implemented in gcc/config/aarch64/aarch64.cc and can be reused. This PR enables it; after that, our test and the OpenBLAS tests pass without any workaround.
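
To illustrate why a large frame crashes without probing, here is a toy model in C. It is only a sketch under stated assumptions: a 4 KB page, a single guard page below the committed region, and 8k committed at the start (the figure from the description above). The names `toy_stack`, `toy_touch` and `toy_enter_frame` are made up for this example and do not come from gcc or Windows.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE 4096u /* assumed page size */

/* Toy model: `committed` is how many bytes below the stack base are usable.
   The single page just below that region plays the role of the guard page. */
typedef struct {
    size_t committed;
    bool faulted;
} toy_stack;

/* Touch the byte `depth` bytes below the stack base. */
static void toy_touch(toy_stack *s, size_t depth)
{
    if (s->faulted || depth < s->committed)
        return;                     /* already usable, or already crashed */
    if (depth < s->committed + PAGE_SIZE)
        s->committed += PAGE_SIZE;  /* guard page hit: grow by one page */
    else
        s->faulted = true;          /* jumped past the guard page: crash */
}

/* Enter a function with `frame` bytes of locals (frame > 0) and write to the
   far end of the frame, optionally touching every page on the way down. */
static bool toy_enter_frame(size_t frame, bool probe)
{
    toy_stack s = { 2 * PAGE_SIZE, false }; /* 8k committed (assumption) */
    if (probe)
        for (size_t d = s.committed; d < frame; d += PAGE_SIZE)
            toy_touch(&s, d);
    toy_touch(&s, frame - 1);
    return !s.faulted;
}
```

In this model `toy_enter_frame(300000, false)` fails because the first access lands far below the guard page, while `toy_enter_frame(300000, true)` succeeds because each page is touched in order and the stack grows one page at a time.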

Stack probing tests pass: https://github.com/Windows-on-ARM-Experiments/mingw-woarm64-build-2/actions/runs/7087372070/job/19289039464

[ RUN      ] Aarch64MinGW.CHKSTKTest
[       OK ] Aarch64MinGW.CHKSTKTest (0 ms)

eukarpov commented 8 months ago

The stack probing loop is emitted only for chkstk_test; the other functions, f1 and f2, do not require more than 4k of stack.
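
For orientation when reading the listing that follows, here is a hypothetical C reconstruction of the three functions. The actual test source is not shown in this thread, so this is inferred from the disassembly only: the parameter name `p`, the local name `buf` and the array size are assumptions. The size is derived from the frame, 0x49000 + 0x3F0 = 0x493F0 bytes, minus 16 bytes for the saved fp/lr pair, which gives 300000.

```c
/* Hypothetical reconstruction; the real test source is not part of the thread. */

/* Leaf function: stores a zero byte through the pointer (strb wzr,[x0]). */
void f1(char *p)
{
    *p = 0;
}

/* Non-leaf function with a small frame: forwards the pointer to f1. */
void f2(char *p)
{
    f1(p);
}

/* Large frame: 300000 bytes of locals, well beyond a single 4k page. */
void chkstk_test(void)
{
    char buf[300000];
    f2(buf);
}
```

The listing looks like unoptimized output (arguments are spilled to the stack and reloaded), so a sketch like this would need to be built without optimization, or with the calls otherwise kept alive, to produce comparable code.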

f1:
  0000000000000000: D10043FF  sub         sp,sp,#0x10
  0000000000000004: F90007E0  str         x0,[sp,#8]
  0000000000000008: F94007E0  ldr         x0,[sp,#8]
  000000000000000C: 3900001F  strb        wzr,[x0]
  0000000000000010: D503201F  nop
  0000000000000014: 910043FF  add         sp,sp,#0x10
  0000000000000018: D65F03C0  ret
f2:
  000000000000001C: D1400BEA  sub         x10,sp,#2,lsl #0xC
  0000000000000020: F907F15F  str         xzr,[x10,#0xFE0]
  0000000000000024: A9BE7BFD  stp         fp,lr,[sp,#-0x20]!
  0000000000000028: 910003FD  mov         fp,sp
  000000000000002C: F9000FE0  str         x0,[sp,#0x18]
  0000000000000030: F9400FE0  ldr         x0,[sp,#0x18]
  0000000000000034: 94000000  bl          f1
  0000000000000038: D503201F  nop
  000000000000003C: A8C27BFD  ldp         fp,lr,[sp],#0x20
  0000000000000040: D65F03C0  ret
chkstk_test:
  0000000000000044: D14007EA  sub         x10,sp,#1,lsl #0xC
  0000000000000048: 9293FFEB  mov         x11,#-0xA000
  000000000000004C: F2BFFF6B  movk        x11,#0xFFFB,lsl #0x10
  0000000000000050: 8B2B63EB  add         x11,sp,x11

> LPSRL0:
>   0000000000000054: D140054A  sub         x10,x10,#1,lsl #0xC
>   0000000000000058: F900015F  str         xzr,[x10]
>   000000000000005C: EB0B015F  cmp         x10,x11
>   0000000000000060: 54FFFFA1  bne         LPSRL0

  0000000000000064: D140056B  sub         x11,x11,#1,lsl #0xC
  0000000000000068: F906097F  str         xzr,[x11,#0xC10]
  000000000000006C: D10FC3FF  sub         sp,sp,#0x3F0
  0000000000000070: D14127FF  sub         sp,sp,#0x49,lsl #0xC
  0000000000000074: A9007BFD  stp         fp,lr,[sp]
  0000000000000078: 910003FD  mov         fp,sp
  000000000000007C: 910043E0  add         x0,sp,#0x10
  0000000000000080: 94000000  bl          f2
  0000000000000084: D503201F  nop
  0000000000000088: A9407BFD  ldp         fp,lr,[sp]
  000000000000008C: 910FC3FF  add         sp,sp,#0x3F0
  0000000000000090: 914127FF  add         sp,sp,#0x49,lsl #0xC
  0000000000000094: D65F03C0  ret
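
The arithmetic in the chkstk_test prologue can be checked with a small C sketch. It replays the immediates from the listing above and records distances below the entry value of sp as positive numbers. The struct and function names are made up for this example; only the constants come from the disassembly.

```c
#include <stdint.h>

typedef struct {
    uint64_t loop_probes;   /* stores executed inside the LPSRL0 loop */
    uint64_t first_probe;   /* distance below entry sp of the first store */
    uint64_t deepest_probe; /* distance below entry sp of the final store */
    uint64_t frame_size;    /* total amount subtracted from sp */
} probe_summary;

static probe_summary summarize_chkstk_test(void)
{
    const uint64_t page = 0x1000;
    probe_summary r = {0, 0, 0, 0};

    /* mov x11,#-0xA000 ; movk x11,#0xFFFB,lsl #0x10 */
    uint64_t x11_imm = (uint64_t)-0xA000LL;
    x11_imm = (x11_imm & ~0xFFFF0000ULL) | (0xFFFBULL << 16);
    uint64_t end = (uint64_t)0 - x11_imm;   /* add x11,sp,x11 => sp - end */

    uint64_t x10 = page;                    /* sub x10,sp,#1,lsl #0xC */
    do {
        x10 += page;                        /* sub x10,x10,#1,lsl #0xC */
        if (r.loop_probes == 0)
            r.first_probe = x10;            /* str xzr,[x10] */
        r.loop_probes++;
    } while (x10 != end);                   /* cmp x10,x11 ; bne LPSRL0 */

    /* sub x11,x11,#1,lsl #0xC ; str xzr,[x11,#0xC10] */
    r.deepest_probe = end + page - 0xC10;

    /* sub sp,sp,#0x3F0 ; sub sp,sp,#0x49,lsl #0xC */
    r.frame_size = 0x3F0 + (0x49ULL << 12);
    return r;
}
```

Under this reading the loop executes 73 stores, one per 4k page from sp - 0x2000 down to sp - 0x4A000, and the final store lands at sp - 0x4A3F0. That is exactly 0x1000 below the 0x493F0-byte frame, so every page the frame covers is touched before sp is moved.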