loWorld_Program_Add__single_single_:
0000000000000000 stp x29, x30, [sp, #-0x20]! ; backing up x29,x30 which we should not need
0000000000000004 mov x29, sp ; backing up sp only for the redundant stores
0000000000000008 str s0, [x29, #0x10] ; redundant store 1
000000000000000c str s1, [x29, #0x18] ; redundant store 2
0000000000000010 ldr s0, [x29, #0x10] ; redundant load 1
0000000000000014 ldr s1, [x29, #0x18] ; redundant load 2
0000000000000018 fadd s0, s0, s1
000000000000001c mov sp, x29
0000000000000020 ldp x29, x30, [sp], #0x20
0000000000000024 ret
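The function arguments are passed in registers s0 and s1. They are stored to the stack frame and immediately loaded back into the same registers, which adds two stores and two loads that do no useful work and degrades performance. The expected codegen would look more like this: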
loWorld_Program_Add__single_single_:
0000000000000000 fadd s0, s0, s1
0000000000000004 ret
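For comparison, here is a minimal C sketch of an equivalent function. It is a stand-in only: the symbol name suggests a method named Add that takes two single-precision floats, but the original source is not shown here, and the name `add_single` is hypothetical. Under the standard arm64 calling convention, the two float arguments arrive in s0 and s1 and the result is returned in s0, so an optimizing compiler typically needs nothing more than one `fadd` and a `ret` for a function of this shape.

```c
/* Hypothetical stand-in for the Add(single, single) method in this report.
   The original source is not shown, so this is an assumption about its shape. */
float add_single(float a, float b)
{
    /* On arm64, a is passed in s0 and b in s1; the sum is returned in s0. */
    return a + b;
}
```

Unoptimized C builds typically spill both arguments to the stack and reload them, much like the JIT output above, which is why that pattern reads as missing optimization rather than something the calling convention requires.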
When JITting the following function
this code is generated on arm64 Mac: