Closed ElementW closed 3 years ago
Does it work if you instead simply remove the register keyword?
It does but the generated assembly is suboptimal.
And there's actually a problem in the current code, exhibited in ARM mode of GCC, and just removing register
does not address it.
TLDR: | Compiler | Current | Vanilla var | register asm |
---|---|---|---|---|
GCC ARM -Os |
Bad | Bad | Good | |
GCC ARM -O3 |
BROKEN | Bad | Good | |
GCC Thumb -Os |
Bad | Worse | Good | |
GCC Thumb -O3 |
Bad | Worse | Good | |
clang ARM -Os |
Bad | Bad | Good | |
clang ARM -O3 |
Bad | Bad | Good | |
clang Thumb -Os |
? | Bad | Bad | |
clang Thumb -O3 |
? | Bad | Bad |
With a test function
[[gnu::noinline]] void a() {
*(volatile u32*)(VRAM) = BiosCheckSum();
}
In the current code, temporaries may get assigned to r0
before the svc
call as result
isn't constrained and r0
isn't marked as clobber; thus GCC -O3
generates the following broken assembly:
08001d6c <a()>:
8001d6c: e3a00406 mov r0, #100663296 ; 0x6000000
8001d70: ef0d0000 svc 0x000d0000
8001d74: e1a0c000 mov ip, r0
8001d78: e580c000 str ip, [r0]
8001d7c: e12fff1e bx lr
For correct behavior, r0
must be marked as clobbered, but results are subpar:
-Os
result
is late bound to r0
so you have a useless mov r0, r0
which isn't optimized away.
08001d74 <a()>:
8001d74: ef0d0000 svc 0x000d0000
8001d78: e1a00000 nop ; (mov r0, r0)
8001d7c: e3a03406 mov r3, #100663296 ; 0x6000000
8001d80: e5830000 str r0, [r3]
8001d84: e12fff1e bx lr
-O3
The optimizer fucks up, puts result
in lr
due to register pressure, pushing it to the stack.
08001fb0 <a()>:
8001fb0: e3a0c406 mov ip, #100663296 ; 0x6000000
8001fb4: e52de004 push {lr} ; (str lr, [sp, #-4]!)
8001fb8: ef0d0000 svc 0x000d0000
8001fbc: e1a0e000 mov lr, r0
8001fc0: e58ce000 str lr, [ip]
8001fc4: e49de004 pop {lr} ; (ldr lr, [sp], #4)
8001fc8: e12fff1e bx lr
-Os
and -O3
result
ends up in r4
, which must be pushed
0800288c <a()>:
800288c: b510 push {r4, lr}
800288e: df0d svc 13
8002890: 1c04 adds r4, r0, #0
8002892: 23c0 movs r3, #192 ; 0xc0
8002894: 04db lsls r3, r3, #19
8002896: 601c str r4, [r3, #0]
8002898: bc10 pop {r4}
800289a: bc01 pop {r0}
800289c: 4700 bx r0
800289e: 46c0 nop ; (mov r8, r8)
-Os
and -O3
result
is put in fp
, which must be pushed
0800288c <a()>:
800288c: e92d4800 push {fp, lr}
8002890: e1a0b00d mov fp, sp
8002894: e3a0c406 mov ip, #100663296 ; 0x6000000
8002898: ef0d0000 svc 0x000d0000
800289c: e1a0e000 mov lr, r0
80028a0: e58ce000 str lr, [ip]
80028a4: e8bd4800 pop {fp, lr}
80028a8: e12fff1e bx lr
-Os
and -O3
A mess
0800288c <a()>:
800288c: b5b0 push {r4, r5, r7, lr}
800288e: af02 add r7, sp, #8
8002890: 2003 movs r0, #3
8002892: 0644 lsls r4, r0, #25
8002894: df0d svc 13
8002896: 0005 movs r5, r0
8002898: 6025 str r5, [r4, #0]
800289a: bcb0 pop {r4, r5, r7}
800289c: bc01 pop {r0}
800289e: 4700 bx r0
register asm
-Os
and -O3
0800288c <a()>:
800288c: ef0d0000 svc 0x000d0000
8002890: e3a03406 mov r3, #100663296 ; 0x6000000
8002894: e5830000 str r0, [r3]
8002898: e12fff1e bx lr
-Os
and -O3
08002890 <a()>:
8002890: df0d svc 13
8002892: 23c0 movs r3, #192 ; 0xc0
8002894: 04db lsls r3, r3, #19
8002896: 6018 str r0, [r3, #0]
8002898: 4770 bx lr
-Os
and -O3
0800288c <a()>:
800288c: e3a0c406 mov ip, #100663296 ; 0x6000000
8002890: ef0d0000 svc 0x000d0000
8002894: e58c0000 str r0, [ip]
8002898: e12fff1e bx lr
-Os
and -O3
clang is just bad with Thumb
0800288c <a()>:
800288c: b5d0 push {r4, r6, r7, lr}
800288e: af02 add r7, sp, #8
8002890: 2003 movs r0, #3
8002892: 0644 lsls r4, r0, #25
8002894: df0d svc 13
8002896: 6020 str r0, [r4, #0]
8002898: bcd0 pop {r4, r6, r7}
800289a: bc01 pop {r0}
800289c: 4700 bx r0
The
register
keyword is illegal when compiling standard C++17 and up code. GCC and clang (even in parsing mode) emit a warning about that but this can be an issue when compiling with -Werror, even when not calling the function. The keyword was a no-op on mainstream compilers anyway.Using the
register asm
compiler extension theresult
variable can be forcibly bound tor0
which is the return register of the BIOS checksum function, thus removing the need for a subsequentmov
, which both compilers' optimizers weren't able to remove (even GCC, producingadds r0, r0, #0
).Removing the
mov
also makes the function compile on clang, asmov
is not a mnemonic foradds x, x, #0
but is expected to be the ARMv6+ Thumb-2 instruction equivalent to ARM-modemov
, which obviously isn't available on the GBA.