devkitPro / libgba

C Library for Nintendo GBA
http://devkitpro.org/viewforum.php?f=5
Other
192 stars 26 forks source link

BiosCheckSum: use `register asm` extension #6

Closed ElementW closed 3 years ago

ElementW commented 3 years ago

The register keyword is illegal when compiling standard C++17 and up code. GCC and clang (even in parsing mode) emit a warning about that but this can be an issue when compiling with -Werror, even when not calling the function. The keyword was a no-op on mainstream compilers anyway.

Using the register asm compiler extension the result variable can be forcibly bound to r0 which is the return register of the BIOS checksum function, thus removing the need for a subsequent mov, which both compilers' optimizers weren't able to remove (even GCC, producing adds r0, r0, #0).

Removing the mov also makes the function compile on clang, as mov is not a mnemonic for adds x, x, #0 but is expected to be the ARMv6+ Thumb-2 instruction equivalent to ARM-mode mov, which obviously isn't available on the GBA.

mtheall commented 3 years ago

Does it work if you instead simply remove the register keyword?

ElementW commented 3 years ago

It does but the generated assembly is suboptimal. And there's actually a problem in the current code, exhibited in ARM mode of GCC, and just removing register does not address it.

TLDR: Compiler Current Vanilla var register asm
GCC ARM -Os Bad Bad Good
GCC ARM -O3 BROKEN Bad Good
GCC Thumb -Os Bad Worse Good
GCC Thumb -O3 Bad Worse Good
clang ARM -Os Bad Bad Good
clang ARM -O3 Bad Bad Good
clang Thumb -Os ? Bad Bad
clang Thumb -O3 ? Bad Bad

With a test function

[[gnu::noinline]] void a() {
  *(volatile u32*)(VRAM) = BiosCheckSum();
}

In the current code, temporaries may get assigned to r0 before the svc call as result isn't constrained and r0isn't marked as clobber; thus GCC -O3 generates the following broken assembly:

08001d6c <a()>:
 8001d6c:       e3a00406        mov     r0, #100663296  ; 0x6000000
 8001d70:       ef0d0000        svc     0x000d0000
 8001d74:       e1a0c000        mov     ip, r0
 8001d78:       e580c000        str     ip, [r0]
 8001d7c:       e12fff1e        bx      lr

For correct behavior, r0 must be marked as clobbered, but results are subpar:

Vanilla variable

GCC

ARM, -Os

result is late bound to r0 so you have a useless mov r0, r0 which isn't optimized away.

08001d74 <a()>:
 8001d74:       ef0d0000        svc     0x000d0000
 8001d78:       e1a00000        nop                     ; (mov r0, r0)
 8001d7c:       e3a03406        mov     r3, #100663296  ; 0x6000000
 8001d80:       e5830000        str     r0, [r3]
 8001d84:       e12fff1e        bx      lr

ARM, -O3

The optimizer fucks up, puts result in lr due to register pressure, pushing it to the stack.

08001fb0 <a()>:
 8001fb0:       e3a0c406        mov     ip, #100663296  ; 0x6000000
 8001fb4:       e52de004        push    {lr}            ; (str lr, [sp, #-4]!)
 8001fb8:       ef0d0000        svc     0x000d0000
 8001fbc:       e1a0e000        mov     lr, r0
 8001fc0:       e58ce000        str     lr, [ip]
 8001fc4:       e49de004        pop     {lr}            ; (ldr lr, [sp], #4)
 8001fc8:       e12fff1e        bx      lr

Thumb, -Os and -O3

result ends up in r4, which must be pushed

0800288c <a()>:
 800288c:       b510            push    {r4, lr}
 800288e:       df0d            svc     13
 8002890:       1c04            adds    r4, r0, #0
 8002892:       23c0            movs    r3, #192        ; 0xc0
 8002894:       04db            lsls    r3, r3, #19
 8002896:       601c            str     r4, [r3, #0]
 8002898:       bc10            pop     {r4}
 800289a:       bc01            pop     {r0}
 800289c:       4700            bx      r0
 800289e:       46c0            nop                     ; (mov r8, r8)

clang

ARM, -Os and -O3

resultis put in fp, which must be pushed

0800288c <a()>:
 800288c:       e92d4800        push    {fp, lr}
 8002890:       e1a0b00d        mov     fp, sp
 8002894:       e3a0c406        mov     ip, #100663296  ; 0x6000000
 8002898:       ef0d0000        svc     0x000d0000
 800289c:       e1a0e000        mov     lr, r0
 80028a0:       e58ce000        str     lr, [ip]
 80028a4:       e8bd4800        pop     {fp, lr}
 80028a8:       e12fff1e        bx      lr

Thumb, -Os and -O3

A mess

0800288c <a()>:
 800288c:       b5b0            push    {r4, r5, r7, lr}
 800288e:       af02            add     r7, sp, #8
 8002890:       2003            movs    r0, #3
 8002892:       0644            lsls    r4, r0, #25
 8002894:       df0d            svc     13
 8002896:       0005            movs    r5, r0
 8002898:       6025            str     r5, [r4, #0]
 800289a:       bcb0            pop     {r4, r5, r7}
 800289c:       bc01            pop     {r0}
 800289e:       4700            bx      r0

register asm

GCC

ARM, both -Osand -O3

0800288c <a()>:
 800288c:       ef0d0000        svc     0x000d0000
 8002890:       e3a03406        mov     r3, #100663296  ; 0x6000000
 8002894:       e5830000        str     r0, [r3]
 8002898:       e12fff1e        bx      lr

Thumb, both -Osand -O3

08002890 <a()>:
 8002890:       df0d            svc     13
 8002892:       23c0            movs    r3, #192        ; 0xc0
 8002894:       04db            lsls    r3, r3, #19
 8002896:       6018            str     r0, [r3, #0]
 8002898:       4770            bx      lr

clang

ARM, both -Osand -O3

0800288c <a()>:
 800288c:       e3a0c406        mov     ip, #100663296  ; 0x6000000
 8002890:       ef0d0000        svc     0x000d0000
 8002894:       e58c0000        str     r0, [ip]
 8002898:       e12fff1e        bx      lr

Thumb, both -Osand -O3

clang is just bad with Thumb

0800288c <a()>:
 800288c:       b5d0            push    {r4, r6, r7, lr}
 800288e:       af02            add     r7, sp, #8
 8002890:       2003            movs    r0, #3
 8002892:       0644            lsls    r4, r0, #25
 8002894:       df0d            svc     13
 8002896:       6020            str     r0, [r4, #0]
 8002898:       bcd0            pop     {r4, r6, r7}
 800289a:       bc01            pop     {r0}
 800289c:       4700            bx      r0