davidgiven / ack

The Amsterdam Compiler Kit
http://tack.sf.net
Other
432 stars 62 forks source link

modern code generator bugs #11

Closed kernigh closed 7 years ago

kernigh commented 7 years ago

I found two bugs in how mcg (commit 4ba409e) compiles plat/linux/libsys/write.c. The function being compiled is

int write(int fd, void* buffer, size_t count)
{
        return _syscall(__NR_write, fd, (quad) buffer, count);
}

The output of ack -mlinuxppc -O6 -c.so write.c is

.sect .text
.sect .rom
.sect .data
.sect .bss
.extern _write

.sect .text
_write:
! saved_size = 4 bytes
! spills_size = 0 bytes
! locals_size = 0 bytes
addi sp, sp, -12
mfspr r0, lr
stw fp, 4(sp)
stw r0, 8(sp)
addi fp, sp, 4
stw r31, 0(fp)

First bug: The saved r31 overwrites the saved r2. We have allocated 12 bytes of stack space, saved r2 in 4(sp), saved lr in 8(sp), but now save r31 in 4(sp), because addi fp, sp, 4 causes 0(fp) to be the same word as 4(sp).

stw r31, 0(fp)
lwz r12, 16(fp)
lwz r11, 12(fp)
lwz r10, 8(fp)
stwu r12, -4(sp)
stwu r11, -4(sp)
! FROMSI.I(int) -> int
stwu r10, -4(sp)
li32 r12, 4
stwu r12, -4(sp)
bl __syscall
addi sp, sp, 16
! FROMUI.I(int) -> int
mr r3, r31

Second bug: Functions seem to store their return values in r3 but load them from r31. This wrong mr r3, r31 is destroying the return value.

! setret4
.anon_bb_0:
lwz r31, 0(fp)
lwz r0, 4(fp)
mtspr lr, r0
lwz r0, 0(fp)
addi sp, fp, 8
mr fp, r0
bclr 20, 0, 0

This is the first bug again. It's now restoring the caller's r31 from 0(r2) and the caller's r2 from 0(r2), but we lost the value of r2. In my test program, r31 held zero, and r2 held the frame pointer of main(). When main() tried to use the frame pointer, it segfaulted.

davidgiven commented 7 years ago

Yup, definitely broken. Thanks for the report.

I've fixed them and pushed the change to dtrg-experimental-mcgg.

The first bug was because the PowerPC stack frame handling was basically mangled and wrong --- I changed it to be simpler and more consistent. (You also get a much more useful stack map in the comment at the top of each procedure.)

The second one was a combination of the PowerPC table never asking for the outputs to be in the right registers in the first place, and a much more subtle problem where the register allocator was unable to allocate the output register because the output register is volatile and the patterns for the call instructions mark the volatile registers as being corrupted. I tweaked the allocator to allow outputs to go into corrupted registers.