bebbo / gcc

Bebbo's gcc-6-branch for m68k-amigaos
GNU General Public License v2.0
33 stars 11 forks source link

Optimization causes incorrect code(?) #8

Closed mheyer32 closed 6 years ago

mheyer32 commented 6 years ago

ADoom is using a function declared as extern, implemented in assembly. Parameters are passed via registers: void V_DrawPatch(REG(d0, int x), REG(d1, int y), REG(d2, int scrn), REG(a0, patch_t* patch)); But calling that function from C code caused me a lot of grief and debugging. I first thought that the assembly function contained broken code, but it does not. With -O0 the code will run. It turned out that the 'patch' pointer never arrived healthy in the assembler function.

void STlib_updateBinIcon(st_binicon_t* bi, boolean refresh)
{
    int x;
    int y;
    int w;
    int h;

    if (*bi->on && (bi->oldval != *bi->val || refresh)) {
        x = bi->x - SWAPSHORT(bi->p->leftoffset);
        y = bi->y - SWAPSHORT(bi->p->topoffset);
        w = SWAPSHORT(bi->p->width);
        h = SWAPSHORT(bi->p->height);

        if (y - ST_Y < 0)
            I_Error("updateBinIcon: y - ST_Y < 0");

        if (*bi->val) {
            DEBUGPRINT(("V_DrawPatch for %p", bi->p));
            // CALLED HERE:
            V_DrawPatch(bi->x, bi->y, FG,  bi->p);
        } else {
            DEBUGSTEP();
            V_CopyRect(x, y - ST_Y, BG, w, h, x, y, FG);
        }

        bi->oldval = *bi->val;
    }
}

Looking at the gcc produced assembly code revealed nothing:

217:st_lib.c      ****             DEBUGPRINT(("V_DrawPatch for %p", bi->p));
 1068                       .stabd  68,0,217
 1069 04a8 2F2A 0014        move.l 20(a2),-(sp)
 1070 04ac 4879 0000        pea .LC5
 1070      0448 
 1071 04b2 4EB9 0000        jsr _printf
 1071      0000 
 1072 04b8 2079 0000        move.l ___iob,a0
 1072      0000 
 1073 04be 2F28 0004        move.l 4(a0),-(sp)
 1074 04c2 4EB9 0000        jsr _fflush
 1074      0000 
 218:st_lib.c      ****             V_DrawPatch(bi->x, bi->y, FG, bi->p);
 1075                       .stabd  68,0,218
 1076 04c8 206A 0014        move.l 20(a2),a0
 1077 04cc 4282             clr.l d2
 1078 04ce 222A 0004        move.l 4(a2),d1
 1079 04d2 2012             move.l (a2),d0
 1080 04d4 4EB9 0000        jsr _V_DrawPatch
 1080      0000 
 1081 04da 4FEF 000C        lea (12,sp),sp

Looking at the disassembly produced by m68k-objdump didn't show anything interesting either:

00034d04 <_STlib_updateBinIcon>:
   34d04:   48e7 3c34       moveml %d2-%d5/%a2-%a3/%a5,%sp@-
   34d08:   246f 0020       moveal %sp@(32),%a2
   34d0c:   206a 0010       moveal %a2@(16),%a0
   34d10:   4a90            tstl %a0@
   34d12:   677a            beqs 34d8e <_STlib_updateBinIcon+0x8a>
   34d14:   206a 000c       moveal %a2@(12),%a0
   34d18:   2210            movel %a0@,%d1
   34d1a:   b2aa 0008       cmpl %a2@(8),%d1
   34d1e:   6700 00e8       beqw 34e08 <_STlib_updateBinIcon+0x104>
   34d22:   2a52            moveal %a2@,%a5
   34d24:   206a 0014       moveal %a2@(20),%a0
   34d28:   3628 0004       movew %a0@(4),%d3
   34d2c:   3028 0006       movew %a0@(6),%d0
   34d30:   e058            rorw #8,%d0
   34d32:   266a 0004       moveal %a2@(4),%a3
   34d36:   96c0            subaw %d0,%a3
   34d38:   3418            movew %a0@+,%d2
   34d3a:   3810            movew %a0@,%d4
   34d3c:   70e0            moveq #-32,%d0
   34d3e:   d0b9 0000 7280  addl 7280 <_SCREENHEIGHT>,%d0
   34d44:   2a0b            movel %a3,%d5
   34d46:   9a80            subl %d0,%d5
   34d48:   6b00 00ee       bmiw 34e38 <_STlib_updateBinIcon+0x134>
   34d4c:   4a81            tstl %d1
   34d4e:   6744            beqs 34d94 <_STlib_updateBinIcon+0x90>
   34d50:   2f2a 0014       movel %a2@(20),%sp@-
   34d54:   4879 0003 4cf0  pea 34cf0 <_STlib_initBinIcon+0x38>
   34d5a:   4eb9 0003 c2d8  jsr 3c2d8 <_printf>
   34d60:   2079 0000 6720  moveal 6720 <___iob>,%a0
   34d66:   2f28 0004       movel %a0@(4),%sp@-
   34d6a:   4eb9 0003 cbc4  jsr 3cbc4 <_fflush>

   34d70:   206a 0014       moveal %a2@(20),%a0          ; patch address
   34d74:   4282            clrl %d2                                  ; 'screen'  
   34d76:   222a 0004       movel %a2@(4),%d1              ; X
   34d7a:   2012            movel %a2@,%d0                  ; Y
   34d7c:   4eb9 0000 19e0  jsr 19e0 <_V_DrawPatch>

   34d82:   4fef 000c       lea %sp@(12),%sp
   34d86:   206a 000c       moveal %a2@(12),%a0
   34d8a:   2550 0008       movel %a0@,%a2@(8)
   34d8e:   4cdf 2c3c       moveml %sp@+,%d2-%d5/%a2-%a3/%a5
   34d92:   4e75            rts

I then used WinUAE to debug the issue. I found that it would always call V_DrawPatch with 0x4000600 in A0, so I set a breakpoint in that condition and waited. Et voila:

403FDFA2 48e7 2036                MOVEM.L D2/A2-A3/A5-A6,-(A7)
403FDFA6 246f 001c                MOVEA.L (A7, $001c) == $404b65c0,A2
403FDFAA 206a 0010                MOVEA.L (A2, $0010) == $40412fa4,A0
403FDFAE 4a90                     TST.L (A0)
403FDFB0 6700 0096                BEQ.W #$0096 == $403fe048 (T)
403FDFB4 222a 0008                MOVE.L (A2, $0008) == $40412f9c,D1
403FDFB8 206a 000c                MOVEA.L (A2, $000c) == $40412fa0,A0
403FDFBC 2010                     MOVE.L (A0),D0
403FDFBE b081                     CMP.L D1,D0
403FDFC0 6700 008e                BEQ.W #$008e == $403fe050 (T)
403FDFC4 74ff                     MOVE.L #$ffffffff,D2
403FDFC6 b480                     CMP.L D0,D2
403FDFC8 677e                     BEQ.B #$0000007e == $403fe048 (T)
403FDFCA b481                     CMP.L D1,D2
403FDFCC 675c                     BEQ.B #$0000005c == $403fe02a (T)
403FDFCE 206a 0014                MOVEA.L (A2, $0014) == $40412fa8,A0
403FDFD2 2070 1c00                MOVEA.L (A0, D1.L*4, $00) == $040005fc,A0
403FDFD6 3028 0004                MOVE.W (A0, $0004) == $04000604,D0
403FDFDA e058                     ROR.W #$00000008,D0
403FDFDC 2a52                     MOVEA.L (A2),A5
403FDFDE 9ac0                     SUBA.W D0,A5
403FDFE0 3028 0006                MOVE.W (A0, $0006) == $04000606,D0
403FDFE4 e058                     ROR.W #$00000008,D0
403FDFE6 266a 0004                MOVEA.L (A2, $0004) == $40412f98,A3
403FDFEA 96c0                     SUBA.W D0,A3
403FDFEC 3018                     MOVE.W (A0)+,D0
403FDFEE e058                     ROR.W #$00000008,D0
403FDFF0 3c40                     MOVEA.W D0,A6
403FDFF2 3010                     MOVE.W (A0),D0
403FDFF4 e058                     ROR.W #$00000008,D0
403FDFF6 3040                     MOVEA.W D0,A0
403FDFF8 70e0                     MOVE.L #$ffffffe0,D0
403FDFFA d0b9 4041 3dd8           ADD.L $40413dd8,D0
403FE000 220b                     MOVE.L A3,D1
403FE002 9280                     SUB.L D0,D1
403FE004 2001                     MOVE.L D1,D0
403FE006 6d58                     BLT.B #$00000058 == $403fe060 (F)
403FE008 42a7                     CLR.L -(A7)
403FE00A 2f0b                     MOVE.L A3,-(A7)
403FE00C 2f0d                     MOVE.L A5,-(A7)
403FE00E 2f08                     MOVE.L A0,-(A7)
403FE010 2f0e                     MOVE.L A6,-(A7)
403FE012 4878 0004                PEA.L $00000004
403FE016 2f00                     MOVE.L D0,-(A7)
403FE018 2f0d                     MOVE.L A5,-(A7)
403FE01A 4eb9 403f d448           JSR $403fd448
403FE020 206a 000c                MOVEA.L (A2, $000c) == $40412fa0,A0
403FE024 2010                     MOVE.L (A0),D0
403FE026 4fef 0020                LEA.L (A7, $0020) == $404b65c4,A7

403FE02A 206a 0014                MOVEA.L (A2, $0014) == $40412fa8,A0           ; patch address
403FE02E 2070 0c00                MOVEA.L (A0, D0.L*4, $00) == $04000604,A0   ; ????
403FE032 4282                     CLR.L D2                                                                ; screen
403FE034 222a 0004                MOVE.L (A2, $0004) == $40412f98,D1              ; Y
403FE038 2012                     MOVE.L (A2),D0                                                     ; X
403FE03A 4eb9 4044 e488           JSR $4044e488                                              ; call V_DrawPatch 

403FE040 206a 000c                MOVEA.L (A2, $000c) == $40412fa0,A0
403FE044 2550 0008                MOVE.L (A0),(A2, $0008) == $40412f9c
403FE048 4cdf 6c04                MOVEM.L (A7)+,D2/A2-A3/A5-A6
403FE04C 588f                     ADDA.L #$00000004,A7
403FE04E 4e75                     RTS

How in the world did the 403FF5DE 2070 0c00 MOVEA.L (A0, D0.L*4, $00) == $04000604,A0
line squeezed in there and why?

I also wonder how the code that objdump disassembled is so different from what is apparently running on the Amiga? One of the jsr's (to fflush?) seems to be missing... this is all very confusing. Attaching the preprocessed source: st_lib.txt

mheyer32 commented 6 years ago

Forgot to post the commandline I sused to build st_lib.c with: (remove -E)

matze@ubuntu-VirtualBox:~/adoom_git/adoom_src$ make st_lib.o
m68k-amigaos-gcc -v -E -Ofast -fstrength-reduce -fbaserel -mcrt=clib2 -m68030 -m68881 -fomit-frame-pointer -Werror -Wimplicit -Wstrict-prototypes -Wno-int-conversion -DVERBOSE -D__BIG_ENDIAN__ -DNORMALUNIX -DAMIGA -Diabs=abs -c st_lib.c -o st_lib.o
Using built-in specs.
COLLECT_GCC=m68k-amigaos-gcc
Target: m68k-amigaos
Configured with: /home/matze/amigatoolchain/gcc-6.2/amigaos-cross-toolchain/submodules/gcc-6/configure --prefix=/home/matze/amigatoolchain/gcc-6.2/m68k-amigaos --infodir=/home/matze/amigatoolchain/gcc-6.2/m68k-amigaos/m68k-amigaos/info --mandir=/home/matze/amigatoolchain/gcc-6.2/m68k-amigaos/share/man --prefix=/home/matze/amigatoolchain/gcc-6.2/m68k-amigaos --prefix=/home/matze/amigatoolchain/gcc-6.2/m68k-amigaos --target=m68k-amigaos --enable-languages=c,c++,objc --enable-version-specific-runtime-libs --disable-libssp --with-headers=/home/matze/amigatoolchain/gcc-6.2/amigaos-cross-toolchain/.build-m68k/sources/ixemul-48.2/include
Thread model: single
gcc version 6.3.1b 20171229-105257 (GCC) 
COLLECT_GCC_OPTIONS='-v' '-E' '-Ofast' '-fbaserel' '-mcrt=clib2' '-mcpu=68030' '-m68881' '-fomit-frame-pointer' '-Werror' '-Wimplicit' '-Wstrict-prototypes' '-Wno-int-conversion' '-D' 'VERBOSE' '-D' '__BIG_ENDIAN__' '-D' 'NORMALUNIX' '-D' 'AMIGA' '-D' 'iabs=abs' '-c' '-o' 'st_lib.o'
 /home/matze/amigatoolchain/gcc-6.2/m68k-amigaos/libexec/gcc/m68k-amigaos/6.3.1b/cc1 -E -quiet -v -imultilib libb -D__HAVE_68881__ -isystem /home/matze/amigatoolchain/gcc-6.2/m68k-amigaos/m68k-amigaos/clib2/include -DCLIB2 -D__CLIB2__ -D__CLIB2 -D VERBOSE -D __BIG_ENDIAN__ -D NORMALUNIX -D AMIGA -D iabs=abs st_lib.c -o st_lib.o -mcrt=clib2 -mcpu=68030 -m68881 -Werror -Wimplicit -Wstrict-prototypes -Wno-int-conversion -fbaserel -fomit-frame-pointer -Ofast
#include "..." search starts here:
#include <...> search starts here:
 /home/matze/amigatoolchain/gcc-6.2/m68k-amigaos/m68k-amigaos/clib2/include
 /home/matze/amigatoolchain/gcc-6.2/m68k-amigaos/lib/gcc/m68k-amigaos/6.3.1b/include
 /home/matze/amigatoolchain/gcc-6.2/m68k-amigaos/lib/gcc/m68k-amigaos/6.3.1b/../../../../m68k-amigaos/sys-include/../../include
 /home/matze/amigatoolchain/gcc-6.2/m68k-amigaos/lib/gcc/m68k-amigaos/6.3.1b/../../../../m68k-amigaos/sys-include
 /home/matze/amigatoolchain/gcc-6.2/m68k-amigaos/lib/gcc/m68k-amigaos/6.3.1b/../../../../m68k-amigaos/include
End of search list.
COMPILER_PATH=/home/matze/amigatoolchain/gcc-6.2/m68k-amigaos/libexec/gcc/m68k-amigaos/6.3.1b/:/home/matze/amigatoolchain/gcc-6.2/m68k-amigaos/libexec/gcc/m68k-amigaos/6.3.1b/:/home/matze/amigatoolchain/gcc-6.2/m68k-amigaos/libexec/gcc/m68k-amigaos/:/home/matze/amigatoolchain/gcc-6.2/m68k-amigaos/lib/gcc/m68k-amigaos/6.3.1b/:/home/matze/amigatoolchain/gcc-6.2/m68k-amigaos/lib/gcc/m68k-amigaos/:/home/matze/amigatoolchain/gcc-6.2/m68k-amigaos/lib/gcc/m68k-amigaos/6.3.1b/../../../../m68k-amigaos/bin/
LIBRARY_PATH=/home/matze/amigatoolchain/gcc-6.2/m68k-amigaos/lib/gcc/m68k-amigaos/6.3.1b/libb/:/home/matze/amigatoolchain/gcc-6.2/m68k-amigaos/lib/gcc/m68k-amigaos/6.3.1b/../../../../m68k-amigaos/lib/libb/:/home/matze/amigatoolchain/gcc-6.2/m68k-amigaos/lib/gcc/m68k-amigaos/6.3.1b/:/home/matze/amigatoolchain/gcc-6.2/m68k-amigaos/lib/gcc/m68k-amigaos/6.3.1b/../../../../m68k-amigaos/lib/
COLLECT_GCC_OPTIONS='-v' '-E' '-Ofast' '-fbaserel' '-mcrt=clib2' '-mcpu=68030' '-m68881' '-fomit-frame-pointer' '-Werror' '-Wimplicit' '-Wstrict-prototypes' '-Wno-int-conversion' '-D' 'VERBOSE' '-D' '__BIG_ENDIAN__' '-D' 'NORMALUNIX' '-D' 'AMIGA' '-D' 'iabs=abs' '-c' '-o' 'st_lib.o'
matze@ubuntu-VirtualBox:~/adoom_git/adoom_src$ 
mheyer32 commented 6 years ago

-Os produces working code

bebbo commented 6 years ago

your version: gcc version 6.3.1b 20171229-105257 (GCC)

please test the recent verison: gcc version 6.4.1b 20180113-204200 (GCC)

.L88:
    move.l 20(a2),a0
    clr.l d2
    move.l a3,d1
    move.l a5,d0
    jsr _V_DrawPatch 
bebbo commented 6 years ago

Hm, the code displayed by WinUAE seems really different. Thoughts:

It's wierd since there is an additional statement which is inserted - not overwriting some other statement.

mheyer32 commented 6 years ago

Scrap that. I wanted to try the older 2.97 toolchain and had to undef FAR far. If I #define FAR to far with the 6.4.1b toolchain, I get it compiled and linked.

Sorry for the churn! Now let me check, if the 6.4.1 code is doing better :-)

mheyer32 commented 6 years ago

Sad to report, but GCC 6.4.1.b is seriously borked.

With clib2 ADoom hangs at startup in an endless loop, regardless of optimization options chosen. With noixemul, ADoom crashes in the first OpenLibrary call. The same code worked before.

Even a simple test like this:

#include <stdio.h>

int main(void)
{
   printf("%4x %4x %4x %4x\n", (int)0x0102, (int)(0xff01), (int)(0x7fff),  (int)(0x8090));
}

Fails for CLIB2 by printing nonsense ("4x %x %x %x") instead of interpreting the format string correctly. The same test compiled with -noixemul works.

bebbo commented 6 years ago
4x %x %x %x

UPS! ... found it:

    jne .L46
    move.l a6,d0    | 3. omitted since now d0 is not used.
    addq.l #1,d0    | 1. falsly omitted since the operator is not honored.
    move.b 1(a6),d5
    ext.w d5
    ext.l d5
    jeq .L25
    move.l d0,a6    | 2. omitted since now redundant.
    moveq #0,d4

bogus code:

      // do not clear if self assigned
      int dregno = ii.get_dst_regno ();
      if (dregno != ii.get_src_regno ())

fixed code:

      // do not clear if self assigned unless there is an operator
      int dregno = ii.get_dst_regno ();
      if (dregno != ii.get_src_regno () || ii.get_src_op ())

and now:

 102 ff01 7fff 8090

[devel1 58558869848]

mheyer32 commented 6 years ago

Thanks, that fixed the CLIB2 startup problem. I can now start ADoom again, but different levels of -O result in different failures:

-O0 runs the game... there are still rendering errors, but I believe these are due to bugs in the game code -Os breaks the character placement in the main menu, but would eventually run the game -O1 shows main menu correctly, but then hangs in an infinite loop due to the original bug described here (i.e. pointer passed to V_DrawPatch function is broken) -O2 breaks main menu and hangs in V_DrawPatch -O3 same as O2 -Ofast same as O2/3

As a side note: I also managed to get the whole thing compile/link with -noixemul instead of -mcrt-clib2. There were only a few socket related typedefs different and I needed to link against '-lsocket' instead of '-lnet'. But as stated above, ADoom compiled with -noixemul won't even startup.

I uploaded the source to https://github.com/mheyer32/ADoom - give it a shot. I believe it may be a really good test case for the toolchain.

mheyer32 commented 6 years ago

I just learnt about -fbbb=- Using this I can get into the game just like with -O0, even when using -Ofast

It doesn't help with the noixemul==no startup problem, though.

mheyer32 commented 6 years ago

-fbbb= p seem to cause the hang in V_DrawPatch (messing up the parameters passed via registers into the assembly function)

-fbbb = r messes up the menu drawing in V_DrawPatchIndirect (a regular C function)

bebbo commented 6 years ago

The only use op (p) is in st_stuff:

:bbb: in 'ST_loadGraphics'
:bbb: (p) propagate_moves condition met, moving regs d3, a0
:bbb: (p) propagate_moves condition met, moving regs d4, a0 

And you can see what -fbbb=p does. Before:

.L219:
    move.l d2,-(sp)
    pea .LC19
    move.l a2,-(sp)
    jsr (a5)
    pea 1.w
    move.l a2,-(sp)
    jsr (a3)
    move.l d0,(a6)
    move.l d3,a0
    move.l (a0)+,4(a6)
    move.l a0,d3
    addq.l #1,d2
    addq.l #8,a6
    lea (20,sp),sp
    moveq #8,d0
    cmp.l d2,d0
    jne .L219

After:

    move.l d3,a0
.L219:
    move.l d2,-(sp)
    pea .LC19
    move.l a2,-(sp)
    jsr (a5)
    pea 1.w
    move.l a2,-(sp)
    jsr (a3)
    move.l d0,(a6)

    move.l (a0)+,4(a6)

    addq.l #1,d2
    addq.l #8,a6
    lea (20,sp),sp
    moveq #8,d0
    cmp.l d2,d0
    jne .L219
    move.l a0,d3

But you also see that it' not valid here since a0 is trashed during the calls...

mheyer32 commented 6 years ago

Perfectly possible :-) I saw some issue where ST_loadGraphics either produced invalid lump-name-strings or passed the wrong string pointer to W_CacheLumpName.

mheyer32 commented 6 years ago

-fbbb= p seems to work now.

-fbbb= r still messes up the main menu (all drawn menu items appear in the upper left corner)

bebbo commented 6 years ago

-fbbb = r messes up the menu drawing in V_DrawPatchIndirect (a regular C function)

here an invalid rename causes emitting divsl.l d1,d1:d2 instead of divs.l d1,d2 trashing the d1 register.

bebbo commented 6 years ago

please test

mheyer32 commented 6 years ago

Confirmed, -fbbb = r does not mess up the menus anymore.

I think, all the concerns in this issue have been fixed. Very much obliged, bebbo! :-D

Closing.