bebbo / gcc

Bebbo's gcc-6-branch for m68k-amigaos
GNU General Public License v2.0
33 stars 11 forks source link

-Os leads to problems in WCS #171

Closed githubaf closed 2 years ago

githubaf commented 2 years ago

While performing extensive tests with different gcc options to find the fastest executable I stumbled over a new problem. Usually the files with "GUI" in the name were compiled with "-Os", all others with "-O2". In my last test I compiled all files with "-Os" and that resulted in distorted clouds in the calculated picture. The problem occurs when the file MapTopo.c is compiled with "-Os". I tried the single functions in that file and found the function void MapCloud(struct Window win, short CloudVal, short *IllumVal, long Elev); to be the critical one. The issue is not important for WCS as -O2 seems to produce a faster executable, but anyway I would like to report this strange behavior.

Please find the preprocessed C-File in the attachement.

MapTopo.zip compile with

m68k-amigaos-gcc -Wall -c -fmessage-length=0 -funsigned-char -MMD -MP -MF"MapTopo.d" -MT"MapTopo.o" "MapTopo_ok/bad_pre.c" -o "MapTopo.o" -g -noixemul -m68040 -fomit-frame-pointer -fbaserel -ffast-math -mregparm -Os

(There is a pragma GCC optimize ("O2") inside the good file)

bebbo commented 2 years ago

minimal: https://franke.ms/cex/z/eYE68K

bebbo commented 2 years ago

a test program which produces different output using the above provided minimal.

typedef unsigned long ULONG;
typedef unsigned short USHORT;

USHORT AltPen[16];

__far long scrnrowzip[2000];
long xx[3] = {3, 55, 88};
long yy[3] = {4, 77, 123};
long EdgeSize;
short *Edge1,*Edge2, render, high, wide;
double qqq;
float* zbuf;
float *ElevationMap;
USHORT *bytemap;

struct Library * GfxBase0;
struct RastPort;
struct Window {
    char x[50];
    struct RastPort * RPort;
};

#include <stdlib.h>

void *get_Memory(long zsize, long attributes) {
  return malloc(zsize);
}
void free_Memory(void *memblock, long zsize) {
  return free(memblock);
}

extern void MapCloud(struct Window *win, short *CloudVal, short *IllumVal, long Elev);

short CloudVal[3] = {1, 10, 100};
short IllumVal[3] = {2, 22, 333};

int main() {
  MapCloud(0, CloudVal, IllumVal, 2);
  for (int i = 0; i < EdgeSize; ++i) {
    printf("%i: %d %d\n", i, Edge1[i], Edge2[i]);
  }
  return 0;
}
bebbo commented 2 years ago

this is the code where the error occurs:

.L11:
        move.w d0,a6
        move.l a6,d6    | <-- d6 as index register
        move.l d1,a6
        fmovem.l fpcr,d6 | <-- d6 to track the fpcr
        moveq #16,d7
        or.l d6,d7
        and.w #-33,d7
        fmovem.l d7,fpcr
        fmove.w fp2,(a6,d6.l*2) | <-- fpcr value (0) is used instead of index
        fmovem.l d6,fpcr
        addq.w #1,d0
        fdadd.x fp1,fp2
        cmp.w d3,d0
        jle .L11
        jra .L12

d6 is used as index, but also used to store th fpcr value...

bebbo commented 2 years ago

uh, oh:

(insn 193 787 194 15 (parallel [
            (set (mem:HI (plus:SI (mult:SI (reg:SI 6 d6)
                            (const_int 2 [0x2]))
                        (reg:SI 14 a6)) [1 *_115+0 S2 A16])
                (fix:HI (fix:DF (reg/v:DF 18 fp2 [orig:164 EdgeVal ] [164]))))
            (clobber (reg:SI 6 d6))
            (clobber (reg:SI 7 d7))
        ]) min.c:73 110 {fix_truncdfhi2}
     (nil)) 

d6 is used inside of the insn and as clobber... ... which is ok, since d6 is dead after the insn. BUT: the generated asm kills d6 before using it:

return "fmovem%.l %!,%2\;moveq #16,%3\;or%.l %2,%3\;and%.w #-33,%3\;fmovem%.l %3,%!\;fmove%.l %1,%0\;fmovem%.l %2,%!";

if one of the clobbers overlap with that reg needs a push/pop.

bebbo commented 2 years ago

the generated code now checks for collisions and uses a different register with push/pop:

    move.l d0,-(a7)
    fmovem.l fpcr,d0
    moveq #16,d7
    or.l d0,d7
    and.w #-33,d7
    fmovem.l d7,fpcr
    fmove.w fp2,(a6,d6.l*2)
    fmovem.l d0,fpcr
    move.l (a7)+,d0
bebbo commented 2 years ago

the test program now yields identical values

githubaf commented 2 years ago

I admire your ability to see such issues in the assembly output.

bebbo commented 2 years ago

the fix is live

githubaf commented 2 years ago

Issue is gone, thank you!

But I found another one. (Happening with -m68020-40 or -m68020-60) I will try to narrow it down and report.