Closed githubaf closed 2 years ago
minimal: https://franke.ms/cex/z/eYE68K
a test program which produces different output using the above provided minimal.
typedef unsigned long ULONG;
typedef unsigned short USHORT;
USHORT AltPen[16];
__far long scrnrowzip[2000];
long xx[3] = {3, 55, 88};
long yy[3] = {4, 77, 123};
long EdgeSize;
short *Edge1,*Edge2, render, high, wide;
double qqq;
float* zbuf;
float *ElevationMap;
USHORT *bytemap;
struct Library * GfxBase0;
struct RastPort;
struct Window {
char x[50];
struct RastPort * RPort;
};
#include <stdlib.h>
void *get_Memory(long zsize, long attributes) {
return malloc(zsize);
}
void free_Memory(void *memblock, long zsize) {
return free(memblock);
}
extern void MapCloud(struct Window *win, short *CloudVal, short *IllumVal, long Elev);
short CloudVal[3] = {1, 10, 100};
short IllumVal[3] = {2, 22, 333};
int main() {
MapCloud(0, CloudVal, IllumVal, 2);
for (int i = 0; i < EdgeSize; ++i) {
printf("%i: %d %d\n", i, Edge1[i], Edge2[i]);
}
return 0;
}
this is the code where the error occurs:
.L11:
move.w d0,a6
move.l a6,d6 | <-- d6 as index register
move.l d1,a6
fmovem.l fpcr,d6 | <-- d6 to track the fpcr
moveq #16,d7
or.l d6,d7
and.w #-33,d7
fmovem.l d7,fpcr
fmove.w fp2,(a6,d6.l*2) | <-- fpcr value (0) is used instead of index
fmovem.l d6,fpcr
addq.w #1,d0
fdadd.x fp1,fp2
cmp.w d3,d0
jle .L11
jra .L12
d6 is used as index, but also used to store th fpcr value...
uh, oh:
(insn 193 787 194 15 (parallel [
(set (mem:HI (plus:SI (mult:SI (reg:SI 6 d6)
(const_int 2 [0x2]))
(reg:SI 14 a6)) [1 *_115+0 S2 A16])
(fix:HI (fix:DF (reg/v:DF 18 fp2 [orig:164 EdgeVal ] [164]))))
(clobber (reg:SI 6 d6))
(clobber (reg:SI 7 d7))
]) min.c:73 110 {fix_truncdfhi2}
(nil))
d6 is used inside of the insn and as clobber... ... which is ok, since d6 is dead after the insn. BUT: the generated asm kills d6 before using it:
return "fmovem%.l %!,%2\;moveq #16,%3\;or%.l %2,%3\;and%.w #-33,%3\;fmovem%.l %3,%!\;fmove%.l %1,%0\;fmovem%.l %2,%!";
if one of the clobbers overlap with that reg needs a push/pop.
the generated code now checks for collisions and uses a different register with push/pop:
move.l d0,-(a7)
fmovem.l fpcr,d0
moveq #16,d7
or.l d0,d7
and.w #-33,d7
fmovem.l d7,fpcr
fmove.w fp2,(a6,d6.l*2)
fmovem.l d0,fpcr
move.l (a7)+,d0
the test program now yields identical values
I admire your ability to see such issues in the assembly output.
the fix is live
Issue is gone, thank you!
But I found another one. (Happening with -m68020-40 or -m68020-60) I will try to narrow it down and report.
While performing extensive tests with different gcc options to find the fastest executable I stumbled over a new problem. Usually the files with "GUI" in the name were compiled with "-Os", all others with "-O2". In my last test I compiled all files with "-Os" and that resulted in distorted clouds in the calculated picture. The problem occurs when the file MapTopo.c is compiled with "-Os". I tried the single functions in that file and found the function void MapCloud(struct Window win, short CloudVal, short *IllumVal, long Elev); to be the critical one. The issue is not important for WCS as -O2 seems to produce a faster executable, but anyway I would like to report this strange behavior.
Please find the preprocessed C-File in the attachement.
MapTopo.zip compile with
m68k-amigaos-gcc -Wall -c -fmessage-length=0 -funsigned-char -MMD -MP -MF"MapTopo.d" -MT"MapTopo.o" "MapTopo_ok/bad_pre.c" -o "MapTopo.o" -g -noixemul -m68040 -fomit-frame-pointer -fbaserel -ffast-math -mregparm -Os
(There is a pragma GCC optimize ("O2") inside the good file)