Closed: darrenfreeman closed this issue 2 years ago
From the GCC manual (https://gcc.gnu.org/onlinedocs/gcc/M680x0-Options.html#M680x0-Options):

-mcpu=cpu
    Generate code for a specific M680x0 or ColdFire processor. ...
-mtune=tune
    Tune the code for a particular microarchitecture within the constraints set by -march and -mcpu. ...
Your options do not specify -mcpu, so gcc 6.5.0b falls back to its default, which is -mcpu=68000.
muls.l dx,dy,dz (or a similar 32x32->64 multiply) is not available, so ___mulsi3 is used to multiply. Instead, the plain 32x32->32 form, muls.l dx,dy, should be allowed for the 68060.
It should look like this: https://franke.ms/cex/z/667z8q
btw: why a 32-bit counter? The counter is also used in pointer arithmetic. Thus, before optimization, it has to be extended from byte to int. The optimization reduces this to a 32-bit counter. Later, its use in pointer arithmetic is replaced with auto-increment. There is no way to know that an 8-bit counter would now be sufficient.
You have convinced me not to tune for the higher processors when compiling for 68000, and to be especially careful with the 060. But some of what I'm seeing doesn't appear to be sensible behaviour.
I note that simply dropping the -60 from the end of the compiler options, in the toy code that I gave, swaps the function-call multiply for a few additions instead. So tuning for an 060 actually causes it to use a function call, while tuning for an 020 uses a bunch of adds. This can't be correct behaviour.
As a second issue, the compiler is indeed promoting pointer-arithmetic multiplies to 32 bits, and to support the 68000, every 32-bit multiply becomes a function call. If you follow the generated assembly, the multiply is being done between types that fit in 16 bits or less. I understand there is a reason the compiler forgets the size of the type while generating the assembly, but the output is still some of the worst code I've ever seen.
I have been really mangling my source to try to avoid this sort of thing in the output; array indexing really shouldn't be so expensive as to involve a function call every time. There appears to be no benefit to using an index, or array size, smaller than 32 bits.
some hints:
```c
struct Thing { short a, b, c; };   /* field types assumed */
extern struct Thing things[];

void foo2(short num) {
    struct Thing *thing = things;
    while (num-- > 0) {
        thing->a = 0;
        thing->b = 0;
        thing->c = 0;
        ++thing;
    }
}
```
Can't do more; the code seems OK now.
Using commit 03d95f15d2d7c92ed253152c0e50bb080ee323d4.
Toy code: http://franke.ms/cex/z/K7z9eK
The for() loop has been refactored into a loop using pointer increments and a comparison against a pointer to the end of the array. This is great to see! However, the calculation of the end address uses the dreaded ___mulsi3, even when the loop count is a byte. See also issue #165 for a discussion of why even mulu.w #6,d0 is far from optimal.

Note that this seems to be a wider problem: I see unwanted calls to ___mulsi3 all the time. It's really expensive! I would firstly like to see the compiler use a multiply instruction whenever the operands are 16 bits or less.
But what I had in mind, for this loop, is the compiler using dbra rather than comparing the pointer with the end of the array. That sidesteps the need for a multiply. I have made my code horribly unreadable by explicitly writing loops to target dbra, but in the simple cases where the loop is already being refactored, I was hoping it would tend to pick dbra.