bebbo / gcc

Bebbo's gcc-6-branch for m68k-amigaos
GNU General Public License v2.0
33 stars 11 forks source link

Shifts and rotates - count operand extending #206

Closed michalsc closed 11 months ago

michalsc commented 1 year ago

This is actually not related to Your changes to gcc, this is a generic gcc issue when generating m68k code. In case of shifts and rotations where count is given by register, gcc always extends the count to 32-bit operand, generating unnecessary code. The shift/rotate count is always taken modulo 64 so it does not matter if it is specified as 8, 16 or 32-bit variable.

What we have now is following:

unsigned lsl_b(unsigned a, char b)
{
    return a << b;
}

unsigned lsl_w(unsigned a, short b)
{
    return a << b;
}

unsigned lsl_l(unsigned a, unsigned b)
{
    return a << b;
} 

compiled with -Os -fomit-frame-pointer -m68020 -mregparm=2 gives:

__Z5lsl_bjc:
        extb.l d1
        lsl.l d1,d0
        rts
__Z5lsl_wjs:
        ext.l d1
        lsl.l d1,d0
        rts
__Z5lsl_ljj:
        lsl.l d1,d0
        rts

both extb.l d1 and ext.l d1 are unnecessary. Expected result would be:

__Z5lsl_bjc:
        lsl.l d1,d0
        rts
__Z5lsl_wjs:
        lsl.l d1,d0
        rts
__Z5lsl_ljj:
        lsl.l d1,d0
        rts

The same issue applies for all rotate/shift operations and gcc adds unnecessary signed/unsigned extension to long in all cases.

bebbo commented 1 year ago

if you solve this properly you also have to consider cases without regparam. gcc transforms this often and since the 680** don't have a single instruction for both steps:

  1. read mem
  2. expand the internal code looks like
    d1:QI=[sp:SI+0xb]
    d1:SI=sign_extend(d1:QI)
    d0:SI=[sp:SI+0x4]
    d0:SI=d0:SI<<d1:SI
    use d0:SI

Thus you have to handle cases where there are instructions inbetween the sign_extend and the shift.

It can be done, just needs time...

bebbo commented 1 year ago

http://franke.ms/cex/z/qGfbec

michalsc commented 1 year ago

yeah, there it is. Now, the question is, do we need this: d1:SI=sign_extend(d1:QI) or can we just convince gcc to use d1:QI in any case? Ah, jus noticed the gcc rtl way of writing it: can we tell gcc to leave d1:SI, d1:QI or d1:HI intact for shifts? Or can we just tell gcc that m68k supports shifts by QI, HI and SI operands and just declare them all as the very same ASL/ASR and so on?

bebbo commented 1 year ago

please test, this http://franke.ms/cex/z/qGfbec looks ok now.

michalsc commented 1 year ago

Yeah, on this example it looks great. However, the same should apply also to shifts right and it doesn't yet (unless this was just a proof of concept to see if you are on right track).

BTW: What I have also noticed is that gcc does not emit other LSL/LSR sizes as long. E.g. following code

unsigned short lsl_b(unsigned short a, char b)
{
    return a >> b;
}

unsigned short lsl_w(signed short a, short b)
{
    return (unsigned short)(a << b);
}

will be compiled to

__Z5lsl_btc:
        and.l #65535,d0
        extb.l d1
        asr.l d1,d0
        rts
__Z5lsl_wss:
        ext.l d0
        lsl.l d1,d0
        rts

where asr.l or lsl.l could be simply be replaced with asr.w or lsl.w without actually needing to mask or sign extend the operand in d0 register.

bebbo commented 1 year ago

available in ~36 mins

bebbo commented 1 year ago

The used instructions do not contain shifts for shorter modes. Maybe that can be added, but I'm not sure about this...

bebbo commented 1 year ago

well, the main problem is, that C/C++ converts everything to int before the shift gets applied.

  _2 = (int) a;
  _4 = (int) b;
  _5 = _2 >> _4;
  _6 = (short unsigned int) _5;
  return _6;
bebbo commented 12 months ago

please test: http://franke.ms/cex/z/4Teznb

bebbo commented 11 months ago

http://franke.ms/cex/z/TofYzY

michalsc commented 11 months ago

Looks great for me! Thanks!

I've also checked ROL/ROR and they seem to behave correctly too.