Closed by rjobling 2 years ago
One downside of my solution seems to be that the compiler will add an ext.l before the muls, because 'a' is copied to 'r', which is an int.
So far I haven't found a solution that doesn't either add some unnecessary extra code or drop the upper bits you actually need.
I ran into this again because I needed to remove some 32-bit multiplies.
The support code as it is doesn't return the full 32-bit result unless you use the version I provided above.
I haven't really found a satisfying solution either. It's always either non-optimal code or breaking code.
The current code has the problem that if the result doesn't fit in 16 bits then you get the wrong value.
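To make the failure concrete, here is a small portable C sketch of the arithmetic, not the support code itself. The helper names `mul16_full` and `mul16_truncated` are made up for illustration, and the narrowing cast assumes the usual GCC behaviour of keeping the low 16 bits.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of the support code. */

/* Full 16x16=32 signed multiply: both operands are widened before multiplying. */
static inline int32_t mul16_full(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* What a 16-bit result keeps: only the low 16 bits of the 32-bit product. */
static inline int16_t mul16_truncated(int16_t a, int16_t b)
{
    return (int16_t)(uint16_t)((int32_t)a * (int32_t)b);
}
```

For example, 300 * 300 is 90000 (0x15F90), which doesn't fit in 16 bits, so `mul16_truncated(300, 300)` comes back as 24464 (0x5F90) while `mul16_full(300, 300)` gives 90000. For small products like 100 * 100 both agree, which is why the problem is easy to miss.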
I've been trying to use mulsw to make sure the compiler does a 16x16=32 multiplication. But as things are, the compiler seems to assume mulsw results can be truncated to 16 bits, and I'm seeing generated code that does this. I think it's because the inline asm uses 'a' for the result and 'a' is a short, so the compiler optimizes out the upper bits you would hope to get in the returned int.
Well, anyway I think this alternative works:
inline int mulsw(short a, short b) { int r = a; asm("mulsw %1,%0":"+d"(r): "mid"(b): "cc"); return r; }
But you will probably want to check that yourself and do the same for muluw.
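For muluw, a rough sketch of the same idea might look like the following. This is untested on real m68k output: the name `muluw32` is made up to avoid clashing with the existing helper, the `muluw` mnemonic and the "mid" constraint are simply mirrored from the mulsw version, and the `#else` branch is only there so the snippet also builds on a non-m68k host.

```c
/* Hypothetical unsigned counterpart; the name muluw32 is not from the support code. */
static inline unsigned int muluw32(unsigned short a, unsigned short b)
{
#if defined(__m68k__)
    /* Copy 'a' into a 32-bit variable so the compiler keeps all 32 result bits. */
    unsigned int r = a;
    __asm__("muluw %1,%0" : "+d"(r) : "mid"(b) : "cc");
    return r;
#else
    /* Portable fallback with the same intended result (16x16=32, unsigned). */
    return (unsigned int)a * (unsigned int)b;
#endif
}
```

As with the signed version, the copy into 'r' may cost an extra instruction to widen 'a', so it is worth checking the generated code before relying on it.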