ARM-software / CMSIS-DSP

CMSIS-DSP embedded compute library for Cortex-M and Cortex-A
https://arm-software.github.io/CMSIS-DSP
Apache License 2.0
459 stars 123 forks source link

GCC failing to identify SMM[UL\LA]R idioms causes performance issues #117

Open FabKlein opened 10 months ago

FabKlein commented 10 months ago

For tracking

multAcc_32x32_keep32_R, mult_32x32_keep32_R and co macros, involved in Q.31 FFTs,Fast DF1 Biquad / Fast FIRs are supposed to be converted in single ARMv7M-E DSP instruction (SMM[UL\LA]R variant)

This is OK with Arm Compilers and clang but not for GCC (any version) https://godbolt.org/z/7GWqdE3xv

Expected successful conversion (e.g. CLANG): https://godbolt.org/z/3149Knxoc

SMM[UL\LA]R are not part of ACLE, so inline asm may be required

christophe0606 commented 10 months ago

@FabKlein There is already a __SMMLA in CMSIS-Core for gcc. No SMMUL yet ... Same for clang ...

CookiePLMonster commented 4 months ago

For visibility - is there any way to emit SMMUL from GCC that does not involve inline assembly? Neither gcc-cmsis.h or arm_acle.h provide intrinsics for it.

christophe0606 commented 4 months ago

@CookiePLMonster I think the only way is to use inline assembly that can be hidden in a C macro.

CookiePLMonster commented 4 months ago

@christophe0606 Indeed, I tried mimicking intrinsics from cmsis_gcc.h and it works well:

__STATIC_FORCEINLINE int32_t __SMMUL (int32_t op1, int32_t op2)
{
 int32_t result;

 __ASM ("smmul %0, %1, %2" : "=r" (result): "r"  (op1), "r" (op2) );
 return(result);
}

It's just strange that it's omitted from that header file, while e.g. SMMLA exists.

christophe0606 commented 4 months ago

@CookiePLMonster It is handled by the CMSIS project. You may open a github issue there to ask them why it was not done, and if they could add it to the header. (Perhaps there is already an open issue)

CookiePLMonster commented 4 months ago

Would that be https://github.com/ARM-software/CMSIS_6/issues ?

christophe0606 commented 4 months ago

@CookiePLMonster

It is.

But if you're still on CMSIS_5 you also have: https://github.com/ARM-software/CMSIS_5