ARM-software / CMSIS-DSP

CMSIS-DSP embedded compute library for Cortex-M and Cortex-A
https://arm-software.github.io/CMSIS-DSP
Apache License 2.0

arm_float_to_q31: why not use VCVT.S32.F32 on Cortex-M4F #212

Open AlanCui4080 opened 2 days ago

AlanCui4080 commented 2 days ago

Hi,

As we know, the Cortex-M4F implements a small set of FPU instructions, including VCVT.S32.F32 (float to signed 32-bit), and GCC has a builtin intrinsic for it. My question is why GCC only enables that intrinsic when NEON is available, and why arm_float_to_q31 does not use the instruction.

Alan.

christophe0606 commented 2 days ago

@AlanCui4080 I don't see any reason. Perhaps the function was first developed for M0 and was never updated to support the other architectures.

AlanCui4080 commented 2 days ago

@christophe0606

Sorry, my mistake. It is fine to include arm_neon.h on M4F, even though only a subset of NEON is implemented there, but vcvt_s32_f32 compiles to the SIMD instruction "FCVTZS Vd.2S,Vn.2S", which is invalid on M4F. The only valid one is the fixed-point form "VCVT.S32.F32 Sd, Sd, #fbits", which is part of FPv4-SP. I'm testing it on my STM32G4; if it works, I will open a pull request.

christophe0606 commented 2 days ago

@AlanCui4080 I won't include arm_neon.h for M4F.

That leaves two possibilities. The first is that this intrinsic is supported by the Arm C Language Extensions (ACLE). Unfortunately, too many compilers still do not fully implement ACLE.

That's why most of the intrinsics used by CMSIS-DSP come from CMSIS-Core (part of CMSIS-6), which is the second possibility. You will probably need to open an issue on CMSIS-Core if you want this new intrinsic to be supported.

AlanCui4080 commented 2 days ago

@christophe0606 I figured it out: ACLE has no single-precision scalar (FPv4-SP) version of vcvt_s32_f32; it only offers D-register and Q-register variants. I believe it was simply never added to ACLE because the instruction is rarely used.

Also, CMSIS-Core does not seem to include any FPU instructions, so where should I put it?

Note: the following inline asm has proven usable as a replacement for arm_float_to_q31.

asm("vcvt.s32.f32 %0, %0, #31": "+t"(thetaf_divpi.f)::);
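
For illustration, here is a minimal sketch of how the one-liner above might be wrapped so that it only compiles where a single-precision FPU is present and falls back to plain C elsewhere. The helper name float_to_q31_sketch is hypothetical and is not part of CMSIS-DSP or CMSIS-Core. The guard relies on the ACLE macro __ARM_FP (bit 2 set means single precision is supported), and the fallback assumes truncation toward zero with saturation, which is how the fixed-point VCVT behaves; the library's own rounding options are not reproduced here.

```c
#include <stdint.h>

/* Hypothetical helper, not a CMSIS-DSP or CMSIS-Core API. */
static inline int32_t float_to_q31_sketch(float x)
{
#if defined(__GNUC__) && defined(__arm__) && defined(__ARM_FP) && (__ARM_FP & 0x4)
    /* Hardware path: fixed-point VCVT with 31 fraction bits.
       The result is an integer left in the same S register, so a union
       is used to read the bits back, as in the snippet above. */
    union { float f; int32_t i; } v;
    v.f = x;
    __asm__("vcvt.s32.f32 %0, %0, #31" : "+t"(v.f));
    return v.i;
#else
    /* Portable fallback: scale by 2^31, saturate, truncate toward zero. */
    float scaled = x * 2147483648.0f;
    if (scaled != scaled) {
        return 0;                 /* NaN converts to 0, as VCVT does */
    }
    if (scaled >= 2147483648.0f) {
        return INT32_MAX;         /* saturate at +1.0 and above */
    }
    if (scaled <= -2147483648.0f) {
        return INT32_MIN;         /* saturate at -1.0 and below */
    }
    return (int32_t)scaled;
#endif
}
```

With this sketch, float_to_q31_sketch(0.5f) gives 0x40000000 and float_to_q31_sketch(1.0f) saturates to INT32_MAX on either path.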