arm_float_to_q31: why not to use VCVT.F32.S32 on Cortex-M4F

AlanCui4080 commented 1 month ago

Hi,

As what we know, Cortex-M4F implemented a little set of FPU instructions including VCVT.F32.S32, and GCC did have a builtin intrinsic for it. But the question is why the intrinsic is only enabled when NEON available in GCC, and also, why not to use it in arm_float_to_q31.

Alan.

christophe0606 commented 1 month ago

@AlanCui4080 I don't see any reason. Perhaps the function was first developped for M0 and was not upgraded to support all other architectures.

AlanCui4080 commented 1 month ago

@christophe0606

Sorry for my mistake, it's ok to include arm_neon.h on M4F, even there is only a subset of NEON implemented on M4F, but the vcvt_s32_f32 will call a SIMD instruction "FCVTZS Vd.2S,Vn.2S" which is invalid on M4F. The only vaild one is "VCVT.F32.S32 Sd,Sm #fbits" included in FPv4-SP. I'm testing it on my STM32G4, if that be ok, i will put a pull request.

christophe0606 commented 1 month ago

@AlanCui4080 I won't include arm_neon.h for M4F.

There are thus two possibilities : this intrinsics is supported by the Arm C language extensions (ACLE). Unfortunately, there are still too many compilers that are not fully implementing all of the ACLE.

That's why most of the intrinsics used by CMSIS-DSP are coming from CMSIS-Core (part of CMSIS-6). And probably, you'll need to open an issue on CMSIS-Core if you want this new intrinsic to be supported

AlanCui4080 commented 1 month ago

@christophe0606 I figure out it, ACLE have no single precision version (FPv4-SP) for vcvt_s32_f32, it's either d-register or q-register in ACLE. So I do believe someone forget to add it into ACLE, because this instruction is unpopular and not wide used.

And the CMSIS-Core seems to be not include any part of FPU instructions, where should i put it in.

Note: following inline asm is proved usable as a replacement of arm_float_to_q31.

asm("vcvt.s32.f32 %0, %0, #31": "+t"(thetaf_divpi.f)::);

AlanCui4080 commented 3 weeks ago

@christophe0606 Hi, Is that ok to add a inline assembly specilized for M4? If so, i will open a pull request.

christophe0606 commented 3 weeks ago

@AlanCui4080 asm definitions are provided through CMSIS-Core by defining new intrinsics macros in the compiler headers

So you can try to open a github issue on the CMSIS repository.

JonatanAntoni commented 3 weeks ago

@christophe0606, @AlanCui4080, where does this new instrinsic belong to? Is this one the correct location? https://github.com/ARM-software/CMSIS_6/blob/8c4dc58928b3347f6aa98b6fb2bf6770f32a72b7/CMSIS/Core/Include/cmsis_gcc.h#L877-L992

Please feel free to raise a PR adding the required define. Please bare in mind that this is location not specific to Cortex-M4F. Nor is is specialized for Arm-v7EM but is pulled for A- and R-class as well. If this is strictly for Arm-v7EM, it would need to go into https://github.com/ARM-software/CMSIS_6/blob/main/CMSIS/Core/Include/m-profile/cmsis_gcc_m.h et al.

christophe0606 commented 3 weeks ago

@JonatanAntoni @AlanCui4080 It must be added to each compiler and would be for Cortex-M only. (Since the intrinsics is defined with Neon header).

I don't have compiler specific implementation (except in case of major compiler bug) in CMSIS-DSP so this new intrinsics should be made available for all compiler so that CMSIS-DSP can still be built with all of them.

Otherwise, it is now possible to define functions as WEAK for linker in CMSIS-DSP. So, another possibility is for a user to replace a specific function with a different implementation if for some reason we can't do it on our side.

AlanCui4080 commented 3 weeks ago

@christophe0606, @AlanCui4080, where does this new instrinsic belong to? Is this one the correct location? https://github.com/ARM-software/CMSIS_6/blob/8c4dc58928b3347f6aa98b6fb2bf6770f32a72b7/CMSIS/Core/Include/cmsis_gcc.h#L877-L992

@JonatanAntoni No, that is not a DSP instruction, it's belong to FPU. I have never seen intrinsics for FPUs in CMSIS-Core. I think that's because CMSIS-DSP is designed for fixed-point at beginning, and FPU instructions are duty of compliers.

Meanwhile, there is no complier include intrinsics for FPU instuctions (But have MVE&NEON) to itself. Well, so i think people all focused on newer architecture such one have MVE. So armv7 old FPU just be forgot like 8087 insturctions.

See https://developer.arm.com/documentation/ddi0439/b/BEHJADED, since there is no absolute value statement, square root statement, or q31 to float statement in C language, compliers will never generate such instructions to accelerate program running. For newlib, they wrote a inline asm for sqrtf() see https://github.com/bminor/newlib/blob/master/newlib/libm/machine/arm/e_sqrt.c, this fact also pointed out that "there is no complier include intrinsics for FPU instuctions". By the way, i didn't see they did the same thing for "FABS.F32" or "FCVT.XX.XX", may because just no one cared.

These FPU instructions are orphans, i have no idea about where to add them.

ARM-software / CMSIS-DSP

arm_float_to_q31: why not to use VCVT.F32.S32 on Cortex-M4F #212