CMSIS CRL Cortex M returns wrong outputs for higher float values in arm_float_to_q31.c which is under CMSIS-DSP-Source-SupportFunctions directory

ARM-software / CMSIS-DSP

CMSIS-DSP embedded compute library for Cortex-M and Cortex-A

https://arm-software.github.io/CMSIS-DSP

Apache License 2.0


CMSIS CRL Cortex M returns wrong outputs for higher float values in arm_float_to_q31.c which is under CMSIS-DSP-Source-SupportFunctions directory #145

Closed BhaskarsarmaP closed 5 months ago

BhaskarsarmaP commented 5 months ago

To convert from float to q31 format, the floating-point number is scaled by a factor of 2^31. The scaled result is then cast to q63_t (a 64-bit signed integer).

(q63_t) (*pIn++ * 2147483648.0f)

Inputs with a magnitude of roughly 10^10 or higher overflow at this step: the scaled value exceeds the range of a signed 64-bit integer (the exact limit is 2^63 / 2^31 = 2^32, about 4.29 × 10^9), so the cast produces incorrect outputs.
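The threshold can be checked with a small sketch. The helper name `scaled_fits_in_int64` is hypothetical and not part of CMSIS-DSP; it only mirrors the scaling step quoted above and tests whether the scaled value still lies in the range of a signed 64-bit integer, under the assumption that `float` is IEEE 754 single precision.

```c
#include <stdbool.h>

/* Hypothetical helper (not part of CMSIS-DSP): reports whether x * 2^31
 * can be cast to a 64-bit signed integer without leaving its range.
 * Assumes float is IEEE 754 single precision. */
static bool scaled_fits_in_int64(float x)
{
    /* Same scaling as in the quoted expression; multiplying by a power
     * of two is exact unless the product overflows to infinity. */
    float scaled = x * 2147483648.0f;

    /* 2^63 is exactly representable as a float. The valid range for the
     * cast is [-2^63, 2^63). NaN fails both comparisons and is rejected. */
    return scaled >= -9223372036854775808.0f &&
           scaled <   9223372036854775808.0f;
}
```

With this check, 1.0f passes while 4294967296.0f (2^32) and 1e10f are rejected, which matches the limit of 2^63 / 2^31 = 2^32.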

christophe0606 commented 5 months ago

@BhaskarsarmaP It looks like there is a hidden, undocumented assumption in the code. I won't switch the implementation to double precision because it would hurt performance, and in most cases it is not needed.

I'll update the documentation of the function to say that if the floats are expected to be very large, it may be better to update the float array before calling this function.
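One way to "update the float array" is to clamp it before the conversion. This is a minimal sketch, not the library's documented approach: `clamp_to_q31_range` is a hypothetical helper, and it assumes that saturating out-of-range samples to ±1.0 is acceptable for the application.

```c
#include <stddef.h>

/* Hypothetical helper (not part of CMSIS-DSP): clamps each sample in place
 * to [-1.0, 1.0] so that the later scaling by 2^31 stays far inside the
 * range of a 64-bit integer. NaN values fail both comparisons and are
 * left unchanged; handling them is up to the caller. */
static void clamp_to_q31_range(float *buf, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (buf[i] > 1.0f) {
            buf[i] = 1.0f;
        } else if (buf[i] < -1.0f) {
            buf[i] = -1.0f;
        }
    }
}
```

The clamped buffer would then be passed to `arm_float_to_q31` as usual. The extra pass costs one comparison pair per sample, so it is only worth adding where large inputs can actually occur.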