Output Mismatch in ARM mult , scale and other mult cases for q31 in CMSIS 5.9 version

ARM-software / CMSIS-DSP

CMSIS-DSP embedded compute library for Cortex-M and Cortex-A

https://arm-software.github.io/CMSIS-DSP

Apache License 2.0

459 stars 123 forks source link

Output Mismatch in ARM mult , scale and other mult cases for q31 in CMSIS 5.9 version #113

Closed BhaskarsarmaP closed 10 months ago

BhaskarsarmaP commented 11 months ago

Here, As shown in the image above (https://github.com/ARM-software/CMSIS_5/blob/5.9.0/CMSIS/DSP/Source/BasicMathFunctions/arm_mult_q31.c) For q31 case, we are first right shifting and then left shifting by a bit which results in precision loss of LSB. When this is run multiple times, It can result in a huge loss of data.

For q15, q7 it is handled correctly like below. image

The issue also exists for mult, scale, and other cases where q31 mult operations are involved.

Please let us know the plans for fixing this.

christophe0606 commented 11 months ago

@BhaskarsarmaP It is done to have correct saturation. You need to consider the bit 32 (counting from 0) to be able to detect a potential saturation.

So it won't be fixed because having correct saturation is important.

The function is a tradeoff between saturation / accuracy and speed.

We could make a different implementation with correct saturation and full accuracy but it would be more complex and use more cycles.

What's the use case you have in mind where this loss one of LSB would cause an issue ?

Generally, long sequences of multiplies are used in multiply and accumulate. And in this case, the saturation and shift is only done at the end of the sequence so accuracy is kept (and you don't use this mult function but a dedicated mac function).

Perhaps you'd like a Multiply/accumulate function in the BasicMath folder ?

BhaskarsarmaP commented 11 months ago

Could you explain why it's different from the q15 and q31 case? Why it's the issue for only q31 and not q15?

Here is the code, when the block count < 4. I tried using the below code for q31 similar to q15 *pDst++ = (q31_t) __SSAT((((q63_t) (*pSrcA++) * (*pSrcB++)) >> 31), 32);

For q15 *pDst++ = (q15_t) __SSAT((((q31_t) (*pSrcA++) * (*pSrcB++)) >> 15), 16);

q15 case is working when both inputs are -1. q31 case which I modified is not working when both inputs are -1 and the output is -1 which is wrong.

Please elaborate and explain this case.

christophe0606 commented 11 months ago

In Q15, you'll compare a int32 value with 0x07FFF and you can detect if it is > 0x07FFF and in that case SSAT is saturating (0x07FFF is (1<<15)-1).

In Q31, if you use 32 in SSAT you compare with (1<<31)-1 = 0x7FFFFFFF but you can't check if it is > 0x7FFFFFFF because 0x80000000 is negative. So, you need to use SSAT with 31 and it gives a comparison with 0x3FFFFFFF. You can check x > 0x3FFFFFFF in int32 since the value just above is 0x40000000 and is still positive.

BhaskarsarmaP commented 11 months ago

Thanks for the info. Do you have an optimized version without having LSB precision loss. Inputs and Outputs must be in q31 format.

christophe0606 commented 11 months ago

@BhaskarsarmaP In practice, you use the multiply as part of other kernels like arm_dot_prod_q31 or arm_mat_mult_q31.

If the multiply is used alone, the LSB is generally not an issue. It is only if you use long sequence of multiplies and in those case there are other kernels that can be used.

With 30 accurate bits, you still have 180 dB of SNR.