Question about saturate in conv_fast_q15

mengts commented 4 years ago

Hi, In arm_conv_fast_opt_q15(), it uses uses a 32-bit accumulator with 2.30 format to store intermediate value. Then, the result should be converted to 1.15 with right shift 15 bits and saturate to 1.15 as the code below. https://github.com/ARM-software/CMSIS_5/blob/8d31ab2d5d32a2234e82124d3111ac1a68be1dd1/CMSIS/DSP/Source/FilteringFunctions/arm_conv_fast_opt_q15.c#L354

However, in conv_fast_q15() as below, the result(2.30 format) directly converts to 1.15 by discard the highest bit after right shift 15 bits.

I think it might need to use same saturate method as arm_conv_fast_opt_q15() to avoid error in positive and negative. And the same situation is also in arm_correlate_fast_q15() and arm_conv_partial_fast_q15(). Thanks. https://github.com/ARM-software/CMSIS_5/blob/8d31ab2d5d32a2234e82124d3111ac1a68be1dd1/CMSIS/DSP/Source/FilteringFunctions/arm_conv_fast_q15.c#L229

christophe0606 commented 4 years ago

@mengts You're right. Thanks for reporting it.

christophe0606 commented 4 years ago

After looking at it again, I won't fix it. It is technically correct that it may saturate and the sum may use a __SSAT to avoid it.

But, the accumulator has only one guard bit which means it will saturate very quickly and the results (saturated or not) will be wrong from signal processing point of view.

This function is requiring the input to be scaled to avoid any saturation and give a correct result.

The focus of this function is speed by use of a smaller accumulator and no SSAT instructions.

The other function arm_conv_q15 is using a bigger accumulator and a SSAT instruction and is not requiring pre-scaling for this reason.

mengts commented 4 years ago

Ok, I got it. Thanks for your reply.

ARM-software / CMSIS_5

Question about saturate in conv_fast_q15 #829