ARM-software / CMSIS-DSP

CMSIS-DSP embedded compute library for Cortex-M and Cortex-A
https://arm-software.github.io/CMSIS-DSP
Apache License 2.0
459 stars 123 forks source link

Issue with arm_mat_mult_f32 with bigger matrices #105

Closed jbr-smtg closed 1 year ago

jbr-smtg commented 1 year ago

Hello everyone,

There seems to be an issue with arm_mat_mult_f32.c with the particular case of two rectangular matrices of significant size.

For example, let's define a matrix of ones A of size 16x900, a matrix ones B of size 1024x16 and a matrix C of size 1024x900. We process C = A*B using arm_mat_mult_f32.

Expected: C should be a matrix 1024x900 with all of its values set to 16; but it is not the case.

This very same test works as expected with arm_mat_mult_f64.

Only ARM_MATH_NEON is set to 1.

Thanks in advance!

Best Regards, JbR

christophe0606 commented 1 year ago

@jbr-smtg Scalar version is working so it is probably an issue with the Neon version. I'll check. What kind of values do you get instead of 16 ?

christophe0606 commented 1 year ago

@jbr-smtg I think it is corrected now in latest commit. The internal indexes of the algorithm were on uint16_t and it is too small for this matrix. So, the indexes were saturating at some point during the computation.

With uint32_t it is now working. The f64 version was already using uint32.

I assume similar problems may exist in other matrix functions ... the library was initially designed for small matrices (and by the way, the current matrix multiply is not the most efficient for bigger matrices or use of cache ... at some point I'll have to rework the matrix multiply)

jbr-smtg commented 1 year ago

Perfect! Many thanks for the very quick and efficient answer! :)