Closed jbr-smtg closed 1 year ago
@jbr-smtg Scalar version is working so it is probably an issue with the Neon version. I'll check. What kind of values do you get instead of 16 ?
@jbr-smtg I think it is corrected now in latest commit. The internal indexes of the algorithm were on uint16_t and it is too small for this matrix. So, the indexes were saturating at some point during the computation.
With uint32_t it is now working. The f64 version was already using uint32.
I assume similar problems may exist in other matrix functions ... the library was initially designed for small matrices (and by the way, the current matrix multiply is not the most efficient for bigger matrices or use of cache ... at some point I'll have to rework the matrix multiply)
Perfect! Many thanks for the very quick and efficient answer! :)
Hello everyone,
There seems to be an issue with arm_mat_mult_f32.c with the particular case of two rectangular matrices of significant size.
For example, let's define a matrix of ones A of size 16x900, a matrix ones B of size 1024x16 and a matrix C of size 1024x900. We process C = A*B using arm_mat_mult_f32.
Expected: C should be a matrix 1024x900 with all of its values set to 16; but it is not the case.
This very same test works as expected with arm_mat_mult_f64.
Only ARM_MATH_NEON is set to 1.
Thanks in advance!
Best Regards, JbR