Closed smeijer1234 closed 5 years ago
Hi @smeijer1234,
thanks for your comprehensive feedback.
Cheers, Jonatan
@JonatanAntoni As far as I know (and can see), we no longer have any instances of __SIMD32 in CMSIS-DSP. They have been replaced by new functions defined in arm_math.h.
So I think this issue can be closed?
Yes, we have removed the usage of this macro from DSP code. We kept the definition for backward compatibility reasons. Using the macro is highly discouraged.
Casting a pointer to a pointer of a different type and then dereferencing it is undefined behaviour (it violates the strict aliasing rule), as is done here:
https://github.com/ARM-software/CMSIS_5/blob/6ae6b6689c84001a71dfe6d8f2885c70e4d051d9/CMSIS/DSP/Include/arm_math.h#L451
This causes problems in, for example, arm_correlate_q15.c, arm_conv_partial_q15.c, and arm_conv_q15.c. In these cases, we read from an input stream through a pointer to a 16-bit type, and we "pack" two elements by first casting that pointer to a pointer to a 32-bit type and then dereferencing it.
I've very quickly drafted something, and what we should instead do is something like this:
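A minimal sketch of that idea, written as an illustration rather than the original draft. The names `dual_mac` and `dot_q15` and their signatures are hypothetical; `dual_mac` is a portable stand-in for whatever packed dual multiply-accumulate the real kernels use:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a packed dual multiply-accumulate:
   acc + x[0]*y[0] + x[1]*y[1], where x and y each hold two 16-bit lanes. */
static int32_t dual_mac(uint32_t x, uint32_t y, int32_t acc)
{
    int16_t a[2], b[2];
    memcpy(a, &x, sizeof a);   /* unpack lanes without aliasing tricks */
    memcpy(b, &y, sizeof b);
    return acc + a[0] * b[0] + a[1] * b[1];
}

/* Dot product over n_pairs pairs of 16-bit samples from two streams. */
static int32_t dot_q15(const int16_t *pA, const int16_t *pB, size_t n_pairs)
{
    int32_t acc = 0;
    while (n_pairs--) {
        uint32_t a, b;
        /* Copy 4 bytes (two 16-bit samples) instead of
           dereferencing a casted 32-bit pointer. */
        memcpy(&a, pA, sizeof a);
        memcpy(&b, pB, sizeof b);
        pA += 2;               /* advance past the two samples just read */
        pB += 2;
        acc = dual_mac(a, b, acc);
    }
    return acc;
}
```

Because the bytes are copied rather than reinterpreted through a pointer of another type, this has defined behaviour regardless of the alignment of `pA` and `pB`.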
We memcpy 4 bytes from both input streams and store them in 32-bit values, which feed into some function. This memcpy should be lowered to plain loads and stores (because we're memcpy'ing small amounts of data). For this example, we generate this:
This shows we do word loads, and also have a post-increment on the pointer, thus there shouldn't be any performance concerns. This assumes though that unaligned access is supported. When this is not supported or enabled, we will generate half-word loads:
Anyway, I think the solution should be something along the lines of hiding this sequence:
in a primitive/macro/function along the lines of:
unsigned read_16x2_ia(...) { .. }
indicating that we are reading two 16-bit values, "increment after" the pointer, and return both 16-bit values "packed" in an integer.
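One possible shape for such a primitive, as a sketch only. The parameter list is elided in the proposal above, so the pointer-to-pointer argument and the `uint32_t` return type (in place of `unsigned`) are assumptions made for illustration:

```c
#include <stdint.h>
#include <string.h>

/* Read two consecutive 16-bit values, return them packed in 32 bits,
   and advance the caller's pointer by two elements ("increment after").
   Signature is an assumption; the proposal leaves it unspecified. */
static inline uint32_t read_16x2_ia(const int16_t **pp)
{
    uint32_t val;
    memcpy(&val, *pp, sizeof val);  /* 4-byte copy, no type-punning cast */
    *pp += 2;                       /* post-increment past both samples */
    return val;
}
```

A caller would then write `uint32_t x = read_16x2_ia(&pIn);` in place of the cast-and-dereference sequence, leaving it to the compiler to choose a word load or two half-word loads depending on whether unaligned access is available.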