ARM-software / CMSIS_5

CMSIS Version 5 Development Repository
http://arm-software.github.io/CMSIS_5/index.html
Apache License 2.0
1.33k stars 1.08k forks source link

Undefined behaviour in casting addresses #546

Closed smeijer1234 closed 5 years ago

smeijer1234 commented 5 years ago

Casting addresses of different types is undefined behaviour as done here:

https://github.com/ARM-software/CMSIS_5/blob/6ae6b6689c84001a71dfe6d8f2885c70e4d051d9/CMSIS/DSP/Include/arm_math.h#L451

This gives, for example, problems in rm_correlate_q15.c, arm_conv_partial_q15.c, and arm_conv_q15.c. In these cases, we read from an input stream that is a pointer to a 16-bit type, and we "pack" 2 elements of this by first casting to a 32-bit pointer and then dereferencing this.

I've very quickly drafted something, and what we should instead do is something like this:

unsigned do_something(unsigned x, unsigned y);
void Foo (short *PX, short *PY, unsigned N) {
    for (unsigned i=0; i<N; i++) {
        unsigned inputA;
        memcpy(&inputA, PX, 4);
        PX = PX + 2;
        unsigned inputB;
        memcpy(&inputB, PY, 4);
        PY = PY + 2;
        do_something(inputA, inputB);
    }
}

We memcpy 4 bytes from bot input streams, and store this to a 32-bit values, which feed into some function. This memcpy should be lowered to just load and stores (because we're mempcy'ing small amounts of data). For this example, we generate this:

    .LBB0_1:
ldr r0, [r5, #4]!
ldr r1, [r6, #4]!
bl  do_something
subs    r4, #1
bne .LBB0_1

This shows we do word loads, and also have a post-increment on the pointer, thus there shouldn't be any performance concerns. This assumes though that unaligned access is supported. When this is not supported or enabled, we will generate half-word loads:

    .LBB0_1:
ldrh    r0, [r5, #4]!
ldrh    r1, [r6, #4]!
ldrh    r2, [r5, #2]
ldrh    r3, [r6, #2]
orr.w   r0, r0, r2, lsl #16
orr.w   r1, r1, r3, lsl #16
bl  do_something
subs    r4, #1
bne .LBB0_1

Anyway, I think the solution should be something along the lines of hiding this sequence:

unsigned inputB;
memcpy(&inputB, PY, 4);
PY = PY + 2;

in a primitive/macro/function along the lines of:

unsigned read_16x2_ia(...) { .. }

indicating that we are reading two 16-bit values, "increment after" the pointer, and return both 16-bit values "packed" in an integer.

JonatanAntoni commented 5 years ago

Hi @smeijer1234,

thanks for your comprehensive feedback.

Cheers, Jonatan

christophe0606 commented 5 years ago

@JonatanAntoni As far as I know (and see), we no more have instance of __SIMD32 in CMSIS-DSP ? They have been replaced by new functions defined in arm_math.h.

So I think this issue can be closed ?

JonatanAntoni commented 5 years ago

Yes, we have removed the usage of this macro from DSP code. We kept the definition for backward compatibility reasons. Using the macro is highly discouraged.