Correlation function arm_correlate_f32 has wrong output array length expectation for NEON

ARM-software / CMSIS-DSP

CMSIS-DSP embedded compute library for Cortex-M and Cortex-A

https://arm-software.github.io/CMSIS-DSP

Apache License 2.0

518 stars 133 forks source link

Correlation function arm_correlate_f32 has wrong output array length expectation for NEON #120

Open yuvpg opened 1 year ago

yuvpg commented 1 year ago

arm_correlate_f32() expecting the output array size to be 2*srcALen - 1 instead of srcALen + srcBLen - 1 for ARM_MATH_DSP case.

Maybe just changing line 361 will be enough to fix the issue.

Now the output is unexpected, and call leads to buffer overflow if you don't guess output array size correctly:

    const int kerN = 7;
    float ker[kerN] = {+1,+1,+1,-1,-1,+1,-1};
    const int dataN = 21;
    float data[dataN] = {0, 0,+1,+1,+1,-1,-1,+1,-1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

    const int resN = 2*dataN-1; // why so long array needed?
    float res[resN];

    for (int i = 0; i < resN; i++) {
        res[i] = NAN;
    }

    arm_correlate_f32(data, dataN, ker, kerN, res);

results in

res={nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,0,0,-1,0,-1,0,-1,0,7,0,-1,0,-1,0,-1,0,0,0,0,0,0,0,0,0,0,0,0};

christophe0606 commented 10 months ago

In the documentation I can read:

The pDst should be initialized to all zeros before being used.

and also in documentation:

size of buffer should 2 * max(srcALen, srcBLen) - 1

So I don't see any bug. It behaves as expected but I agree that it may be improved to use less memory.

I think the reason it is done like that is to:

Avoid tests in the code
Avoid having to pad the entries so that buffers have same lengths

rohardy commented 6 months ago

From the documentation :

The result c[n] is of length 2 * max(srcALen, srcBLen) - 1 and is defined over the interval n=0, 1, 2, ..., (2 * max(srcALen, srcBLen) - 2). The output result is written to pDst and the calling function must allocate 2 * max(srcALen, srcBLen) - 1 words for the result.

but should not the result c[n] be defined over the interval n=0, 1, 2, ..., srcALen + srcBLen - 2 ? Since a full correlation returns an array of size srcALen + srcBLen - 1 .

Edit

Depending of whether scrALen is greater than srcBLen (or not), c[n] is not defined on the same interval.

If srcALen < srcBLen then n=0, 1, 2, ..., srcALen + srcBLen - 2. If srcALen > srcBLen then n=srcALen-srcBLen,..., 2*srcALen - 2

I found the documentation a bit misleading about the definition domain of c.

christophe0606 commented 6 months ago

@rohardy The output is padded with zeros. It is longer than what is strictly needed.

rohardy commented 6 months ago

@christophe0606 I understand, but I think the documentation could be more precise about where the actual result lays.

christophe0606 commented 6 months ago

@rohardy You're right. Documentation will be improved.