Serious bug in the vlogq_neon_f32 code

ARM-software / EndpointAI

Apache License 2.0

221 stars 99 forks source link

Serious bug in the vlogq_neon_f32 code #13

Open Gao-HaoYuan opened 9 months ago

Gao-HaoYuan commented 9 months ago

I have identified a bug in the vlogq_neon_f32 code. When calculating values close to 1, a significant error occurs, potentially leading to the inversion of the sign bit.

` float a = 0.1; for(int i = 0; i < 9; ++i){ a += 0.1; } printf("%.9f\n", a); float32x4_t in = {0.9f, 1.000000119, 1.1f, 1.2f}; float32x4_t out = vlogq_f32(in); // out = vmulq_n_f32(out, 65535.f); for(int i = 0; i < 4; ++i){ printf("%.9f\t", out[i]); } printf("\n");

for (int i = 0; i < 4; i++) {
    printf("%.9f\t", std::log(in[i]));
}
printf("\n");`

How to obtain correct results

christophe0606 commented 8 months ago

@Gao-HaoYuan Is it a problem in CMSIS-DSP ? Or is it in a copy of this function used somewhere else ?

If it is the original function from CMSIS-DSP, please can you open an issue in https://github.com/ARM-software/CMSIS-DSP

Gao-HaoYuan commented 8 months ago

hello @christophe0606 , This is the result of me copying the function and running it separately. I attempted to comprehend the log function's Taylor expansion around (x-n), but I couldn't ascertain the value of n. I discovered that the source of the error lies in the Taylor expansion function. float32x4_t res = vmlaq_f32(vmlaq_f32(A, B, x2), vmlaq_f32(C, D, x2), x4);

The result of vmlaq_f32(C, D, x2), x4) is a negative number with an absolute value greater than vmlaq_f32(A, B, x2). This leads to the occurrence of -0 when calculating log(1).

and if this issue has been fixed in CMSIS-DSP?

christophe0606 commented 8 months ago

@Gao-HaoYuan I just had a quick look at CMSIS-DSP and I think for Neon there is the problem. A Taylor expansion for the log has a limited validity so probably the argument should be scaled before and an offset added to the result when the argument is not in the right range.

I have created an issue : https://github.com/ARM-software/CMSIS-DSP/issues/151

But unfortunately I won't be able to work on it quickly.

Gao-HaoYuan commented 8 months ago

@christophe0606 I've conducted some tests, and when the value is very close to 1, there are significant differences in the calculation results. It might be beneficial to pay attention to values close to 1, such as 1.000000119, 1.00000119, and 0.9999911.

I attempted to compile and use CMSIS-DSP, but encountered some issues. If I manage to successfully compile it, I can also conduct tests within CMSIS-DSP.

However, I extracted the code for vlogq_f32 from CMSIS-DSP, and there are slight differences compared to EndpointAI's log_tab. When calculating log(1.000000119), the result is still -0 instead of +0.

That's the issue I've currently identified. I've also left a comment about this on https://github.com/ARM-software/CMSIS-DSP/issues/151