What is the meaning of LOG2TOLOG_Q15 constant?

olegkhr commented 1 year ago

why is it equal to LOG2TOLOG_Q31?

christophe0606 commented 1 year ago

@olegkhr It is to convert a log2 value to a normal log (base e) Initially the q15 version was using a q15 log. It was not accurate enough. So internally now the q15 version is also using a q31 log. As consequence, the two values are now equal. The naming is a bit confusing since the Q31 or Q15 in the name is not referring to the format of the constant but to the format of the function.

olegkhr commented 1 year ago

@christophe0606 thanks for the explanation.

For the very low-power signal - mel spectrum is computed as all less than 1 values. Then vlog will output negative values. But then after applying offset all the values become positive: https://github.com/ARM-software/CMSIS-DSP/blob/main/Source/TransformFunctions/arm_mfcc_q15.c#L179.

That results in a large difference in the first coefficient in DCT between float and quant versions.

Is it expected behavior?

christophe0606 commented 1 year ago

@olegkhr This offset is required because of the shifts occurring in fixed point. Q15 will never be able to be close to f32 for small signals but the tradeoff saturation / accuracy of the algorithm may be changed a little.

Can you share an example signal and MFCC parameters so that I can't look at it. Or better, do you have a Python to reproduce the problem ?

You can have a look here : https://github.com/ARM-software/CMSIS-DSP/issues/54#issuecomment-1271118407

I had written some Python to compare CMSIS-DSP MFCC, TensorFlow MFCC and Librosa MFCC.

I you share a f32 version, I can make the q15 to compare and see what's happening. But just a example signal and the MFCC params would be enough.

olegkhr commented 1 year ago

MFCC params:

sample_rate = 16000 FFTSize = 512 numOfDctOutputs = 49

freq_min = 20 freq_high = 7000 numOfMelFilters = 60

Example signal: [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 -1 0 -1 0 -3 -4 -4 -4 -5 -5 -4 -5 -5 -5 -5 -6 -6 -7 -7 -7 -7 -7 -7 -7 -8 -8 -8 -8 -8 -7 -7 -7 -6 -6 -6 -6 -6 -7 -7 -7 -6 -6 -6 -6 -6 -6 -6 -7 -7 -7 -8 -7 -7 -6 -6 -6 -6 -7 -7 -7 -7 -7 -6 -6 -6 -5 -6 -5 -5 -5 -6 -5 -5 -5 -5 -5 -5 -5 -5 -5 -6 -6 -6 -6 -6 -6 -6 -7 -7 -7 -6 -7 -7 -7 -7 -6 -6 -5 -5 -4 -4 -4 -4 -4 -4 -4 -4 -3 -3 -3 -3 -3 -3 -3 -3 -3 -3 -3 -3 -4 -3 -3 -3 -3 -3 -3 -3 -3 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 0 -1 0 0 0 0 0 0 -1 -1 -2 -2 -2 -3 -3 -3 -3 -3 -4 -4 -4 -4 -4 -3 -3 -3 -2 -2 -2 -2 -3 -3 -3 -3 -3 -3 -3 -3 -3 -3 -3 -4 -4 -4 -4 -5 -5 -5 -4 -4 -4 -4 -5 -5 -5 -4 -4 -4 -3 -3 -3 -3 -3 -3 -3 -3 -3 -3 -4 -3 -3 -3 -4 -4 -4 -4 -5 -5 -5 -5 -4 -4 -4 -5 -5 -5 -5 -5 -5 -5 -4 -3 -2 -2 -2 -2 -3 -3 -3 -3 -2 -2 -1 -1 -1 -1 -1 -1 -2 -2 -2 -2 -2 -2 -2 -1 -1 -1 -1 -1 -1 -1 0 0 1 1 1 2 1 2 2 1 1 1 1 1 1 1 2 2 2 1 1 1 1 1 2 1 2 2 1 1 1 1 1 1 2 2 3 3 3 3 3 3 3 3 3 2 3 4 4 4 4 4 3 3 3 3 3 3 3 3 3 4 4 4 3 4 4 4 5 5 5 6 5 5 5 6 6 5 5 5 6 6 6 6 5 4 4 4 5 4 4 4 4 3 3 3 2 2 3 2 3 3 4 4 5 5 4 3 4 4 4 4 4 4 5 4 4 4 4 4 3 3 3 3 3 3 4 4 4 4 4 4 4 4 5 5 5 5 6 6 6 6 6 5 6 5 6 6 6 6 6 6 5 5 5 5 5 5 6 6 5 5 5 6 5 5 5 6 6 6 7 7 7 7 7 7 7 7 7 8 8 8 8 8 7 8 7 7 7 7 8 7]

olegkhr commented 1 year ago

https://github.com/ARM-software/CMSIS-DSP/blob/main/Source/TransformFunctions/arm_mfcc_q15.c#L179

@christophe0606 Does pTmp and logExponent have the same binary format?

christophe0606 commented 1 year ago

@olegkhr Sorry for late answer. I was off for some time. I'll look at all of this as soon as I can.

olegkhr commented 12 months ago

@christophe0606 Could you please review this patch? With this patch I am getting much better match:

diff --git a/Source/TransformFunctions/arm_mfcc_q15.c b/Source/TransformFunctions/arm_mfcc_q15.c
index 7512afd1..3668585f 100755
--- a/Source/TransformFunctions/arm_mfcc_q15.c
+++ b/Source/TransformFunctions/arm_mfcc_q15.c
@@ -96,10 +96,10 @@ arm_status arm_mfcc_q15(
     // q15
     arm_absmax_q15(pSrc,S->fftLen,&m,&index);

+    int16_t shift;
     if (m !=0)
     {
        q15_t quotient;
-       int16_t shift;

        status = arm_divide_q15(0x7FFF,m,&quotient,&shift);
        if (status != ARM_MATH_SUCCESS)
@@ -171,7 +171,7 @@ arm_status arm_mfcc_q15(

     // q5.26

-    logExponent = fftShift + 2 + SHIFT_MELFILTER_SATURATION_Q15;
+    logExponent = fftShift + 2 + SHIFT_MELFILTER_SATURATION_Q15 - shift;
     logExponent = logExponent * LOG2TOLOG_Q15;

christophe0606 commented 12 months ago

@olegkhr I think you have found a problem. The signal is rescaled to have more accuracy for the FFT and magnitude computation but the rescaling is not compensated later.

The patch above is identifying the problem but not applying the rescaling compensation where it should. I'll look at it.

Q31 also has the problem.

Thanks for reporting it and thanks for being patient because I have been very slow at looking at this github issue.

christophe0606 commented 12 months ago

@olegkhr The latest commit https://github.com/ARM-software/CMSIS-DSP/commit/60a8c8805cf78b766dbb8f6d1e0cd22534a6c102 is correcting a rescaling issue in all MFCCs versions.

Tell me if it solves your problem.

I observe that the MFCC Q15 is closer to the other versions with this fix.

Also, the MFCC F32 is closer to the TensorFlow one (I was observing a small divergence on a test pattern and it is now corrected with this commit).

ARM-software / CMSIS-DSP

What is the meaning of LOG2TOLOG_Q15 constant? #107