ALIZE-Speaker-Recognition / android-alize

ALIZE for the Android platform.
GNU Lesser General Public License v3.0

About the result score #9

Open ra2637 opened 6 years ago

ra2637 commented 6 years ago

Hi, I'd like to ask how I can evaluate the score in the result. I ran the examples from the ALIZE website and, after normalization, the scores are shown as spk01 -0.269968 0.247707

However, android-alize reports scores such as 40 or 110 when I run it in an Android app. Can you tell me how to interpret the result?

Thanks!

jfb84 commented 6 years ago

Hello. Theoretically, the "score" is the log of the LR, the likelihood ratio between the likelihood of the hypothesis "the speech extract corresponds to the model" and that of the hypothesis "the speech extract was pronounced by someone else". An LR lies in the interval ]0, +infinity[. 1.0 is the balanced answer, where each hypothesis has the same strength. An LR > 1.0 supports the first hypothesis, while an LR between 0 and 1 supports the second one. So a log LR (LLR) ranges from -infinity to +infinity, with 0 as the balanced point.
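To make the relation between LR and LLR concrete, here is a minimal Python sketch. It assumes the natural logarithm (the thread does not say which base is used), and the function names are hypothetical, not part of ALIZE. It only illustrates the theoretical reading of the score, not what the Android example actually returns.

```python
import math


def log_likelihood_ratio(lr: float) -> float:
    """Return the natural log of a likelihood ratio (assumed base: e)."""
    if lr <= 0.0:
        raise ValueError("a likelihood ratio must be strictly positive")
    return math.log(lr)


def supported_hypothesis(llr: float) -> str:
    """Say which hypothesis a theoretical LLR favours."""
    if llr > 0.0:
        return "target"      # the speech extract corresponds to the model
    if llr < 0.0:
        return "non-target"  # the speech extract was pronounced by someone else
    return "balanced"        # LR == 1.0, both hypotheses equally strong


for lr in (0.25, 1.0, 4.0):
    llr = log_likelihood_ratio(lr)
    print(f"LR={lr} LLR={llr:+.3f} -> {supported_hypothesis(llr)}")
```

In this reading, only the sign and magnitude of a well-calibrated LLR carry meaning; raw engine outputs such as 40 or 110 are not calibrated and should not be read this way.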

Now ALIZE, like all speaker recognition engines, outputs a raw practical estimate of the LLR. Two steps of "normalization" are usually applied.

The difference you highlighted comes from the fact that no feature normalization is applied in this ALIZE Android example. So the mathematical domain of the inputs is different, and the output LLR is very different (before normalization).

It shows the difference between theory and practical implementation... and shows that someone will have to add normalization to the ALIZE Android example! Best

jfb84 commented 6 years ago

(feature-level normalization is usually applied file by file)
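As an illustration of what file-by-file feature normalization can look like, here is a minimal sketch of per-file mean and variance normalization in plain Python. This is an assumption about the kind of normalization meant: the thread does not name the exact method, and `normalize_features` is a hypothetical helper, not an ALIZE function.

```python
import math


def normalize_features(frames):
    """Mean/variance-normalize the feature vectors of ONE audio file.

    frames: list of equal-length feature vectors, one per frame.
    Returns new vectors where each coefficient has zero mean and unit
    variance over the file (constant coefficients are only centered).
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = []
    for d in range(dim):
        var = sum((f[d] - means[d]) ** 2 for f in frames) / n
        # Fall back to 1.0 so a constant coefficient does not divide by zero.
        stds.append(math.sqrt(var) or 1.0)
    return [[(f[d] - means[d]) / stds[d] for d in range(dim)] for f in frames]
```

Because the statistics are computed per file, every file fed to the engine ends up in the same numeric range, which is why skipping this step can shift the raw scores so much.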

ra2637 commented 6 years ago

Got it. Thank you very much.

I have another question about recognition accuracy. I have several users in my DB, including myself, and every time it recognizes me as a different user. Since many studies have shown that ALIZE has great speaker recognition performance, I wonder what I can do to improve my results. I have tried trimming the silence frames from the training input and created the UBM from the TIMIT DB. Input files are WAV with a sample rate of 16000, mono channel, and PCM_16BIT encoding.

Thank you.
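The input format described above (16000 Hz, mono, 16-bit PCM) can be verified before enrollment or testing. The following is a small sketch using Python's standard `wave` module; `check_wav` and its default values are hypothetical and simply mirror the format stated in the comment, they are not a requirement documented by ALIZE.

```python
import wave


def check_wav(path, rate=16000, channels=1, sample_width_bytes=2):
    """Return a list of mismatches between a WAV header and the expected format."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(f"sample rate {wav.getframerate()} != {rate}")
        if wav.getnchannels() != channels:
            problems.append(f"channels {wav.getnchannels()} != {channels}")
        if wav.getsampwidth() != sample_width_bytes:
            problems.append(
                f"sample width {wav.getsampwidth()} bytes != {sample_width_bytes}"
            )
    return problems
```

An empty list means the header matches; running the same check on the UBM training files and on the enrollment and test recordings helps rule out a format mismatch between them.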

TrungThanhTran commented 6 years ago

Hi all, I guess the Android code only supports GMM right now. jfb84, please tell me if there are any problems with i-vector on Android.

Thank you.