Snowdar / asv-subtools

An Open-Source Toolkit for Speaker Recognition
Apache License 2.0

Issues about evaluation results (Cavg and EER) of baseline system of OLR2020 Challenge #4

Closed. pengyizhou closed this issue 3 years ago.

pengyizhou commented 4 years ago

Hi. We want to ask two questions about the evaluation results of the OLR 2020 Challenge baseline system.

1. Cavg and EER of task 1 on the AP19-OLR-channel test data

We noticed that in Table 2 the official result for task 1 with the i-vector system is Cavg 0.2965 and EER 19.40%, but we get Cavg 0.2997 and EER 29.91%. We suspect that the official results may have put a wrong EER number into Table 2. As the screenshots below show, the value 19.40% appears not only in Table 2 but also in Table 3 for the same task, and in Table 3 the Cavg and EER are close to each other, unlike in Table 2. In other literature that uses Cavg and EER as evaluation criteria, we also rarely see such a big gap between the two numbers. We hope the organizers can check the results, thanks! [screenshots of Tables 2 and 3]
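For reference, here is essentially how we compute the EER from the pooled trial scores. This is a minimal sketch using sklearn; the labels and scores below are hypothetical placeholders, not the real task 1 trials:

```python
import numpy as np
from sklearn.metrics import roc_curve

def compute_eer(labels, scores):
    # EER is the operating point where the false-alarm rate
    # equals the miss rate.
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    return (fpr[idx] + fnr[idx]) / 2

# Hypothetical trials: 1 = target language, 0 = non-target.
labels = np.array([1, 1, 1, 0, 0, 0, 1, 0])
scores = np.array([1.8, 0.4, 1.1, -0.2, 0.9, -1.3, 2.0, 0.1])
print("EER = %.2f%%" % (100 * compute_eer(labels, scores)))
```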

2. The way computeCavg.py calculates Cavg in the open-set identification task

[screenshot of the Cavg formula from the evaluation plan]

We are using the formula in the picture above to calculate Cavg for each task, but there is something we don't understand in computeCavg.py: in task 2, which is an open-set task, it effectively assigns prior probabilities that sum to more than 1.

In task 1 there are 6 languages in both the enrollment set and the test set, so the script gives a prior of P_target = 0.5 to the target language and P_nontarget = 0.1 to each non-target language. There is no problem here; it computes

Cavg = (1/N) * sum_{L_t} [ P_target * P_miss(L_t) + sum_{L_n} P_nontarget * P_fa(L_t, L_n) ]

with N = 6. In task 2 there are 6 languages in the test set but only 3 in the enrollment set. The program sets lang_num = 3 and gives each non-target language a prior of 0.25. What is odd is that it also treats the 3 languages that are not in the enrollment set as one extra non-target "language" with its own prior of 0.25. I have stepped through the code and obtained the parameters shown below. The relevant lines are:

```python
# computeCavg.py
p_nontarget = (1 - p_target) / (lang_num - 1)
# line 113: lang_num = 3, so p_nontarget = 0.25

target_cavg[lang] = p_target * p_miss + p_nontarget * sum(p_fa)
# line 114: p_fa is a list of length 3; p_fa[2] is the false-alarm
# probability pooled over the three languages not in the enrollment set
```

Finally it computes

Cavg = (1/3) * sum_{L_t} [ 0.5 * P_miss(L_t) + sum_{n=1..3} 0.25 * P_fa(L_t, L_n) ]

where L_3 stands for the pool of languages not in the enrollment set. The priors therefore sum to 0.5 + 0.25 * 3 = 1.25, which is greater than 1. I don't know whether we are misunderstanding the formula or how the program works. Could you please give us some hints on this?
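To make the issue concrete, here is a minimal sketch of how we understand lines 113-114 to combine the terms in task 2. All of the p_miss/p_fa numbers are hypothetical placeholders; the final comment shows the prior mass they imply:

```python
P_TARGET = 0.5

def cavg(p_miss, p_fa, p_target=P_TARGET):
    # One miss term per enrolled language plus a uniform non-target
    # prior over every column of p_fa, including the pooled
    # out-of-set column.
    lang_num = len(p_miss)                         # 3 enrolled languages
    p_nontarget = (1 - p_target) / (lang_num - 1)  # 0.25
    per_lang = [p_target * p_miss[i] + p_nontarget * sum(p_fa[i])
                for i in range(lang_num)]
    return sum(per_lang) / lang_num

p_miss = [0.10, 0.20, 0.15]      # one miss rate per enrolled language
p_fa = [[0.05, 0.10, 0.08],      # per language: 2 in-set non-targets
        [0.04, 0.06, 0.07],      # plus 1 pooled out-of-set column,
        [0.09, 0.03, 0.05]]      # so each row has 3 false-alarm rates

print(cavg(p_miss, p_fa))
# Implied priors: 0.5 (target) + 3 * 0.25 (non-target) = 1.25 > 1.
```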

Sincerely,
Yizhou Peng

Snowdar commented 4 years ago

Hi, thank you for pointing out these problems; we have fixed them.

YunzhaoLu commented 3 years ago

Hi all, how do you identify an input as an unknown language if the enrollment set has only 3 in-set languages?

Thanks, Luke

Snowdar commented 3 years ago

Hi, when the EER is computed, a threshold corresponding to that error rate is obtained, and you can use this threshold to classify test utterances: if the score of a test utterance is lower than the threshold, it can be treated as an unknown language.
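For example, here is a minimal sketch of this decision rule. The development labels and scores are hypothetical; in practice they come from trials whose ground truth is known:

```python
import numpy as np
from sklearn.metrics import roc_curve

def eer_threshold(labels, scores):
    # Score threshold at the point where the miss rate and the
    # false-alarm rate cross, i.e. the EER operating point.
    fpr, tpr, thresholds = roc_curve(labels, scores)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    return thresholds[idx]

# Hypothetical development trials with known labels (1 = in-set target).
dev_labels = np.array([1, 1, 0, 1, 0, 0, 1, 0])
dev_scores = np.array([2.1, 1.7, -0.3, 0.9, 0.2, -1.1, 1.3, 0.4])
threshold = eer_threshold(dev_labels, dev_scores)

# At test time: if even the best enrolled-language score falls below
# the threshold, treat the utterance as an unknown (out-of-set) language.
best_score = 0.1
print("unknown language" if best_score < threshold else "in-set language")
```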


YunzhaoLu commented 3 years ago

But the EER can only be calculated after the ground truth is provided.