Walleclipse / Deep_Speaker-speaker_recognition_system

Keras implementation of "Deep Speaker: an End-to-End Neural Speaker Embedding System" (speaker recognition)

Confusion about Speaker Identification Trial #75

JisongXie closed this issue 3 years ago

JisongXie commented 3 years ago

Hello @Walleclipse, it seems that your code doesn't include the speaker identification test, and I'm confused about the speaker identification trial in the paper. [image]

As we all know, speaker verification is a 1:1 comparison, while speaker identification is a 1:N search. For each anchor utterance, the paper randomly picks 1 anchor-positive sample (AP) and 99 anchor-negative samples (AN). EER is easy to calculate from these pairs, and that is reasonable: verification is a 1:1 process, so a random sample of pairs is representative. Identification, however, is a 1:N process. As N grows, identification accuracy should drop, because the search space is larger and it becomes more likely that some other speaker scores higher than the true one.
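The trial described above (1 AP and 99 AN per anchor) can be sketched roughly as follows. This is only an illustration under assumptions: the function names are hypothetical and are not taken from this repository, embeddings are assumed to be plain NumPy vectors, and cosine similarity is assumed as the score. It shows how the same set of scores yields both a per-anchor identification hit (is the positive ranked first among the candidates?) and a pooled verification EER.

```python
import numpy as np

def cosine_scores(anchor, candidates):
    """Cosine similarity between one anchor (D,) and candidates (N, D)."""
    a = anchor / np.linalg.norm(anchor)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    return c @ a

def trial_identification_hit(anchor, positive, negatives):
    """Return 1 if the positive outscores every negative, else 0.

    With 99 negatives this is a 1:100 search, not a search over all speakers.
    """
    candidates = np.vstack([positive[None, :], negatives])
    scores = cosine_scores(anchor, candidates)
    return int(np.argmax(scores) == 0)  # index 0 is the positive

def equal_error_rate(pos_scores, neg_scores):
    """Approximate EER: the point where false accepts and false rejects meet."""
    thresholds = np.sort(np.concatenate([pos_scores, neg_scores]))
    far = np.array([(neg_scores >= t).mean() for t in thresholds])
    frr = np.array([(pos_scores < t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2
```

Averaging `trial_identification_hit` over all anchors gives an accuracy that depends on the number of negatives per trial, whereas the EER is computed per pair and does not.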

So I'm a bit confused. Your code does compute an accuracy, but it looks like the verification accuracy, i.e. how well the AP and AN pairs are told apart. If identification accuracy is calculated like this, or only by classifying within a space of N=100 candidates, it does not seem like a reasonable measure of speaker identification performance.
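A rough way to see why the candidate count matters: assume, purely for illustration, that each impostor independently outscores the true speaker with the same small probability $p$ (real scores are not independent, so this is only a back-of-the-envelope model). The probability that the true speaker is ranked first among $N$ candidates is then

$$
\mathrm{Acc}(N) = (1 - p)^{N-1}.
$$

With $p = 0.001$, this gives $\mathrm{Acc}(100) = 0.999^{99} \approx 0.91$ but $\mathrm{Acc}(1000) = 0.999^{999} \approx 0.37$, so an accuracy measured at N=100 says little about a search over a much larger speaker set.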

Some papers instead split the dataset and calculate accuracy as in other classification tasks, for example: AutoSpeech: Neural Architecture Search for Speaker Recognition [image]

Walleclipse commented 3 years ago

Hi, you are right. I only considered the speaker verification experiment. In that experiment, the model was trained on the librispeech-train-clean dataset and tested on the librispeech-test-clean dataset, and the test speakers do not overlap with the training speakers. The reported accuracy is also the accuracy for the verification experiment, although for verification EER is of course the more important metric. I think it would not be very hard to extend the code to speaker identification experiments; it would require modifying the batch data processing code.
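For anyone who wants to try the identification extension, one possible shape is sketched below. It is not code from this repository: the function names are hypothetical, embeddings are assumed to have already been extracted as NumPy arrays, and it assumes each test speaker has some utterances set aside for enrollment (needed because the test speakers are unseen during training). Each speaker is represented by the mean of their enrollment embeddings, and a test utterance is assigned to the speaker with the highest cosine similarity across all N enrolled speakers.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def enroll(embeddings, speaker_ids):
    """Build one unit-length centroid per speaker from enrollment embeddings."""
    speaker_ids = np.asarray(speaker_ids)
    speakers = np.unique(speaker_ids)
    centroids = np.stack(
        [embeddings[speaker_ids == s].mean(axis=0) for s in speakers]
    )
    return speakers, l2_normalize(centroids)

def identify(test_embeddings, speakers, centroids):
    """1:N search: pick the enrolled speaker with the highest cosine score."""
    scores = l2_normalize(test_embeddings) @ centroids.T  # shape (T, N)
    return speakers[np.argmax(scores, axis=1)]

def identification_accuracy(predicted, true):
    """Fraction of test utterances assigned to the correct speaker."""
    return float(np.mean(np.asarray(predicted) == np.asarray(true)))
```

Here N is the full number of enrolled speakers rather than a fixed 100 candidates per anchor, so the resulting accuracy reflects the size of the actual search space.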

JisongXie commented 3 years ago

Yes, I was just confused by the speaker identification test in the paper. It doesn't seem that reasonable, or at least it isn't explained clearly. Haha. Still, it's a pretty good paper. Thanks.