Janghyun1230 / Speaker_Verification

Tensorflow implementation of "Generalized End-to-End Loss for Speaker Verification"
MIT License

weird inference results for similar speakers #11

Open · tonytonyissaissa opened this issue 5 years ago

tonytonyissaissa commented 5 years ago

Hi @Janghyun1230, I trained the model on the VCTK dataset (reproducing your work). For inference, I am trying to verify speakers from the LibriSpeech dataset, and I keep getting strange results. For instance, below are the results for a single speaker: I split that speaker's *.wav files across two different folders and fed both to the model, so N=2 "speakers" (in reality the same speaker) with M=4 utterances each. The output below shows that the model failed to detect that both folders contain the same speaker. Do you have any explanation for this? Should I train the model on a bigger dataset to get better results?

```
inference time for 16 utterances : 0.18s
[[[0.87 0.29]
  [0.79 0.07]
  [0.93 0.24]
  [0.81 0.17]]

 [[0.42 0.89]
  [0.4  0.81]
  [0.53 0.73]
  [0.52 0.62]]]

EER : 0.00 (thres:0.54, FAR:0.00, FRR:0.00)
```
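For context on how to read that output: in the GE2E setup, the N×M×N matrix holds the cosine similarity of each of the M utterance embeddings per "speaker" against each of the N enrollment centroids. A minimal NumPy sketch of that computation is below (the function name `similarity_matrix` and the toy shapes are my own, not from this repo's code); if the two folders really contained the same speaker, we would expect all entries, not just the diagonal blocks, to score high.

```python
import numpy as np

def similarity_matrix(embeddings, centroids):
    """Cosine similarity of every utterance embedding against every centroid.

    embeddings: (N, M, D) array, M utterance embeddings for each of N speakers.
    centroids:  (N, D) array, one enrollment centroid per speaker.
    Returns an (N, M, N) array like the one printed above.
    """
    # L2-normalize so the dot product is cosine similarity.
    e = embeddings / np.linalg.norm(embeddings, axis=-1, keepdims=True)
    c = centroids / np.linalg.norm(centroids, axis=-1, keepdims=True)
    # S[n, m, k] = similarity of utterance m of speaker n to centroid k.
    return np.einsum('nmd,kd->nmk', e, c)

# Toy example with N=2, M=4, D=2: speaker 0's embeddings all point along
# [1, 0], speaker 1's along [0, 1], so the diagonal blocks score 1.0 and
# the off-diagonal blocks score 0.0.
emb = np.zeros((2, 4, 2))
emb[0, :, 0] = 1.0
emb[1, :, 1] = 1.0
cent = np.array([[1.0, 0.0], [0.0, 1.0]])
S = similarity_matrix(emb, cent)
```

In your printout the diagonal blocks are well above the 0.54 threshold and the off-diagonal blocks are below it, which is exactly why EER comes out as 0.00: the model cleanly separates the two folders even though they hold the same speaker.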