astorfi / 3D-convolutional-speaker-recognition

:speaker: Deep Learning & 3D Convolutional Neural Networks for Speaker Verification
Apache License 2.0

EER vs i-vector #33

Closed SpongebBob closed 6 years ago

SpongebBob commented 6 years ago

Hi astorfi, thanks for such great work. The pipeline is really nice. But when I try the AISHELL dataset, the Kaldi i-vector baseline gets around 2% EER, while 3D-convolutional-speaker-recognition with LDA gets 17% EER. What's wrong? Any help would be much appreciated!

astorfi commented 6 years ago

You mentioned "3D-convolutional-speaker-recognition with LDA". Is everything else exactly the same? What is the dataset? What is the input length?

astorfi commented 6 years ago

@SpongebBob Keep in mind that our model is a novel architecture based on 3D CNNs. It is not necessarily better than other methods. For further improvement, I suggest training an end-to-end method.

SpongebBob commented 6 years ago

@astorfi Thanks for your reply. I use the same settings as in your code and paper. Without LDA I get 20.3% EER, and LDA improves that by about 3 points. The gap to i-vector is still too large. I use the data and scripts from the official Kaldi example: https://github.com/kaldi-asr/kaldi/tree/master/egs/aishell. I also tried another dataset, and the gap to i-vector is still there. The training softmax loss converged to around 0.9, with accuracy around 90%.
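
When comparing EER figures from two different pipelines, it can help to score both sets of trials with the same routine, since small differences in how the threshold is picked can shift the number. Below is a minimal sketch of one common way to compute EER from verification scores. It is not taken from this repository or from Kaldi; the function name `compute_eer` and the input layout (one score and one 0/1 target label per trial) are assumptions made for illustration, and tied scores are not given any special treatment.

```python
import numpy as np

def compute_eer(scores, labels):
    """Hypothetical helper: equal error rate from trial scores.

    scores: higher means "more likely the same speaker".
    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)

    # Sort trials from highest to lowest score.
    order = np.argsort(-scores, kind="mergesort")
    labels = labels[order]

    n_target = labels.sum()
    n_nontarget = len(labels) - n_target

    # Entry k describes the operating point that accepts the top-k trials.
    accepted_targets = np.concatenate(([0], np.cumsum(labels)))
    accepted_nontargets = np.concatenate(([0], np.cumsum(1 - labels)))

    miss_rate = 1.0 - accepted_targets / n_target          # false rejections
    false_accept_rate = accepted_nontargets / n_nontarget  # false acceptances

    # EER is where the two error rates cross; take the closest point.
    idx = np.argmin(np.abs(miss_rate - false_accept_rate))
    return (miss_rate[idx] + false_accept_rate[idx]) / 2.0
```

Feeding the same trial list through this function for both the i-vector scores and the 3D-CNN scores would at least rule out a difference in the evaluation code itself as the source of the gap.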

astorfi commented 6 years ago

@SpongebBob I am not familiar with the Kaldi implementation, so it's hard for me to tell the reason.