TaoRuijie / ECAPA-TDNN

Unofficial reimplementation of ECAPA-TDNN for speaker recognition (EER = 0.86 on Vox1_O when trained only on Vox2)
MIT License
594 stars, 113 forks

data augmentation #57

JINzezhong7 closed this issue 1 year ago

JINzezhong7 commented 1 year ago

In the system description, you say that one music file and one noise file are added, but in the code it seems that one music file and 3-8 speech files are added.

TaoRuijie commented 1 year ago

It is here: https://github.com/TaoRuijie/ECAPA-TDNN/blob/main/dataLoader.py#L58

JINzezhong7 commented 1 year ago

(screenshot) But this says a music file and a noise file, while the code adds a music file and speech files.
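The combined type discussed here can be sketched as two additive-noise passes: several speech (babble) clips, then one music clip, each scaled to a random signal-to-noise ratio before being summed into the utterance. This is a minimal pure-Python sketch, not the repository's code. The function names, the SNR ranges (13-20 dB for speech, 5-15 dB for music) and the assumption that every noise clip is already cropped to the utterance length are all illustrative assumptions.

```python
import math
import random


def power_db(signal):
    # Mean power in dB; the tiny epsilon only guards against log10(0) on silence.
    mean_sq = sum(x * x for x in signal) / len(signal)
    return 10 * math.log10(mean_sq + 1e-10)


def scale_to_snr(clean, noise, snr_db):
    # Return `noise` scaled so that power_db(clean) - power_db(scaled) is about snr_db.
    gain_db = power_db(clean) - power_db(noise) - snr_db
    scale = math.sqrt(10 ** (gain_db / 10))
    return [scale * n for n in noise]


def add_noise(clean, noise_pool, num_range, snr_range, rng=random):
    # Draw k clips (k uniform in num_range), scale each to its own random SNR,
    # sum them, and add the sum to the input signal.
    # Assumes every clip in noise_pool has the same length as `clean`.
    k = rng.randint(num_range[0], num_range[1])
    mixture = [0.0] * len(clean)
    for noise in rng.sample(noise_pool, k):
        snr_db = rng.uniform(snr_range[0], snr_range[1])
        scaled = scale_to_snr(clean, noise, snr_db)
        mixture = [m + s for m, s in zip(mixture, scaled)]
    return [c + m for c, m in zip(clean, mixture)]


def add_music_and_babble(clean, speech_pool, music_pool, rng=random):
    # Combined type as discussed in this issue: 3-8 speech clips, then 1 music clip.
    # The music SNR is measured against the already-babbled signal.
    # SNR ranges are illustrative assumptions.
    audio = add_noise(clean, speech_pool, (3, 8), (13, 20), rng)
    return add_noise(audio, music_pool, (1, 1), (5, 15), rng)


# Usage with synthetic signals (a sine "utterance" and Gaussian "noise" clips).
rng = random.Random(0)
n = 1600
clean = [math.sin(0.05 * i) for i in range(n)]
speech_pool = [[rng.gauss(0, 0.1) for _ in range(n)] for _ in range(10)]
music_pool = [[rng.gauss(0, 0.2) for _ in range(n)] for _ in range(3)]
augmented = add_music_and_babble(clean, speech_pool, music_pool, rng)
```

Scaling each clip by its own SNR (rather than scaling the summed mixture) keeps the babble clips at comparable loudness relative to the speech being augmented.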

TaoRuijie commented 1 year ago

Oh, that is a mistake in the description. Thanks for pointing it out.

This part does not affect the results much, though; I found that augmentation without this type works fine.
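Training without the combined type amounts to removing one option from the pool that the loader samples from. A minimal sketch, assuming the loader picks one augmentation type uniformly at random per utterance out of six types; the type labels and function names are hypothetical, not the repository's.

```python
import random

# Hypothetical labels for the augmentation types; assumes six types in total,
# with the combined music + speech type as the sixth.
BASE_TYPES = ["none", "reverb", "babble", "music", "noise"]
COMBINED_TYPE = "music+babble"


def augmentation_types(include_combined=True):
    # Build the candidate list, optionally dropping the combined type.
    types = list(BASE_TYPES)
    if include_combined:
        types.append(COMBINED_TYPE)
    return types


def pick_augmentation(rng, include_combined=True):
    # Uniform choice: with the combined type removed, each remaining type
    # is picked with probability 1/5 instead of 1/6.
    return rng.choice(augmentation_types(include_combined))


rng = random.Random(0)
picks = [pick_augmentation(rng, include_combined=False) for _ in range(1000)]
```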

JINzezhong7 commented 1 year ago

No problem. I have another question: (screenshot) If you calculate the score the way I do, will the final EER be different?

TaoRuijie commented 1 year ago

It might be similar, I guess.

JINzezhong7 commented 1 year ago

I experimented with my own model: one scoring method gives an EER of 3.94 and the other gives 4.00. The gap is not very big, so could this trick be used to calculate EER in the future? Haha.
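Comparing two scoring variants is only meaningful if the EER is computed identically for both score lists. Below is a minimal stdlib sketch of EER as the operating point where the false-acceptance and false-rejection rates meet. It assumes labels are 1 for same-speaker trials and 0 otherwise, and that a higher score means "same speaker". It sweeps every score as a threshold, which is quadratic but adequate for small trial lists; it is not necessarily how the repository computes EER.

```python
def compute_eer(scores, labels):
    """Approximate EER in percent by sweeping every score as a threshold.

    A trial is accepted when score >= threshold.
    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    """
    n_target = sum(1 for lab in labels if lab == 1)
    n_nontarget = len(labels) - n_target
    best_gap, eer = float("inf"), None
    for thr in sorted(set(scores)):
        # False acceptance: non-target trials scored at or above the threshold.
        far = sum(1 for s, lab in zip(scores, labels) if lab == 0 and s >= thr) / n_nontarget
        # False rejection: target trials scored below the threshold.
        frr = sum(1 for s, lab in zip(scores, labels) if lab == 1 and s < thr) / n_target
        # Keep the threshold where the two error rates are closest.
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return 100 * eer
```

Calling `compute_eer` on the score lists from both variants, with the same trial labels, gives directly comparable numbers.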

TaoRuijie commented 1 year ago

I don't understand your question. If you mean the trick for calculating EER, it only changes the result by about 5% (0.96 vs. 1.01).
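The 5% figure is a relative change. A quick check with the numbers quoted in this thread:

$$\frac{1.01 - 0.96}{0.96} \approx 0.052 \approx 5\%, \qquad \frac{4.00 - 3.94}{3.94} \approx 0.015 \approx 1.5\%$$

In relative terms, the 3.94 vs. 4.00 gap on the custom model is smaller than the 0.96 vs. 1.01 gap quoted for the trick.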