Questions about reproducing the ECAPA-TDNN paper

TaoRuijie / ECAPA-TDNN

Unofficial reimplementation of ECAPA-TDNN for speaker recognition (EER=0.86 for Vox1_O when trained only on Vox2)

MIT License


Questions about reproducing the ECAPA-TDNN paper #34

Closed: xyw7 closed this issue 2 years ago

xyw7 commented 2 years ago

I found out there are some differences between the configurations in your code and the original configurations in ECAPA.

The most important one is that your code randomly chooses 1 of the 6 noise types to add, while ECAPA uses all 6 augmentation methods, which means they have a larger dataset.

I trained the 512-channel model, which only achieves an EER of 1.16 (1.01 in the ECAPA paper), but your 1024-channel result is even better than ECAPA's. So is there a training trick you are holding back? Or did you change the configurations in the code you uploaded? (I just copied your project and changed the channel number; everything else stays the same.) Or do the small differences in your code make it work better on a large model?
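For reference, the EER numbers discussed in this thread are the operating point where the false acceptance rate equals the false rejection rate. Below is a minimal NumPy sketch of computing it from trial scores; the function name `compute_eer` is hypothetical and this is not necessarily how the repository computes it. Tied scores are not handled specially, and the result is returned in percent.

```python
import numpy as np

def compute_eer(scores, labels):
    """Equal error rate (in %) from trial scores and 0/1 labels (1 = same speaker)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Sort trials by descending score; accepting the top i trials sweeps the threshold.
    labels = labels[np.argsort(-scores)]
    n_target = labels.sum()
    n_nontarget = len(labels) - n_target
    # Cumulative true/false accepts, starting from "accept nothing".
    tp = np.concatenate([[0], np.cumsum(labels)])
    fp = np.concatenate([[0], np.cumsum(1 - labels)])
    far = fp / n_nontarget          # false acceptance rate
    frr = 1.0 - tp / n_target       # false rejection rate
    # Pick the threshold where the two error rates are closest and average them.
    idx = np.argmin(np.abs(far - frr))
    return 100.0 * (far[idx] + frr[idx]) / 2.0
```

A perfectly separated trial list gives 0, and an EER of 0.86 means that at the chosen threshold about 0.86% of trials are misclassified in each direction.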

And thank you for your excellent work! Any help will be appreciated!

Best

TaoRuijie commented 2 years ago

"You just randomly choose 1 of the 6 noise types to add. And in ECAPA, they use all 6 noise methods, which means they have a larger dataset." I think that part is actually similar, because in each epoch I randomly select one kind of noise to add. That is an online augmentation approach, so the effective size of the dataset is similar.
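To make the online vs. offline distinction concrete, here is a small counting sketch. The six augmentation labels are illustrative placeholders, not the exact list used in the repository or the ECAPA paper, and the functions only pair utterance IDs with augmentation names instead of processing audio. Offline expansion stores every utterance with every augmentation (6x the samples per epoch), while online augmentation draws one augmentation per utterance each time it is loaded, so each epoch has the same number of samples but an utterance meets different augmentations across epochs.

```python
import random
from collections import Counter

# Illustrative augmentation labels (placeholders, not the repo's exact list).
AUG_TYPES = ["none", "reverb", "babble", "music", "noise", "television"]

def offline_dataset(utterances):
    """Offline: every utterance is stored once per augmentation type."""
    return [(utt, aug) for utt in utterances for aug in AUG_TYPES]

def online_epoch(utterances, rng):
    """Online: each utterance gets one randomly drawn augmentation per epoch."""
    return [(utt, rng.choice(AUG_TYPES)) for utt in utterances]

if __name__ == "__main__":
    utts = [f"utt{i}" for i in range(1000)]
    rng = random.Random(0)
    print(len(offline_dataset(utts)))    # 6000 samples per epoch
    print(len(online_epoch(utts, rng)))  # 1000 samples per epoch
    # Across 60 epochs, a single utterance is paired with each type about 10 times on average.
    seen = Counter(aug for _ in range(60) for _, aug in online_epoch(utts[:1], rng))
    print(seen)
```

Under this view, training for more epochs with online augmentation exposes the model to a comparable variety of augmented samples, which is the point made in the reply above.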

With C=1024, the EER is 0.86 in this work with AS-norm; without AS-norm it is 0.96, as the README mentions. So I guess the 1.16 you mentioned for C=512 is also a result without AS-norm.
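AS-norm (adaptive symmetric score normalization) rescales each trial score using the mean and standard deviation of the top-K highest scores against a cohort of impostor embeddings, computed separately for the enrollment side and the test side, then averages the two normalized scores. Below is a minimal NumPy sketch, assuming cosine scoring and a precomputed cohort embedding matrix (for example, embeddings of training speakers). The names `as_norm` and `cosine_scores` and the default `top_k=300` are illustrative choices; this is not the author's AS-norm code, which is not shown in this thread.

```python
import numpy as np

def cosine_scores(a, b):
    """Cosine similarity matrix between rows of a (n, d) and rows of b (m, d)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def as_norm(score, enroll_emb, test_emb, cohort_emb, top_k=300):
    """Adaptive symmetric normalization of one trial score (top_k is an assumed value)."""
    k = min(top_k, len(cohort_emb))  # needs k >= 2 so the std is non-zero
    # Scores of each side of the trial against the whole cohort.
    e_cohort = cosine_scores(enroll_emb[None, :], cohort_emb)[0]
    t_cohort = cosine_scores(test_emb[None, :], cohort_emb)[0]
    # Keep only the k most similar cohort scores (the "adaptive" part).
    e_top = np.sort(e_cohort)[-k:]
    t_top = np.sort(t_cohort)[-k:]
    # Average of enrollment-side and test-side normalized scores (the "symmetric" part).
    z = (score - e_top.mean()) / e_top.std()
    t = (score - t_top.mean()) / t_top.std()
    return 0.5 * (z + t)
```

Because the normalization is symmetric, swapping the enrollment and test embeddings gives the same normalized score.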

The difference might be that I use the MSE in the evaluation process (please check the code); it gives about a 0.05 improvement. I think the rest is similar.

xyw7 commented 2 years ago

Thanks for the reply. Actually, I have your AS-norm code (I asked for it on Bilibili a few days ago), and the result after the AS-norm backend is 1.16... Did you ever train the 512-channel model? What was the result? I am afraid I made some mistake...

TaoRuijie commented 2 years ago

Er, I did not train the 512-channel model. I guess this result is also reasonable.

xyw7 commented 2 years ago

Ok, I'll try it again.

JJun-Guo commented 2 years ago

Hi friend, can you share the AS-norm code with me? I would be very grateful.

tuanh123789 commented 1 year ago

Hi, can you share your pretrained model with 512 channels?