TaoRuijie / ECAPA-TDNN

Unofficial reimplementation of ECAPA-TDNN for speaker recognition (EER = 0.86% on Vox1_O when trained only on Vox2)
MIT License
594 stars, 113 forks

Why do training and test audio have different lengths? #10

Closed: youyou098888 closed this issue 2 years ago

youyou098888 commented 2 years ago

Hi, I noticed that at training time num_frames is 200, so each training segment is 2 seconds long. But at evaluation time the test segments are 3 seconds long (ECAPAModel.py, line 63). Why aren't the training and test segments the same length?
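To see why num_frames = 200 corresponds to about 2 seconds, here is a minimal sketch of the frame-to-duration arithmetic. It assumes 16 kHz audio and a 25 ms window with a 10 ms hop (400 and 160 samples), and that the loader reads exactly enough samples to produce num_frames frames. The helper names are hypothetical and not taken from the repository.

```python
SAMPLE_RATE = 16000  # assumption: 16 kHz audio
HOP = 160            # assumption: 10 ms hop at 16 kHz
WIN = 400            # assumption: 25 ms window at 16 kHz

def frames_to_samples(num_frames, hop=HOP, win=WIN):
    """Samples needed so that num_frames windows fit without padding (hypothetical helper)."""
    return (num_frames - 1) * hop + win

def samples_to_seconds(num_samples, sample_rate=SAMPLE_RATE):
    """Convert a sample count to a duration in seconds."""
    return num_samples / sample_rate

for n in (200, 300):
    samples = frames_to_samples(n)
    print(n, samples, samples_to_seconds(samples))
# prints:
# 200 32240 2.015
# 300 48240 3.015
```

Under these assumptions, 200 frames is about 2 s and 300 frames is about 3 s. The extra 15 ms is one window length minus one hop.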

TaoRuijie commented 2 years ago

At evaluation I use both the whole utterance and 3-second clips. This is MSA, described in Section 8.1 of https://arxiv.org/pdf/2109.03568.pdf.
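The idea of scoring with both the whole utterance and fixed-length clips can be sketched as follows. This is a hedged illustration, not the repository's code. It makes several assumptions: five evenly spaced 3-second crops, wrap-padding for utterances shorter than one crop, cosine similarity on L2-normalised embeddings, and a final score equal to the average of the full-length score and the mean crop-vs-crop score. The `embed` argument is a hypothetical function that maps a (batch, samples) array to (batch, dim) embeddings.

```python
import numpy as np

def fixed_crops(audio, num_frames=300, num_crops=5, hop=160, win=400):
    """Cut num_crops evenly spaced, equal-length crops from a 1-D waveform.

    Utterances shorter than one crop are wrap-padded first (an assumption).
    """
    seg_len = (num_frames - 1) * hop + win  # 48240 samples, about 3 s at 16 kHz
    if audio.shape[0] < seg_len:
        audio = np.pad(audio, (0, seg_len - audio.shape[0]), mode="wrap")
    starts = np.linspace(0, audio.shape[0] - seg_len, num=num_crops)
    return np.stack([audio[int(s):int(s) + seg_len] for s in starts])

def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def two_view_score(embed, enroll, test):
    """Average a whole-utterance score with the mean crop-vs-crop score.

    `embed` is hypothetical: (batch, samples) -> (batch, dim).
    """
    full_e = l2_normalize(embed(enroll[None, :]))
    full_t = l2_normalize(embed(test[None, :]))
    crop_e = l2_normalize(embed(fixed_crops(enroll)))
    crop_t = l2_normalize(embed(fixed_crops(test)))
    score_full = float(np.mean(full_e @ full_t.T))    # 1 x 1 similarity
    score_crops = float(np.mean(crop_e @ crop_t.T))   # 5 x 5 similarities, averaged
    return (score_full + score_crops) / 2
```

The whole-utterance view uses all available speech. The fixed-length crops give embeddings closer to the segment lengths the model saw in training. Averaging the two views is one simple way to combine them.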

The exact duration is not important. I chose 3 seconds because it performed slightly better than 2 seconds in my tests, but the difference is very small. You can try other lengths yourself.