youyou098888 closed 2 years ago
I use both the full-length utterance and 3-second clips. This is the MSA described in https://arxiv.org/pdf/2109.03568.pdf, Section 8.1.
The exact duration is not important. I use 3 seconds because it performed slightly better than 2 seconds in my testing, but the difference is very small. You can try other lengths as well.
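To make the "full length plus 3-second clips" idea concrete, here is a rough sketch of how the two views could be prepared and how their scores could be averaged. It is not the exact code in ECAPAModel.py. The 16 kHz sample rate, the 160-sample hop, the extra 240 samples per segment, the choice of 5 clips, and the helper names (`frames_to_samples`, `split_into_clips`, `cosine_score`, `fused_score`) are all assumptions for illustration.

```python
import numpy as np

HOP = 160        # assumed: 10 ms hop at a 16 kHz sample rate
WIN_EXTRA = 240  # assumed: extra samples so that num_frames frames fit


def frames_to_samples(num_frames):
    """Convert a frame count to a waveform length in samples."""
    return num_frames * HOP + WIN_EXTRA


def split_into_clips(audio, num_frames=300, num_clips=5):
    """Cut evenly spaced fixed-length clips (about 3 s for 300 frames)."""
    clip_len = frames_to_samples(num_frames)
    if audio.shape[0] <= clip_len:
        # Utterance shorter than one clip: repeat it until it is long enough.
        audio = np.pad(audio, (0, clip_len - audio.shape[0]), mode="wrap")
    starts = np.linspace(0, audio.shape[0] - clip_len, num=num_clips)
    return np.stack([audio[int(s):int(s) + clip_len] for s in starts])


def cosine_score(a, b):
    """Mean cosine similarity between two embeddings or two sets of them."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return float(np.mean(a @ b.T))


def fused_score(full_a, clips_a, full_b, clips_b):
    """Average the full-utterance score and the clip-level score."""
    return (cosine_score(full_a, full_b) + cosine_score(clips_a, clips_b)) / 2
```

With these assumed constants, `frames_to_samples(200)` is 32240 samples (about 2 seconds) and `frames_to_samples(300)` is 48240 samples (about 3 seconds). The embeddings passed to `fused_score` would come from the speaker model, once for the whole utterance and once per clip.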
Hi, I noticed that at training time num_frames is 200, so each training segment is 2 seconds long. But at eval time the audio segment is 3 seconds (ECAPAModel.py, line 63). Why are the training and testing segments not the same length?