Walleclipse / Deep_Speaker-speaker_recognition_system

Keras implementation of "Deep Speaker: an End-to-End Neural Speaker Embedding System" (speaker recognition)

voice sample length #20

Closed mangushev closed 4 years ago

mangushev commented 5 years ago

Hi,

It seems that 1.6 seconds is quite short. I see that in papers they use 3 or 5 seconds or even longer. But increasing the length, say 2 times to 3.2 seconds, results in 320 frames. With the convolutional model, that means averaging 20 embeddings at the end instead of 10. It feels like this averaging is not the best thing. To avoid it, extending the network would double the embedding size from 512 to 1024, as far as I can see.
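For concreteness, here is a rough sketch of the arithmetic I have in mind (the 10 ms frame hop and the temporal downsampling factor of 16 are my assumptions about the front-end, not values read from this repo's code):

```python
# Sketch of how clip length maps to input frames and to the number of
# frame-level embeddings averaged by the final pooling layer.
FRAME_HOP_SEC = 0.01   # assumed filterbank hop (10 ms per frame)
DOWNSAMPLE = 16        # assumed total stride of the convolutional front-end

def frames_and_pooled_steps(clip_seconds):
    """Return (input frames, frame-level embeddings that get averaged)."""
    num_frames = int(round(clip_seconds / FRAME_HOP_SEC))
    pooled_steps = num_frames // DOWNSAMPLE
    return num_frames, pooled_steps

print(frames_and_pooled_steps(1.6))  # (160, 10)
print(frames_and_pooled_steps(3.2))  # (320, 20)
```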

Please let me know your views.

Walleclipse commented 5 years ago

Hi,

The voice sample length really depends on the data set. In some data sets the valid voice length (excluding silence) is less than 3 seconds. If the valid voice is long enough, I think your suggestions are valuable, but I am not sure which option is better.

On the one hand, in the speaker embedding task we first embed the utterances of a speaker and then "summarize" them (averaging, in this paper) into a speaker-level embedding. So it is reasonable to use more utterance embeddings and "summarize" them into the speaker level. That is more robust to some abnormal utterances and gives a more statistical result.

On the other hand, doubling the embedding size can store more information. It may represent richer utterances for speakers, which can avoid losing some information.

I apologize that I have not evaluated these two methods. Maybe you can design some experiments on them; it would be interesting. Thanks for your suggestion!
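For what it's worth, a minimal sketch of what I mean by "summarize into speaker level", assuming plain mean pooling over 512-dim utterance embeddings (the L2 normalization at the end is just for illustration, not necessarily what this repo does):

```python
import numpy as np

def speaker_embedding(utt_embeddings):
    """Average utterance-level embeddings into one speaker-level embedding,
    then L2-normalize so cosine scoring does not depend on the clip count."""
    emb = np.mean(np.asarray(utt_embeddings), axis=0)
    return emb / (np.linalg.norm(emb) + 1e-12)

# e.g. three 512-dim utterance embeddings for one speaker
utts = [np.random.randn(512) for _ in range(3)]
spk = speaker_embedding(utts)
print(spk.shape)  # (512,)
```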