Do you need to tune the period hyper-parameter with 16 speakers?
First of all, thanks for the great work!
I've been generating my own data (8 speakers in a different language) and training on it together with VOCASET (8 speakers in English). Since the period hyper-parameter for the positional encoding is related to the speakers, I was wondering whether the period hyper-parameter needs to be tuned for 16 speakers.
Thanks in advance!