I am training the network on the VCTK corpus (sampling rate = 48 kHz, 109 speakers with an average of 300 utterances per speaker).
I get a very high EER (0.45) and I can't work out why the performance is so poor. Possible causes I can think of: the sampling rate, not enough data, or a problem in the model.
Any thoughts? For data augmentation, I don't think adding noise would help, because the data preprocessing removes noise anyway.
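For reference, EER is the operating point where the false acceptance rate equals the false rejection rate, so on a balanced trial list 0.5 is chance level and 0.45 is close to random. Below is a minimal sketch of how it can be computed from trial scores, useful for ruling out a bug in the evaluation itself. It assumes NumPy; `compute_eer`, `scores` and `labels` are hypothetical names for illustration and are not taken from the training code.

```python
import numpy as np

def compute_eer(scores, labels):
    """Approximate EER from similarity scores and 0/1 labels (1 = same speaker)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = scores[labels == 1]  # target (same-speaker) trials
    neg = scores[labels == 0]  # non-target (different-speaker) trials

    # Use every distinct score as a candidate decision threshold.
    thresholds = np.unique(scores)
    # False acceptance rate: non-target trials scoring at or above the threshold.
    far = np.array([(neg >= t).mean() for t in thresholds])
    # False rejection rate: target trials scoring below the threshold.
    frr = np.array([(pos < t).mean() for t in thresholds])

    # Take the threshold where the two error rates are closest.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0
```

If this gives a value near 0.5 on the same scores, the embeddings themselves carry little speaker information; if it gives something much lower, the original evaluation code is worth a second look.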
Your help is much appreciated.
Thank you.