A similar question was asked in #78 but it was closed without an answer.
So, on which data was the provided speaker encoder pretrained? I looked through the wiki and the issues but couldn't find an answer.
Was it pretrained on a combination of LibriSpeech and VoxCeleb 1 & 2, as mentioned in the thesis? @CorentinJ
In our case, we are taking the pretrained encoder (encoder.pt) and looking to fine-tune its last linear layer and similarity scaling parameters on a dataset of our own (rough sketch below).
Knowing on which data the encoder was pretrained would be of much help.
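For concreteness, here is a minimal sketch of the fine-tuning setup we have in mind. Everything repo-specific in it is an assumption rather than verified against this codebase: the `SpeakerEncoder` import path and its `(device, loss_device)` constructor, the `"model_state"` checkpoint key, and the `linear` / `similarity_*` parameter names (guessed from the GE2E-style encoder described in the thesis).

```python
import torch

# Assumed import path and constructor signature; adjust to the repo layout.
from encoder.model import SpeakerEncoder

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Build the model and load the pretrained weights from encoder.pt.
model = SpeakerEncoder(device, device)
checkpoint = torch.load("encoder.pt", map_location=device)
model.load_state_dict(checkpoint["model_state"])  # checkpoint key is an assumption

# Freeze everything except the last linear layer and the similarity
# scaling parameters (parameter names assumed, e.g. "linear.weight",
# "similarity_weight", "similarity_bias").
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("linear") or name.startswith("similarity")

# Optimize only the unfrozen parameters.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```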
The training data was the training set from LibriSpeech, VoxCeleb1 Dev A-D, and VoxCeleb2, resulting in 3201 hours of data from 8371 different speakers.