Data used for pretraining speaker encoder?

A similar question was asked in #78 but it was closed without an answer.

So, which data was the provided speaker encoder pretrained on? I looked through the wiki and the issues but couldn't find an answer. Was it pretrained on a combination of LibriSpeech and VoxCeleb 1 & 2, as mentioned in the thesis? @CorentinJ

In our case, we are taking the pretrained encoder (encoder.pt) and want to fine-tune only its last linear layer and the similarity scaling parameters on our own dataset, keeping the rest of the network frozen.
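For concreteness, here is a rough sketch of the freezing setup we have in mind. It uses a small stand-in model rather than the repository's encoder class, and the attribute names (`lstm`, `linear`, `similarity_weight`, `similarity_bias`), the layer sizes, and the learning rate are assumptions made for illustration. Loading the actual weights from encoder.pt is left out because the checkpoint layout is not confirmed in this thread.

```python
import torch
from torch import nn


class TinySpeakerEncoder(nn.Module):
    """Hypothetical stand-in for a GE2E-style speaker encoder (not the repo's class)."""

    def __init__(self, n_mels=40, hidden=32, emb=16):
        super().__init__()
        self.lstm = nn.LSTM(n_mels, hidden, num_layers=3, batch_first=True)
        self.linear = nn.Linear(hidden, emb)
        # Scale and offset applied to the cosine similarity matrix in the loss.
        self.similarity_weight = nn.Parameter(torch.tensor([10.0]))
        self.similarity_bias = nn.Parameter(torch.tensor([-5.0]))

    def forward(self, mels):
        # mels: (batch, frames, n_mels); use the last layer's final hidden state.
        _, (hidden, _) = self.lstm(mels)
        embeds = torch.relu(self.linear(hidden[-1]))
        # L2-normalise so embeddings can be compared by cosine similarity.
        return embeds / (embeds.norm(dim=1, keepdim=True) + 1e-5)


def freeze_all_but(model, trainable_prefixes):
    """Leave only parameters whose name starts with one of the prefixes trainable."""
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(trainable_prefixes)
        if param.requires_grad:
            trainable.append(name)
    return trainable


model = TinySpeakerEncoder()
# Pretrained weights would be loaded into the model here, before freezing.
trainable_names = freeze_all_but(model, ("linear.", "similarity_"))

# Only the unfrozen parameters are handed to the optimizer.
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4
)
```

With this setup the LSTM stack stays fixed, so how well the frozen features transfer depends on how close our data is to the pretraining data, which is why we are asking.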

Knowing which data the encoder was pretrained on would help us a lot.
