Closed tae-jun closed 5 years ago
Thanks for this. Actually, as long as you are not using the VoxCeleb2 test set, it's OK to use any split.
Can you send me meta/voxlb2_val.txt? I couldn't find it; maybe the author deleted it. Looking forward to your reply.
The dataset split file meta/voxlb2_train.txt contains audio files that are also listed in meta/voxlb2_val.txt. When the examples that appear in the validation set are removed, the number of training examples drops from 1,198,728 to 985,290. I guess people using this repository are suffering from overfitting because of this split error, since the validation set overlaps with the training data. Please remove the duplicated examples and re-upload the two split files!
The code below is the one that I used to remove the duplicates using Pandas:
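The original snippet is not preserved in this copy of the thread. As a stand-in, here is a minimal sketch of how such a de-duplication could be done with Pandas; it is not the author's code. It assumes each split file holds one example per line and that the whole line identifies an example. The helper names (`read_split`, `drop_validation_overlap`, `dedup_files`) and the column name `path` are hypothetical.

```python
import pandas as pd


def read_split(path):
    """Load a split file into a one-column DataFrame.

    Assumption: one example per line; blank lines are skipped.
    """
    with open(path, encoding="utf-8") as f:
        lines = [line.strip() for line in f if line.strip()]
    return pd.DataFrame({"path": lines})


def drop_validation_overlap(train, val, key="path"):
    """Return the rows of `train` whose `key` value does not occur in `val`."""
    overlap = train[key].isin(val[key])
    return train.loc[~overlap].reset_index(drop=True)


def dedup_files(train_path, val_path, out_path):
    """Write a copy of the training split without validation examples.

    Returns the (before, after) number of training rows.
    """
    train = read_split(train_path)
    val = read_split(val_path)
    cleaned = drop_validation_overlap(train, val)
    # Write to a separate file so the original split is not overwritten.
    cleaned.to_csv(out_path, header=False, index=False)
    return len(train), len(cleaned)
```

Calling `dedup_files("meta/voxlb2_train.txt", "meta/voxlb2_val.txt", out_path)` with an output path of your choice would return the row counts before and after filtering, which can be compared with the numbers reported above. If the split files carry extra columns such as labels, the comparison key would need to be restricted to the audio path.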