VITA-Group / AutoSpeech

[InterSpeech 2020] "AutoSpeech: Neural Architecture Search for Speaker Recognition" by Shaojin Ding*, Tianlong Chen*, Xinyu Gong, Weiwei Zha, Zhangyang Wang
https://arxiv.org/abs/2005.03215
MIT License

Training data of speaker verification #3

xx205 closed this issue 4 years ago

xx205 commented 4 years ago

To my knowledge, in speaker verification, the speakers in the test data should not appear in the training data. For the VoxCeleb1 dataset, only the dev set (1211 speakers) is used for training, and the test set (40 speakers) is used for evaluation.
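
A minimal sketch of the disjointness requirement described above. The function name and the toy speaker labels are hypothetical and are not taken from this repository or from VoxCeleb1; the point is only that the intersection of training and test speakers should be empty.

```python
def find_leaked_speakers(train_speakers, test_speakers):
    """Return the sorted list of speakers present in both sets.

    For a valid speaker verification split this list should be empty.
    """
    return sorted(set(train_speakers) & set(test_speakers))


# Toy labels for illustration only (hypothetical, not real dataset IDs).
train = ["spk_a", "spk_b", "spk_c"]
test = ["spk_c", "spk_d"]

leaked = find_leaked_speakers(train, test)
if leaked:
    print(f"{len(leaked)} test speaker(s) also appear in training: {leaked}")
```

Running a check like this before training would flag a split in which test speakers were mixed into the training data.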

wsstriving commented 4 years ago

It's not correct to include the test set in the network training data. Do you have results with the 1211 speakers as the training data?

shaojinding commented 4 years ago

Thanks for pointing this out. We are looking into this problem right now and will post an update soon.

shaojinding commented 4 years ago

Actually, we were following https://github.com/MahdiHajibabaei/unified-embedding/blob/9e8f5f1cf9bca699fa51ed714ea3d2b6e490684b/train_aug.py#L57 when doing the data split. It might be a long-standing mistaken convention.

czy97 commented 4 years ago

> Actually, we were following https://github.com/MahdiHajibabaei/unified-embedding/blob/9e8f5f1cf9bca699fa51ed714ea3d2b6e490684b/train_aug.py#L57 when doing the data split. It might be a long-standing mistaken convention.

Actually, the pre-processing code you mentioned is correct. However, the format of the iden_split.txt file on the official website has been changed (speaker names were replaced with IDs), so you can't use that script to get the right verification split.

shaojinding commented 4 years ago

> Actually, the pre-processing code you mentioned is correct. However, the format of the iden_split.txt file on the official website has been changed (speaker names were replaced with IDs), so you can't use that script to get the right verification split.

I'm afraid not. Correct me if I'm wrong. I know about the update to iden_split.txt. If you take a look at https://raw.githubusercontent.com/cyrta/voxceleb/master/data/v1/Identification_split.txt you will find that all the speakers starting with 'E' are also used for training (basically, verification and identification use the same training set).

czy97 commented 4 years ago

> I'm afraid not. Correct me if I'm wrong. I know about the update to iden_split.txt. If you take a look at https://raw.githubusercontent.com/cyrta/voxceleb/master/data/v1/Identification_split.txt you will find that all the speakers starting with 'E' are also used for training (basically, verification and identification use the same training set).

The iden_split.txt is correct, but in https://github.com/MahdiHajibabaei/unified-embedding/blob/9e8f5f1cf9bca699fa51ed714ea3d2b6e490684b/train_aug.py#L73 they did put the speakers starting with 'E' into the test set. Speakers starting with 'E' are the test set in the verification task. Thus, it is definitely wrong to include them in the training set.
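
A rough sketch of the name-based verification split described above, assuming the older layout in which each utterance path begins with the speaker name. The function name, the `held_out_initial` parameter, and the example paths are hypothetical; this is not the code from train_aug.py, and it would not work as-is on the newer ID-based file layout.

```python
def split_for_verification(utterance_paths, held_out_initial="E"):
    """Split utterances so that held-out speakers never reach training.

    Assumes each path looks like "<speaker_name>/<file>" (an assumption
    about the layout, not a documented format).
    """
    train, test = [], []
    for path in utterance_paths:
        speaker = path.split("/", 1)[0]
        if speaker.startswith(held_out_initial):
            test.append(path)   # verification test speakers
        else:
            train.append(path)  # everything else is available for training
    return train, test


# Hypothetical paths for illustration only.
paths = [
    "Alice_Example/clip1.wav",
    "Eve_Example/clip1.wav",
    "Bob_Example/clip2.wav",
    "Ed_Example/clip3.wav",
]
train_paths, test_paths = split_for_verification(paths)
```

With a split like this, the held-out speakers are used only for evaluation, which is the condition the comments above argue for.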

Deus1223 commented 4 years ago

So the model with 1.45% EER was trained with 1251 speakers as the training data?

shaojinding commented 4 years ago

> So the model with 1.45% EER was trained with 1251 speakers as the training data?

It was. We have updated the results, which you can find in the updated README.MD.