clovaai / voxceleb_trainer

In defence of metric learning for speaker recognition
MIT License

How to generate trial_pair_Verification.txt file if I would like to use own datasets. #70

Closed ASR2020Guru closed 3 years ago

ASR2020Guru commented 3 years ago

Hi @dvisockas @joonson,

Thanks for sharing the code. Great work.

How should I generate the trial_pair_Verification.txt file if I would like to use my own datasets?

Take veri_test.txt for VoxCeleb1 as an example. It contains 37720 rows, covering the 40 speakers of the VoxCeleb1 test set and 4874 sentences in total.

Each speaker has a different number of sentences, as below:

{'id10300': '304', 'id10301': '48', 'id10302': '166', 'id10303': '103', 'id10304': '162', 'id10305': '137', 'id10306': '184', 'id10307': '156', 'id10308': '64', 'id10309': '165', 'id10281': '84', 'id10280': '67', 'id10283': '233', 'id10282': '84', 'id10285': '93', 'id10284': '90', 'id10287': '48', 'id10286': '149', 'id10289': '87', 'id10288': '48', 'id10270': '158', 'id10271': '73', 'id10272': '50', 'id10273': '240', 'id10274': '54', 'id10275': '74', 'id10276': '185', 'id10277': '67', 'id10278': '187', 'id10279': '63', 'id10298': '127', 'id10299': '49', 'id10296': '98', 'id10297': '79', 'id10294': '138', 'id10295': '88', 'id10292': '265', 'id10293': '194', 'id10290': '137', 'id10291': '76'}

Each speaker also has a different number of trials, as below:

{'id10300': '2080', 'id10301': '336', 'id10302': '1328', 'id10303': '544', 'id10304': '1288', 'id10305': '1088', 'id10306': '1472', 'id10307': '1248', 'id10308': '496', 'id10309': '1320', 'id10281': '648', 'id10280': '496', 'id10283': '1864', 'id10282': '672', 'id10285': '728', 'id10284': '672', 'id10287': '384', 'id10286': '1184', 'id10289': '664', 'id10288': '368', 'id10270': '1120', 'id10271': '568', 'id10272': '392', 'id10273': '1920', 'id10274': '400', 'id10275': '568', 'id10276': '1480', 'id10277': '512', 'id10278': '1480', 'id10279': '488', 'id10298': '1016', 'id10299': '392', 'id10296': '784', 'id10297': '624', 'id10294': '1088', 'id10295': '704', 'id10292': '2080', 'id10293': '1544', 'id10290': '1072', 'id10291': '608'}
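Per-speaker trial counts like these can be reproduced with a short script. This is only a minimal sketch: it assumes that each row is attributed to the speaker of its first file, and that the speaker ID is the first component of the path. The function name `count_trials_per_speaker` is made up for illustration and is not part of voxceleb_trainer.

```python
from collections import Counter


def count_trials_per_speaker(lines):
    """Count trial rows per speaker, keyed by the speaker of the first file.

    Assumption: each row is "<label> <first_path> <second_path>" and the
    speaker ID is the first path component (e.g. id10270/.../00001.wav).
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed rows
        speaker = fields[1].split("/")[0]
        counts[speaker] += 1
    return counts


if __name__ == "__main__":
    # Two rows copied from the trial list discussed in this thread.
    sample = [
        "1 id10270/x6uYqmx31kE/00001.wav id10270/8jEAjG6SegY/00008.wav",
        "0 id10270/x6uYqmx31kE/00001.wav id10300/ize_eiCFEg0/00003.wav",
    ]
    print(dict(count_trials_per_speaker(sample)))
```

To run it on a real list, pass an open file object instead of the sample list, since iterating over a file yields its lines.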

How are these trials organised?

I would like to use the same strategy to prepare my own test dataset.

Thanks

dvisockas commented 3 years ago

Hey @ASR2020Guru , great username! :laughing:

The sample txt file can be found here

For this repository's code to work without changes, the test file needs to be constructed so that each row consists of three whitespace-separated values: whether the two files are spoken by the same speaker (1 or 0), the first file path, and the second file path.

An example:

1 speaker_one/hello.wav speaker_one/goodbye.wav
0 speaker_one/intro.wav speaker_two/hello.wav

Note that you don't need to provide absolute paths, as the --test_path argument adds a base path to all of the file paths in the test pairs.
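To sanity-check a custom list before running evaluation, something like the following could be used. It is a rough sketch and not the repository's own loader: `parse_trial_lines` is a hypothetical helper, and joining with `os.path.join` only approximates how the base path is applied.

```python
import os


def parse_trial_lines(lines, test_path):
    """Parse rows of the form "<label> <first_path> <second_path>".

    Returns (label, first_file, second_file) tuples with test_path
    prepended to both relative paths. Raises ValueError on a bad row.
    """
    trials = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 3 or fields[0] not in ("0", "1"):
            raise ValueError(f"line {line_no}: expected '<0|1> <file> <file>'")
        label, first, second = fields
        trials.append(
            (int(label),
             os.path.join(test_path, first),
             os.path.join(test_path, second))
        )
    return trials


if __name__ == "__main__":
    rows = [
        "1 speaker_one/hello.wav speaker_one/goodbye.wav",
        "0 speaker_one/intro.wav speaker_two/hello.wav",
    ]
    for trial in parse_trial_lines(rows, "data/test"):
        print(trial)
```

Because the rows are split on whitespace, file paths containing spaces would be rejected by this check.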

Does that answer your question?

ASR2020Guru commented 3 years ago

Hi @dvisockas ,

Thanks for your quick reply. I am new to GitHub :)

I would like to know how to split the data for this trial_pair_Verification.txt file.

Correct me if I am wrong:

I am referring to the trial_pair_Verification.txt file downloaded from robots.ox.ac.uk.

Do the pairs in this trial_pair_Verification.txt file come from the test folder of VoxCeleb1 (all 40 speakers)?

As I posted in the above thread, take speaker id10270 as an example:

  1. In the test folder of VoxCeleb1, speaker id10270 has 158 sentences. In the trial_pair_Verification.txt file, speaker id10270 has 1120 rows.

  2. If we look more closely at the trial_pair_Verification.txt file:

1 id10270/x6uYqmx31kE/00001.wav id10270/8jEAjG6SegY/00008.wav
0 id10270/x6uYqmx31kE/00001.wav id10300/ize_eiCFEg0/00003.wav
1 id10270/x6uYqmx31kE/00001.wav id10270/GWXujl-xAVM/00017.wav
0 id10270/x6uYqmx31kE/00001.wav id10273/0OCW1HUxZyg/00001.wav
1 id10270/x6uYqmx31kE/00001.wav id10270/8jEAjG6SegY/00022.wav
0 id10270/x6uYqmx31kE/00001.wav id10284/Uzxv7Axh3Z8/00001.wav
1 id10270/x6uYqmx31kE/00001.wav id10270/GWXujl-xAVM/00033.wav
0 id10270/x6uYqmx31kE/00001.wav id10284/7yx9A0yzLYk/00029.wav
  3. We can see that each sentence of speaker id10270 has 8 trial pair rows: 4 of them pair it with the same speaker, and the other 4 pair it with randomly chosen sentences from other speakers.

  4. So shouldn't the 158 sentences of speaker id10270 give 158*8=1264 rows (not 1120 rows)?

  5. As I posted above, in this trial file each speaker has about 7.77 * (that speaker's number of sentences) rows, not 8 * (that speaker's number of sentences) rows. This confuses me.

I would like to prepare my own dataset for testing, so I have to create a trial file. I would like to follow VoxCeleb's trial file strategy, but I cannot understand how it splits/chooses the data to create such a trial file.

Thank you in advance.

Cheers

dvisockas commented 3 years ago

One of the reasons could be that not all the speakers have the same number of files, so if you used all the files of each speaker you would end up with an imbalanced test set. If you want to understand how the trial file was created, I believe you would be better off contacting the VGG group :slightly_smiling_face:

ASR2020Guru commented 3 years ago

Good point 👍 Thanks