MiukkaZh / MGT

Learning Domain-Invariant Transformation for Speaker Verification.

CN-celeb2 #2

Open wangsheng3 opened 1 year ago

wangsheng3 commented 1 year ago

I saw in the paper that the first two parts of CN-Celeb2 were used, but I can only decompress the whole dataset and cannot extract just the first two parts. How did you do it?

MiukkaZh commented 1 year ago

Due to a data version update, you will need to update some of the code. Please check the CN-Celeb data details at http://www.openslr.org/82/. Before preprocessing, the dataset contains 3000 speakers in total, with 2800 used for training and 200 for testing. You can obtain the ids of the 200 test speakers from files such as trial.lst.
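For readers following along, here is a minimal sketch of how the test speaker ids might be collected from a trial list and used to split the speakers. It is not code from this repository. It assumes speaker ids have the form `id` followed by five digits (e.g. `id00800`) and that each trial line mentions those ids somewhere in its text; the sample lines and the function names `extract_speaker_ids` and `split_speakers` are hypothetical, so check them against the actual files you downloaded.

```python
import re
from typing import Iterable, List, Set, Tuple

# Assumption: speaker ids look like "id" followed by five digits, e.g. "id00800".
SPEAKER_ID = re.compile(r"id\d{5}")


def extract_speaker_ids(lines: Iterable[str]) -> Set[str]:
    """Collect every speaker id that appears in the given trial-list lines."""
    ids: Set[str] = set()
    for line in lines:
        ids.update(SPEAKER_ID.findall(line))
    return ids


def split_speakers(
    all_speakers: Iterable[str], test_ids: Set[str]
) -> Tuple[List[str], List[str]]:
    """Split speakers into (train, test) based on membership in test_ids."""
    speakers = list(all_speakers)
    train = sorted(s for s in speakers if s not in test_ids)
    test = sorted(s for s in speakers if s in test_ids)
    return train, test


# Hypothetical trial lines; the real trial list format may differ.
sample_trials = [
    "id00800-enroll test/id00800-singing-01-001.wav 1",
    "id00800-enroll test/id00801-speech-02-003.wav 0",
]

held_out = extract_speaker_ids(sample_trials)
train_speakers, test_speakers = split_speakers(
    ["id00001", "id00800", "id00801"], held_out
)
```

In practice you would pass the lines of the real trial file (for example `open(path)`) instead of `sample_trials`, and the list of speaker directory names instead of the three hard-coded ids.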

wangsheng3 commented 1 year ago

You may have misunderstood me. On this website (http://www.openslr.org/82), the CN-Celeb2 dataset is split into three parts that cannot be decompressed separately (or perhaps I just don't know how to decompress them separately). If they are decompressed together, the data from the three parts gets mixed. I don't know how to separate out the first two parts, which are the ones you used.

MiukkaZh commented 1 year ago

Part-1 and Part-2 are the previous data partitions of CN-Celeb. In fact, we are using the entire CN-Celeb dataset, so there is no need for data segmentation.