clovaai / voxceleb_trainer

In defence of metric learning for speaker recognition
MIT License

Download script has some error? #124

Closed: ali2iptoki closed this issue 2 years ago

ali2iptoki commented 3 years ago

I tried to follow your script to download the dataset, but connecting to the server to download the whole dataset usually fails, so I tried to download it part by part. I started by downloading all of VoxCeleb1:

http://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partaa e395d020928bc15670b570a21695ed96
http://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partab bbfaaccefab65d82b21903e81a8a8020
http://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partac 017d579a2a96a077f40042ec33e51512
http://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partad 7bb1e9f70fddc7a678fa998ea8b3ba19
http://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_test_wav.zip 185fdc63c3c739954633d50379a3d102
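Because each part above is listed with its MD5 checksum, a part that was only partially downloaded can be detected before any concatenation or extraction. A minimal sketch, assuming each line has the form `URL md5` as above and that the parts were saved under their original file names in one local directory; `parse_parts_list`, `md5_of_file` and `bad_parts` are hypothetical helper names, not part of the repository.

```python
import hashlib
import os
from urllib.parse import urlparse


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_parts_list(lines):
    """Turn 'URL md5' lines into (local file name, expected md5) pairs."""
    entries = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        url, expected = fields
        entries.append((os.path.basename(urlparse(url).path), expected))
    return entries


def bad_parts(lines, download_dir):
    """Return the file names that are missing or fail the checksum."""
    bad = []
    for name, expected in parse_parts_list(lines):
        path = os.path.join(download_dir, name)
        if not os.path.exists(path) or md5_of_file(path) != expected:
            bad.append(name)
    return bad
```

Any name returned by `bad_parts` can then be re-downloaded on its own instead of restarting the whole dataset.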

Then I ran the extraction command, leaving only the following line in the file:

vox1_dev_wav_parta* vox1_dev_wav.zip ae63e55b951748cc486645f532ba230b
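That line asks the extraction step to join every file matching `vox1_dev_wav_parta*` into `vox1_dev_wav.zip` and to check the result against `ae63e55b951748cc486645f532ba230b`. A sketch of such a step, assuming the parts concatenate in name order (partaa, partab, ...) as the shell glob implies; `concatenate_parts` is a hypothetical name. Unlike a cat-then-rm step, it leaves the parts in place so the step can be rerun.

```python
import glob
import hashlib
import os


def concatenate_parts(directory, pattern, out_name, expected_md5):
    """Join the files matching `pattern` (sorted by name) into `out_name`
    and verify the MD5 of the result. The input parts are NOT deleted."""
    parts = sorted(glob.glob(os.path.join(directory, pattern)))
    if not parts:
        raise FileNotFoundError(f"no files match {pattern} in {directory}")
    out_path = os.path.join(directory, out_name)
    digest = hashlib.md5()
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Copy in 1 MiB chunks, hashing the bytes as they are written.
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
                    digest.update(chunk)
    if digest.hexdigest() != expected_md5:
        raise ValueError(f"checksum failed for {out_name}")
    return out_path
```

For example, `concatenate_parts(save_path, "vox1_dev_wav_parta*", "vox1_dev_wav.zip", "ae63e55b951748cc486645f532ba230b")`, where `save_path` is the download directory.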

My directory now contains:

  1. voxceleb1
  2. vox1_dev_wav.zip
  3. vox1_test_wav.zip

However, the terminal shows the following:

Checksum successful vox1_test_wav.zip.

(aliEnv) ubuntu@ip:~/repos/algo/app$ python /home/ubuntu/repos/algo/app/aliSrc/src/load_voxceleb_dataset.py --save_path "/home/ubuntu/repos/algo/app/aliSrc/dataset/big" --list_path /home/ubuntu/repos/algo/app/aliSrc/dataset/lists --extract
Checksum successful vox1_dev_wav.zip.
Extracting /home/ubuntu/repos/algo/app/aliSrc/dataset/big/vox1_dev_wav.zip
mv: cannot stat '/home/ubuntu/repos/algo/app/aliSrc/dataset/big/dev/aac/*': No such file or directory
mv: cannot stat '/home/ubuntu/repos/algo/app/aliSrc/dataset/big/aac': No such file or directory
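Both `mv` errors refer to `dev/aac` and `aac`, an AAC layout that presumably belongs to the VoxCeleb2 archives rather than the VoxCeleb1 `wav` files extracted here (the `voxceleb1` folder was created), so the failures may be harmless when only VoxCeleb1 is downloaded. That reading is an assumption, since the script source is not shown in this thread. One way to check what an archive actually contains, before any move runs, is to list its top-level entries with Python's standard `zipfile` module; `top_level_entries` is a hypothetical helper.

```python
import zipfile


def top_level_entries(zip_path, depth=1):
    """Return the sorted path prefixes (up to `depth` components) found
    inside a zip archive, without extracting anything."""
    prefixes = set()
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            parts = [p for p in name.split("/") if p]
            if parts:
                prefixes.add("/".join(parts[:depth]))
    return sorted(prefixes)
```

If `top_level_entries("vox1_dev_wav.zip", depth=2)` shows no `dev/aac` prefix, a `mv .../dev/aac/*` step has nothing to move for that archive.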

What is going wrong here? Are some files missing, or is it something else?

Mercurise commented 2 years ago

Hi @ali2iptoki, I have the same issue. Have you solved it, or did you run into other problems after this?

I guess we should create that folder before running the extraction code. The bad part is that the dataprep script deletes (via rm) the downloaded fragment files before the mv, so the step cannot simply be run a second time; you have to re-download and extract everything again. I also noticed, when calling dataprep to download, that the destination path should not contain a slash (/), or it will fail as well...
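The workaround suggested above (create the target folder before moving) can also be made tolerant of a missing source folder. A minimal sketch using only the standard library; `move_contents` is a hypothetical helper that stands in for a shell step like `mv src/* dst/`, not the dataprep script's own code.

```python
import os
import shutil


def move_contents(src_dir, dst_dir):
    """Move every entry of src_dir into dst_dir, creating dst_dir first.

    Returns the number of entries moved. Returns 0 (instead of failing
    like `mv src/* dst/`) when src_dir does not exist.
    """
    if not os.path.isdir(src_dir):
        return 0
    os.makedirs(dst_dir, exist_ok=True)  # the missing-folder fix
    moved = 0
    for name in sorted(os.listdir(src_dir)):
        target = os.path.join(dst_dir, name)
        if os.path.exists(target):
            # Refuse to overwrite or nest into an existing entry.
            raise FileExistsError(target)
        shutil.move(os.path.join(src_dir, name), target)
        moved += 1
    return moved
```

Deleting the downloaded parts only after extraction and moving have both succeeded would also avoid the re-download problem described above.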

Jungjee commented 2 years ago

We have the dataset now in https://mm.kaist.ac.kr/datasets/voxceleb/#downloads . We may prepare a new download script in the meantime, but we don't have one at the moment.

uioo1 commented 2 years ago

> We have the dataset now in https://mm.kaist.ac.kr/datasets/voxceleb/#downloads . We may prepare a new download script in the meanwhile, but we don't have one at the moment.

You are my HERO!

gancx commented 1 year ago

I noticed that a new fileparts.txt has already been prepared so that the data can be downloaded from http://cnode01.mm.kaist.ac.kr. However, I still get the following error:

mv: target './data/aac/' is not a directory
mv: cannot stat './data/aac': No such file or directory

Also, I can't find train_list.txt and test_list.txt, which are used for training, among the downloaded files. Any help would be appreciated.
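If a training list has to be generated locally, here is a sketch under two assumptions: the list holds one utterance per line as `speaker_id speaker_id/video_id/utterance.wav` with paths relative to the dataset root, and the extracted data is laid out as `<root>/<speaker>/<video>/<utterance>.wav`. Check the expected format against the repository README before relying on it; `build_train_list` and `write_train_list` are hypothetical names. A test list of verification trial pairs cannot be reconstructed this way, since it defines which utterance pairs are scored.

```python
import os


def build_train_list(root, ext=".wav"):
    """Return sorted 'speaker relative/path' lines for every audio file
    found at <root>/<speaker>/<video>/<utterance><ext>."""
    lines = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            if not fname.endswith(ext):
                continue
            rel = os.path.relpath(os.path.join(dirpath, fname), root)
            parts = rel.split(os.sep)
            if len(parts) != 3:
                continue  # ignore files outside the assumed 3-level layout
            lines.append(f"{parts[0]} {'/'.join(parts)}")
    return sorted(lines)


def write_train_list(root, out_path, ext=".wav"):
    """Write the list produced by build_train_list, one entry per line."""
    with open(out_path, "w") as f:
        for line in build_train_list(root, ext):
            f.write(line + "\n")
```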