clovaai / voxceleb_trainer

In defence of metric learning for speaker recognition
MIT License

Question regarding dataset downloading #149

Closed itsmag11 closed 2 years ago

itsmag11 commented 2 years ago

Dear authors,

I am new to the VoxCeleb dataset and am having trouble downloading the videos with the provided scripts. I followed the instructions in the Data Preparation section but only got the audio part of the dataset. May I ask how I can download the videos (perhaps in .mp4 or .m4v format) containing both the visual and audio parts? Thanks in advance.

Regards

padiarushi3012 commented 2 years ago

Hey @itsmag11

I found this helpful - https://github.com/kaneelgit/msi_voxceleb/blob/main/video_classification_2d.ipynb

itsmag11 commented 2 years ago

Thanks @padiarushi3012 !

I found the command for downloading the test set in the post you linked. For anyone else who has the same problem, I'm posting the command here for easy reference: !wget "https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox2_test_mp4.zip"

Moreover, I found a dataset page provided by the author which is much clearer than the one hosted on the https://www.robots.ox.ac.uk/ website. The link is here. Everything related to downloading the dataset should be clear on this page.

Then I'll close the issue.

Mercurise commented 2 years ago

Hi @itsmag11, can I ask you a question about downloading the dataset? It seems the host link you shared is a mirror of VoxCeleb (probably an earlier version?).

My question is: do you still need to fill out a form to get the username and password for the download? I ask because the current Vox page (https://www.robots.ox.ac.uk/~vgg/data/voxceleb/) doesn't have a link to the form (https://docs.google.com/forms/d/e/1FAIpQLSdQhpq2Be2CktaPhuadUMU7ZDJoQuRlFlzNO45xO-drWQ0AXA/viewform?fbzx=7440236747203254000). I tried running dataprep.py but got a 404 error:

python ./dataprep.py --save_path ../dataset_dir/ --download --user USERNAME --password PASSWORD

--2022-05-06 09:49:14--  http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partaa
Resolving www.robots.ox.ac.uk (www.robots.ox.ac.uk)... 129.67.94.2
Connecting to www.robots.ox.ac.uk (www.robots.ox.ac.uk)|129.67.94.2|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partaa [following]
--2022-05-06 09:49:14--  https://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partaa
Connecting to www.robots.ox.ac.uk (www.robots.ox.ac.uk)|129.67.94.2|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2022-05-06 09:49:14 ERROR 404: Not Found.

Traceback (most recent call last):
  File "./dataprep.py", line 176, in <module>
    download(args,fileparts)
  File "./dataprep.py", line 58, in download
    raise ValueError('Download failed %s. If download fails repeatedly, use alternate URL on the VoxCeleb website.'%url)
ValueError: Download failed http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partaa. If download fails repeatedly, use alternate URL on the VoxCeleb website.

itsmag11 commented 2 years ago

Hi @Mercurise,

I found the additional host link useful because it lists the names of not only the .wav files (audio part) but also the .mp4 files (video part), which I did not find on the current host.

And yes, you still need to fill out the request form to get the username and password.

So if you just want the audio part, I think replacing USERNAME and PASSWORD in the command you used will do. If you also want the .mp4 files, you need to change lists/fileparts.txt, which dataprep.py reads, so that it lists the mp4 file parts and their corresponding MD5 checksums.
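For checking a file part against its listed checksum, here is a minimal Python sketch. It assumes each line of the list file holds a URL followed by its MD5 checksum, separated by whitespace; that layout and the helper names (parse_entry, md5_of_file, is_intact) are assumptions for illustration, not taken from dataprep.py.

```python
import hashlib
from typing import Tuple


def parse_entry(line: str) -> Tuple[str, str]:
    """Split one list line into (url, md5).

    Assumption: the line looks like "<URL> <MD5>", separated by whitespace.
    """
    url, md5 = line.split()[:2]
    return url, md5.lower()


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: str, expected_md5: str) -> bool:
    """Return True if the file's MD5 matches the expected checksum."""
    return md5_of_file(path) == expected_md5.lower()
```

Reading in chunks keeps memory use flat even for multi-gigabyte parts.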

Hope it helps.

Mercurise commented 2 years ago

Hi @itsmag11 ,

Thanks for the response. I've obtained the username and password via the form. It turns out they send everyone the same username and password pair (I got the same pair as the one posted in this thread from Apr 2021: https://github.com/clovaai/voxceleb_trainer/issues/117#issue-933475286). Retrying dataprep.py still gave me the 404 error, so it looks like a problem on their side with the dataset hosting.

Regarding the mirror page you shared, it looks much clearer and better organised. Big thanks! I'm playing with the verification data, which can be downloaded with the given username and password pair. I'll probably download the data I'm interested in manually.
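For the manual download, a minimal Python sketch follows. It assumes the server accepts HTTP Basic authentication with the username and password from the form; the helper names and the placeholder URL in the usage comment are hypothetical, and real credentials are deliberately left out.

```python
import base64
import shutil
import urllib.request


def build_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Build a request carrying an HTTP Basic Authorization header.

    Assumption: the dataset host uses HTTP Basic authentication.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def download(url: str, user: str, password: str, dest: str) -> None:
    """Stream the response body to dest without loading it all into memory."""
    request = build_request(url, user, password)
    with urllib.request.urlopen(request) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)


# Usage (placeholder values, not real ones):
# download("https://example.org/some_file.zip", "USERNAME", "PASSWORD", "some_file.zip")
```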

Again, thanks a lot. Have a nice day.

itsmag11 commented 2 years ago

@Mercurise Not a problem.

As for the download issue, I just remembered that I changed all the file URLs in lists/fileparts.txt from http://www.robots.ox.ac.uk/ to http://thor.robots.ox.ac.uk/, following a solution I saw somewhere. Hope it solves your problem.
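The host swap can be scripted rather than edited by hand. This is a minimal Python sketch, assuming lists/fileparts.txt is a plain-text file in which the URLs appear literally; the function names are hypothetical, and a .bak copy is written first so the change can be undone.

```python
from pathlib import Path

OLD_HOST = "http://www.robots.ox.ac.uk/"
NEW_HOST = "http://thor.robots.ox.ac.uk/"


def rewrite_hosts(text: str, old: str = OLD_HOST, new: str = NEW_HOST) -> str:
    """Replace every occurrence of the old host prefix with the new one."""
    return text.replace(old, new)


def rewrite_file(path: str, old: str = OLD_HOST, new: str = NEW_HOST) -> int:
    """Rewrite the list file in place and return how many URLs were changed.

    The original content is saved next to the file with a .bak suffix
    before anything is overwritten.
    """
    target = Path(path)
    original = target.read_text()
    updated = rewrite_hosts(original, old, new)
    if updated != original:
        target.with_name(target.name + ".bak").write_text(original)
        target.write_text(updated)
    return original.count(old)


# Usage, run from the repository root (path as mentioned in this thread):
# changed = rewrite_file("lists/fileparts.txt")
```

Running it a second time is harmless: no old-host URLs remain, so nothing is rewritten and the function returns 0.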