dataset problem - Githubissues

yaoyao1206 commented 3 years ago

Hi, thanks for this great work. I ran into a problem when using dataprep.py to download the VoxCeleb data. Running the script fails with the following error:

    root@43d48c70b38a:/data/voxceleb_trainer# python ./dataprep.py --save_path data --download --user voxceleb1912 --password 0s42xuw6
    /bin/sh: 1: wget: not found
    Traceback (most recent call last):
      File "./dataprep.py", line 176, in <module>
        download(args,fileparts)
      File "./dataprep.py", line 58, in download
        raise ValueError('Download failed %s. If download fails repeatedly, use alternate URL on the VoxCeleb website.'%url)
    ValueError: Download failed https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partaa. If download fails repeatedly, use alternate URL on the VoxCeleb website.

Could you please help me with it? Thank you!

ukemamaster commented 3 years ago

The error clearly says "wget: not found", and the README.md says: "In addition to the Python dependencies, wget and ffmpeg must be installed on the system." So I think you need to install wget.

On Ubuntu, you can install it with:

    sudo apt-get install wget
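The "/bin/sh: 1: wget: not found" line shows that dataprep.py shells out to wget, so a missing tool only surfaces mid-download. A quick check that the programs the README lists are on PATH can catch this up front. This is a minimal sketch using only the standard library; the helper `missing_tools` is hypothetical and not part of voxceleb_trainer.

```python
import shutil

# External programs the README says must be installed besides the Python packages.
REQUIRED_TOOLS = ("wget", "ffmpeg")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on PATH."""
    # shutil.which returns None when no matching executable is found.
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing system tools:", ", ".join(missing))
    else:
        print("All required system tools found")
```

Running it before `python ./dataprep.py --download ...` would have reported `wget` as missing instead of failing on the first file part.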

yaoyao1206 commented 3 years ago

Thank you! I have the dataset ready now, but I have a new problem when running the program:

    RuntimeError: Error opening 'data/voxceleb2/aac/id07466/j7xvwDktsyk/00252.wav': File contains data in an unknown format.

I have converted all the audio to .wav format and checked that this file plays normally. The error seems to occur when the .wav file is loaded into a tensor. Could you please tell me how to solve it? Looking forward to your reply.
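"File contains data in an unknown format" is the message libsndfile (the C library behind the soundfile package) gives when a file's header matches none of the formats it recognizes. One plausible cause, unconfirmed for this case, is a file that was renamed to .wav without being decoded: VoxCeleb2 is distributed as AAC audio in .m4a (MP4) containers, which a media player will still play happily. A minimal sketch that inspects the first 12 bytes of a file; `sniff_audio_header` is a hypothetical helper, and it only distinguishes the two container types relevant here.

```python
def sniff_audio_header(path):
    """Guess a file's container from its first 12 bytes.

    Returns "wav" for a RIFF/WAVE file, "mp4" for an ISO base media file
    (e.g. .m4a holding AAC audio), or "unknown" for anything else.
    """
    with open(path, "rb") as f:
        header = f.read(12)
    # A WAV file starts with b"RIFF", a 4-byte little-endian size, then b"WAVE".
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    # MP4/M4A files normally begin with a box whose type (bytes 4..8) is b"ftyp".
    if header[4:8] == b"ftyp":
        return "mp4"
    return "unknown"
```

For example, if `sniff_audio_header("data/voxceleb2/aac/id07466/j7xvwDktsyk/00252.wav")` returned `"mp4"`, that would suggest the file still contains AAC data under a .wav name and needs to be re-encoded with ffmpeg.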

ukemamaster commented 3 years ago

How did you convert to .wav?
Does it happen with all .wav files, or only with specific ones?
What is the output of ffmpeg -i data/voxceleb2/aac/id07466/j7xvwDktsyk/00252.wav?
And how do you load your data: with librosa, soundfile, or scipy.io.wavfile?
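To find out whether the problem affects every file or only some, one option is to walk the dataset and try to open each .wav with Python's built-in `wave` module. This is a minimal sketch, not the loader voxceleb_trainer uses: `wave` only reads uncompressed PCM, so it is stricter than libsndfile, and a file it rejects is a candidate for a closer look with `ffmpeg -i` rather than proof of corruption. The function name `find_unreadable_wavs` and the `data/voxceleb2` root are assumptions.

```python
import os
import wave


def find_unreadable_wavs(root):
    """Walk `root` and return (path, error) pairs for .wav files `wave` cannot parse."""
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith(".wav"):
                continue
            path = os.path.join(dirpath, name)
            try:
                # Opening parses the RIFF and fmt headers; non-WAV data raises here.
                with wave.open(path, "rb"):
                    pass
            except (wave.Error, EOFError) as exc:
                failures.append((path, str(exc)))
    return failures


if __name__ == "__main__":
    bad = find_unreadable_wavs("data/voxceleb2")
    print(f"{len(bad)} unreadable .wav files")
    for path, err in bad[:10]:
        print(path, "->", err)
```

If nearly every file fails with "file does not start with RIFF id", the conversion step most likely never ran; a handful of failures points to individual broken files instead.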

Jungjee commented 2 years ago

Closing this issue as it has been inactive for more than six months.

clovaai / voxceleb_trainer

dataset problem #117