Closed AmirAliAslani closed 2 years ago
Hi,
After downloading and extracting the CommonVoice Persian dataset zip file from https://commonvoice.mozilla.org/en/datasets, the created folder should contain a subfolder named fa. Depending on the version XXX that you've downloaded, the path looks like this:
cv-corpus-XXX/fa/
You can find the validate.tsv and other files under this directory.
Before running the bash script scripts/preprocess/preprocess_commonvoice_fa.sh, make sure that its DATASET_PATH variable is set to the full path of the .../fa directory, which is treated as the dataset root.
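To confirm that the path you plan to use for DATASET_PATH really is the dataset root, a quick check along these lines may help. This is only a sketch: `missing_dataset_files` is a hypothetical helper that is not part of the repository, and the list of required files is assumed to be just validate.tsv, as mentioned above.

```python
from pathlib import Path


# Hypothetical helper for illustration; it is not part of the repository.
def missing_dataset_files(dataset_path, required=("validate.tsv",)):
    """Return the names from `required` that are not regular files under dataset_path."""
    root = Path(dataset_path)
    if not root.is_dir():
        # The path does not exist or is a plain file (for example, a file
        # named "fa" instead of a folder), so nothing can be found under it.
        return list(required)
    return [name for name in required if not (root / name).is_file()]
```

Calling `missing_dataset_files("/full/path/to/cv-corpus-XXX/fa")` should return an empty list when the directory is set up as described. A non-empty list names the files that still need to be located.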
Hello, thanks for the guide. When I unzipped the file, there was no folder named fa, only a file with that name.
Ok, now I see. The downloaded file is supposed to be a .tar file. Have you tried extracting it with a tool that supports the TAR format?
For example 7-Zip: https://www.7-zip.org/
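If you would rather not install a separate tool, Python's standard tarfile module can also unpack TAR archives. Below is a minimal sketch, assuming the downloaded file is a valid TAR archive (optionally compressed); `extract_tar` and the paths you pass to it are hypothetical and not part of the repository.

```python
import tarfile
from pathlib import Path


# Hypothetical helper for illustration; it is not part of the repository.
def extract_tar(archive_path, dest_dir):
    """Extract a TAR archive into dest_dir and return the entry names now in dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect the compression (none, gzip, bz2, xz) itself.
    with tarfile.open(archive_path, mode="r:*") as archive:
        archive.extractall(dest)
    return sorted(p.name for p in dest.iterdir())
```

After extraction into an empty destination, the returned list should contain the cv-corpus-XXX folder, with fa/ as a real directory inside it.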
I was creating the metadata file (the first step), but it asked for the validate.tsv file. Where can I find it? @hamedhemati