Open LozramA opened 1 year ago
Hey @LozramA, I don't know your workflow, but if you have a validated.tsv file, you could merge it with the v11.0 validated.tsv and use CorporaCreator to generate a new train/dev/test set.
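A rough sketch of the merge step, in case it helps. It assumes both files share the same header row and that the `path` column (the clip filename) identifies a clip uniquely; the function names and the example paths are made up for illustration, so adjust them to your setup.

```python
def read_tsv(path):
    """Read a tab-separated file into (header, rows) without any quote handling."""
    with open(path, encoding="utf-8") as f:
        header = f.readline().rstrip("\n").split("\t")
        rows = [line.rstrip("\n").split("\t") for line in f if line.strip()]
    return header, rows


def merge_validated(old_path, new_path, out_path, key="path"):
    """Concatenate two validated.tsv files, keeping the first row seen per `key`.

    Assumption: both files have identical columns, and `key` names one of them.
    Returns the number of rows written (header excluded).
    """
    old_header, old_rows = read_tsv(old_path)
    new_header, new_rows = read_tsv(new_path)
    if old_header != new_header:
        raise ValueError("column layouts differ; align them before merging")

    k = old_header.index(key)
    seen = set()
    merged = []
    for row in old_rows + new_rows:
        if row[k] in seen:
            continue  # same clip listed in both files
        seen.add(row[k])
        merged.append(row)

    with open(out_path, "w", encoding="utf-8", newline="\n") as f:
        f.write("\t".join(old_header) + "\n")
        for row in merged:
            f.write("\t".join(row) + "\n")
    return len(merged)


# Hypothetical paths, shown only as an example call:
# merge_validated("cv-corpus-11.0/de/validated.tsv",
#                 "cv-corpus-12.0-delta/de/validated.tsv",
#                 "merged/validated.tsv")
```

The merged file is what you would then hand to CorporaCreator to regenerate the splits.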
Thanks @HarikalarKutusu, but the German v11 segment is missing all the tsv files, so this is completely unusable. I used the full CV12, but that's now getting too big for privately available computer/GPU power. Training ran for many days and didn't get through many epochs (RTX 2060, Intel i7).
Yes, if you don't already have v11 in full, you need to download it, unfortunately... And I know, it is a painful process.
Btw, if you already have the mp3 files, I can share full .tsv files with you for any version... I had to extract them for the cv-dataset-analyzer project I implemented.
And secondly, the correct repo for the issue is https://github.com/common-voice/common-voice-bundler
The newest German segment, "cv-corpus-12.0-delta-2022-12-07-de.tar", does not include train.tsv, dev.tsv, or test.tsv.