Open patrickvonplaten opened 3 years ago
Hi @patrickvonplaten,
Thanks for your message. Does the load_dataset() function give the ability to collect an email address from the person loading the data? You might also be interested in the following issue: https://github.com/common-voice/common-voice/issues/3262
Hey @ftyers,
Thanks a lot for your great write-up at https://github.com/common-voice/common-voice/issues/3262 - I very much agree with your points!
@lhoestq @thomwolf - could we maybe provide an optional "email address" field that is required for Common Voice 7?
Hey Common-Voice team!
Thanks a lot for releasing the Common Voice 7 dataset - it's great to see so many new languages!
At Hugging Face, we have worked a lot with the Common Voice 6.1 dataset and trained speech models in almost every language of the Common Voice 6 dataset. In total we open-sourced 240 speech models trained on the Common Voice dataset, see here.
For the Common Voice 6.1 dataset it is possible to directly download a language-specific dataset via this bundle link:
https://voice-prod-bundler-ee1969a6ce8178826482b88e843c335139bd3fb4.s3.amazonaws.com/cv-corpus-6.1-2020-12-11/{lang}.tar.gz
This is super convenient and allows us to provide the following simple commands to the community to download and process the dataset:
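For illustration, here is a minimal sketch of what such a command might look like. It assumes the `datasets` library exposes Common Voice 6.1 under the name `common_voice` with the language code as the config name; the helper names `bundle_url` and `load_common_voice` are hypothetical and only there to show how the `{lang}` placeholder in the bundle link gets filled in.

```python
# Bundle link template quoted above; {lang} is a language code such as "tr".
BUNDLE_URL_TEMPLATE = (
    "https://voice-prod-bundler-ee1969a6ce8178826482b88e843c335139bd3fb4"
    ".s3.amazonaws.com/cv-corpus-6.1-2020-12-11/{lang}.tar.gz"
)


def bundle_url(lang: str) -> str:
    """Hypothetical helper: fill the language code into the bundle link."""
    return BUNDLE_URL_TEMPLATE.format(lang=lang)


def load_common_voice(lang: str, split: str = "train"):
    """Hypothetical wrapper around datasets.load_dataset.

    Assumes the dataset is registered as "common_voice" and that the
    language code is the config name. Calling this downloads the data.
    """
    from datasets import load_dataset  # imported lazily; requires `pip install datasets`

    return load_dataset("common_voice", lang, split=split)


if __name__ == "__main__":
    # Only prints the URL; no download is triggered here.
    print(bundle_url("tr"))
```

Without a per-language bundle link of this shape, a loader like this has no stable URL to download from, which is why the link matters for us.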
Do you guys think there is a chance that you could also provide a bundle link for the Common Voice 7 dataset?
Best, the Datasets team @ Hugging Face