I was quite surprised to see how low the WERs are for the new Common Voice corpus: https://github.com/kaldi-asr/kaldi/blob/master/egs/commonvoice/s5/RESULTS (roughly 4% for the TDNN)
Unfortunately, these result…
Right now, we only have 3K sentences or so.
Possible new sources are:
1. Wikimedia text
1. [BYU Corpus](http://corpus.byu.edu/)
1. [Leipzig Corpora](http://wortschatz.uni-leipzig.de/en/download/)
We have started sending sentences for the Kabyle language, but we don't know whether they are being received anywhere. Is there any information on this? Should we continue with our corpus?
I’m a speech researcher and I want to use the Common Voice speech data for my experiments. Unfortunately, there is a big problem with how the corpus (v1), and specifically the train/test/dev split, is designed.…
Looks like we're _really_ close to release. There are a few things I'd like to clean up first:
~~I need to find/fix the bug with input that causes lost/hung controller inputs.~~ I got hate mail on…