severin-lemaignan closed this issue 1 year ago.
It's better to ask about specific details if something is unclear to you.
Common Voice has only a tiny amount of Norwegian; you'd be better off using Språkbanken: https://www.nb.no/sprakbanken/
You can use the existing data preparation scripts from the Kaldi repo: https://github.com/kaldi-asr/kaldi/blob/master/egs/sprakbanken/s5/run.sh
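For context, Kaldi recipes like the one above expect each data set as a directory holding `wav.scp`, `text`, `utt2spk` and `spk2utt`. The Språkbanken recipe's own local scripts build these for you. If you start from a Common Voice-style dump instead, a minimal sketch could look like the following. It assumes a TSV with `client_id`, `path` and `sentence` columns, MP3 clips in one directory, and `sox` installed with MP3 support. The function names are hypothetical, and no text normalization (lowercasing, punctuation removal) is done.

```python
import csv
import os
from collections import defaultdict
from typing import Iterable, NamedTuple


class Utterance(NamedTuple):
    speaker: str
    utt_id: str
    audio_path: str
    transcript: str


def read_common_voice_tsv(tsv_path: str, clips_dir: str) -> list:
    """Hypothetical helper: read a Common Voice-style TSV.

    Assumes the columns client_id, path and sentence exist.
    """
    utts = []
    with open(tsv_path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE keeps quote characters inside sentences as literal text.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            speaker = row["client_id"]
            stem = os.path.splitext(row["path"])[0]
            # Prefix utterance IDs with the speaker ID so that sorting by
            # utterance also groups by speaker, as Kaldi prefers.
            utt_id = f"{speaker}-{stem}"
            audio = os.path.join(clips_dir, row["path"])
            utts.append(Utterance(speaker, utt_id, audio, row["sentence"].strip()))
    return utts


def write_kaldi_data_dir(utts: Iterable[Utterance], out_dir: str) -> None:
    """Hypothetical helper: write wav.scp, text, utt2spk and spk2utt."""
    os.makedirs(out_dir, exist_ok=True)
    # Kaldi wants these files sorted in C-locale order; Python's default
    # string ordering (by code point) matches that for UTF-8 text.
    ordered = sorted(utts, key=lambda u: u.utt_id)
    spk2utt = defaultdict(list)
    with open(os.path.join(out_dir, "wav.scp"), "w", encoding="utf-8") as wav, \
         open(os.path.join(out_dir, "text"), "w", encoding="utf-8") as text, \
         open(os.path.join(out_dir, "utt2spk"), "w", encoding="utf-8") as u2s:
        for u in ordered:
            # Decode MP3 to 16 kHz, 16-bit, mono WAV on the fly via a sox pipe.
            # Assumes audio paths contain no spaces.
            wav.write(f"{u.utt_id} sox {u.audio_path} -t wav -r 16000 -b 16 -c 1 - |\n")
            text.write(f"{u.utt_id} {u.transcript}\n")
            u2s.write(f"{u.utt_id} {u.speaker}\n")
            spk2utt[u.speaker].append(u.utt_id)
    with open(os.path.join(out_dir, "spk2utt"), "w", encoding="utf-8") as s2u:
        for spk in sorted(spk2utt):
            s2u.write(f"{spk} {' '.join(spk2utt[spk])}\n")
```

After writing a directory this way, Kaldi's `utils/fix_data_dir.sh` and `utils/validate_data_dir.sh --no-feats` are the usual sanity checks before feature extraction.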
Thanks a lot for the links. I'll have a look and come back to you with more specific questions if needed.
Hi,
I would like to train a new language model for Norwegian. According to the website, I can follow these steps: https://github.com/alphacep/vosk-api/tree/master/training. However, the documentation there is still 'TBD'. Any chance you could provide more details, for instance starting from a Mozilla Common Voice dataset?
Thanks!