Wav2vec 2.0 makes it possible to build high-quality acoustic models for low-resource languages using only unlabelled audio. Fine-tuning such a model with a couple of hours of labelled data gives a solid starting point for ASR.
What sort of interface would be required to leverage a wav2vec2 model (fine-tuned or not) with MFA instead of Kaldi?
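One way to picture that interface: the aligner needs frame-level log-posteriors over some symbol set (e.g. phones plus a CTC blank) together with the model's frame shift, which it can then align against a transcript. Below is a minimal sketch of that contract under those assumptions — the class and function names are invented for illustration, the scoring is a deterministic dummy, and a real version would run a wav2vec2 CTC forward pass (MFA itself uses Viterbi decoding over a Kaldi HMM graph, not the greedy collapse shown here):

```python
import numpy as np

class AcousticModel:
    """Hypothetical interface a neural acoustic model (e.g. wav2vec2 with a
    CTC head) could expose so an aligner might consume its output in place
    of Kaldi likelihoods. Not an MFA or HuggingFace API."""
    frame_shift = 0.02          # wav2vec2 emits roughly one frame per 20 ms
    symbols = ["<blank>", "a", "b"]  # CTC blank plus a toy phone set

    def log_posteriors(self, audio: np.ndarray) -> np.ndarray:
        """Return a (num_frames, num_symbols) matrix of log-probabilities.
        A real implementation would run the wav2vec2 forward pass; here we
        fake deterministic scores so the alignment step below is testable."""
        num_frames = len(audio) // 320   # 320-sample stride at 16 kHz
        logits = np.full((num_frames, len(self.symbols)), -5.0)
        # Pretend the first half of the utterance is "a", the second half "b".
        logits[: num_frames // 2, 1] = 0.0
        logits[num_frames // 2 :, 2] = 0.0
        # Normalize rows into log-posteriors.
        return logits - np.logaddexp.reduce(logits, axis=1, keepdims=True)

def align(model, audio, transcript):
    """Greedy frame labelling restricted to the transcript's symbols, then
    merging runs of identical labels into (symbol, start, end) intervals."""
    post = model.log_posteriors(audio)
    allowed = [model.symbols.index(p) for p in transcript]
    best = [allowed[int(np.argmax(post[t, allowed]))] for t in range(len(post))]
    intervals, start = [], 0
    for t in range(1, len(best) + 1):
        if t == len(best) or best[t] != best[start]:
            intervals.append((model.symbols[best[start]],
                              start * model.frame_shift,
                              t * model.frame_shift))
            start = t
    return intervals

audio = np.zeros(16000)  # one second of (fake) 16 kHz audio
print(align(AcousticModel(), audio, ["a", "b"]))
```

The point of the sketch is the contract, not the dummy math: as long as a wav2vec2 model can be wrapped to yield per-frame symbol posteriors at a known frame rate, the aligner side never has to know whether a Kaldi model or a fine-tuned transformer produced them.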