dangvansam / viet-asr

VietASR - Vietnamese Automatic Speech Recognition
https://github.com/dangvansam98/viet-asr
Apache License 2.0
asr automatic-speech-recognition ctc-decode ctc-loss dangvansam speech-recognition speech-to-text stt viet-tts vietasr vietnamese vietnamese-language vietnamese-nlp vietnamese-speech-recognition viettts

VietASR: An Open-Source Vietnamese Speech to Text



🚀 Some experiments with NeMo. The ASR uses the QuartzNet model, a smaller version of the Jasper model.

The pretrained model in this repo was trained on ~100 hours of Vietnamese speech collected from YouTube, radio, call-center recordings (8k), text-to-speech data, and some public datasets (VLSP, VIVOS, FPT). It is a very small model (13M parameters), which makes inference very fast ⚡
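QuartzNet is trained with a CTC loss, so its per-frame outputs have to be collapsed into text at inference time. The sketch below shows greedy CTC decoding in plain Python, only to illustrate the idea. It is not this repo's decoder: the toy vocabulary `VOCAB`, the choice of the last index as the blank label, and the helper names `argmax_per_frame` and `ctc_greedy_decode` are all assumptions made for illustration.

```python
from itertools import groupby

# Hypothetical toy vocabulary; the real model's label set is not shown in this README.
VOCAB = ["a", "b", "c", " "]
# Assumption: the CTC blank label is the index right after the last vocabulary entry.
BLANK_ID = len(VOCAB)


def argmax_per_frame(scores):
    """Return the index of the highest-scoring label for each frame."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in scores]


def ctc_greedy_decode(frame_ids, vocab=VOCAB, blank_id=BLANK_ID):
    """Greedy CTC decoding: merge consecutive repeats, then drop blanks."""
    collapsed = [label for label, _ in groupby(frame_ids)]
    return "".join(vocab[i] for i in collapsed if i != blank_id)


if __name__ == "__main__":
    # Frame labels for "a a <blank> a b b <blank> <blank> c".
    frames = [0, 0, 4, 0, 1, 1, 4, 4, 2]
    print(ctc_greedy_decode(frames))
```

The blank between the two runs of `a` is what lets the decoder keep a doubled letter: repeats are merged first, and blanks are removed only afterwards.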

🌱 Update: The new version available on branch v2.0 is built from scratch with PyTorch

🌱 For text to speech, visit the VietTTS repo

Installation

Video demo

TODO

Citation

  @article{kuchaiev2019nemo,
    title={Nemo: a toolkit for building ai applications using neural modules},
    author={Kuchaiev, Oleksii and Li, Jason and Nguyen, Huyen and Hrinchuk, Oleksii and Leary, Ryan and Ginsburg, Boris and Kriman, Samuel and Beliaev, Stanislav and Lavrukhin, Vitaly and Cook, Jack and others},
    journal={arXiv preprint arXiv:1909.09577},
    year={2019}
  }