dusty-nv / jetson-voice

ASR/NLP/TTS deep learning inference library for NVIDIA Jetson using PyTorch and TensorRT

Support for other languages #5

Open 3DVERSEjn opened 3 years ago

3DVERSEjn commented 3 years ago

Is there a way to add support for another language to the ASR, for instance Spanish, or could you give instructions on how to adapt it?

mistyk commented 1 year ago

Same here for French.

kurkovpavel commented 1 year ago
  1. Export the trained model for your language (a .nemo file) to an ONNX file (see the NeMo Git repository).
  2. Create a folder like jetson-voice/data/networks/asr/quartznet-15x5_fr
  3. Put the following files in quartznet-15x5_fr:
    • quartznet-15x5_fr.onnx
    • quartznet.json
    • quartznet.beamsearch.json
    • quartznet.beamsearch_lm.json
    • quartznet.greedy.json
    • Optional: prepare a beamsearch language model and save it as lm.bin in the quartznet-15x5_fr folder
  4. Define your vocabulary in all of the JSON files.
  5. Set the ctc_decoder type in quartznet.json to either beamsearch or greedy.
  6. Define the new model in jetson-voice/data/networks/manifest.json by adding one more entry like this:
     "quartznet-15x5_fr": { "alias": ["quartznet_fr", "quartznet_fr"], "domain": "asr", "url": "", "config": "quartznet.json", "description": "QuartzNet-15x5 ASR FR model using CTC beamsearch or greedy decoder with language model." }
  7. Run: examples/asr.py --wav data/audio/dusty.wav --model quartznet_fr
  8. If you don't have a beamsearch language model (lm.bin), set the ctc_decoder type in quartznet.json to greedy.
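
The folder check and manifest edit from the steps above could be scripted roughly as follows. This is only a sketch, not part of jetson-voice: the helper names (missing_files, suggested_decoder, register_model) are made up for illustration, and it assumes manifest.json is a JSON object keyed by model name, as the entry in the steps suggests. Check the real file before running anything like this against it.

```python
import json
from pathlib import Path

# File names taken from the steps above; "{name}" stands for the model folder name.
REQUIRED_FILES = (
    "{name}.onnx",
    "quartznet.json",
    "quartznet.beamsearch.json",
    "quartznet.beamsearch_lm.json",
    "quartznet.greedy.json",
)


def missing_files(model_dir):
    """Return the required file names that are absent from model_dir."""
    model_dir = Path(model_dir)
    wanted = [f.format(name=model_dir.name) for f in REQUIRED_FILES]
    return [f for f in wanted if not (model_dir / f).is_file()]


def suggested_decoder(model_dir):
    """Suggest beamsearch only when lm.bin exists, otherwise greedy."""
    return "beamsearch" if (Path(model_dir) / "lm.bin").is_file() else "greedy"


def register_model(manifest_path, name, alias):
    """Add a model entry to manifest.json and return it.

    ASSUMPTION: the manifest's top level is a JSON object keyed by model
    name. If the real file is laid out differently, edit it by hand.
    """
    manifest_path = Path(manifest_path)
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    if not isinstance(manifest, dict):
        raise TypeError("unexpected manifest layout; edit it by hand instead")
    manifest[name] = {
        "alias": list(alias),
        "domain": "asr",
        "url": "",
        "config": "quartznet.json",
        "description": "QuartzNet-15x5 ASR FR model",
    }
    manifest_path.write_text(json.dumps(manifest, indent=2) + "\n", encoding="utf-8")
    return manifest[name]


if __name__ == "__main__":
    # Paths follow the layout described in the steps above.
    networks = Path("jetson-voice/data/networks")
    model_dir = networks / "asr" / "quartznet-15x5_fr"
    print("missing files:", missing_files(model_dir))
    print("suggested ctc_decoder type:", suggested_decoder(model_dir))
    # Uncomment once the folder is complete and the manifest layout is confirmed:
    # register_model(networks / "manifest.json", model_dir.name, ["quartznet_fr"])
```

The script only reports which decoder type to use. It does not write to quartznet.json, because the exact layout of that config is not shown in this thread.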