NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

CTC Segmentation #2419

Closed: sadia95 closed this issue 3 years ago

sadia95 commented 3 years ago

CTC Segmentation, which splits audio files and their transcripts, is not working with German audio. I followed the tutorial on Colab. Is there another model that should be used for German?

ekmb commented 3 years ago

@sadia95 You can try the pre-trained QuartzNet model: https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_de_quartznet15x5. Note that the correct language name must be set at https://github.com/NVIDIA/NeMo/blob/main/tools/ctc_segmentation/scripts/prepare_data.py#L46 for num2words to work.
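To illustrate why the language name matters, here is a minimal sketch of spelling out digits in a transcript with num2words. It is not the code from `prepare_data.py`: the helper name `spell_out_numbers` and its `to_words` parameter are hypothetical and exist only for this example. The only library call assumed is `num2words(number, lang=...)`.

```python
import re


def spell_out_numbers(text, lang="de", to_words=None):
    """Replace each run of digits in `text` with its spelled-out form.

    `to_words` is a hypothetical hook: any callable mapping int -> str.
    When it is omitted, num2words is used with the given language code.
    """
    if to_words is None:
        # Imported lazily so the helper can also be used with a custom converter.
        from num2words import num2words

        def to_words(number):
            # The language code decides which vocabulary is used for the words.
            return num2words(number, lang=lang)

    # Convert every digit sequence; all other text is left untouched.
    return re.sub(r"\d+", lambda match: to_words(int(match.group())), text)
```

With num2words installed, `spell_out_numbers("Ich habe 2 Katzen", lang="de")` would spell the number out in German, while leaving `lang` at an English value would produce English words that a German acoustic model's vocabulary is unlikely to match.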