NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

Speaker Diarization Finetune #2840

Closed aniket7joshi closed 2 years ago

aniket7joshi commented 3 years ago

I ran the Speaker_Diarization_Inference.ipynb notebook to get diarization results on my audio data using the pretrained model. But as my data is quite noisy, the diarization results are not as expected. Is there any way I can finetune the diarization module on my data? If you can provide a notebook for finetuning the model, it would be really helpful.

nithinraok commented 3 years ago

You can finetune the speaker embedding extractor model using script .

Which pretrained speaker embedding model did you use? I suggest using 'ecapa_tdnn'.
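
Finetuning a speaker embedding model on your own recordings usually starts with a training manifest. Below is a minimal sketch for preparing one. It assumes the JSON-lines layout commonly used for NeMo speaker models (one object per line with `audio_filepath`, `offset`, `duration`, and `label`). The helper name `write_speaker_manifest`, the file paths, and the speaker labels are hypothetical, so check the field names against the script you end up running.

```python
import json
from pathlib import Path


def write_speaker_manifest(entries, manifest_path):
    """Write one JSON object per line and return the number of lines written.

    `entries` is an iterable of (audio_filepath, duration_seconds, speaker_label).
    The field names below are an assumption about the manifest layout.
    """
    manifest_path = Path(manifest_path)
    count = 0
    with manifest_path.open("w", encoding="utf-8") as f:
        for audio_filepath, duration, speaker in entries:
            record = {
                "audio_filepath": str(audio_filepath),
                "offset": 0,                  # start reading at the beginning of the file
                "duration": float(duration),  # segment length in seconds
                "label": str(speaker),        # speaker identity used as the training target
            }
            f.write(json.dumps(record) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical noisy-domain recordings with known speaker labels.
    examples = [
        ("data/noisy/spk1_utt1.wav", 3.2, "spk1"),
        ("data/noisy/spk2_utt1.wav", 4.0, "spk2"),
    ]
    written = write_speaker_manifest(examples, "train_manifest.json")
    print(f"wrote {written} manifest lines")
```

The resulting file would then be passed as the training manifest to whichever finetuning script you use, starting from the pretrained checkpoint (for example 'ecapa_tdnn').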