NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

How to adapt myself speaker model into the diarization pipeline? #9630

Closed SoundingSilence closed 2 weeks ago

SoundingSilence commented 2 months ago

Thank you for your contributions to the NeMo framework; it is awesome work. I use the diarization pipeline, which is VAD -> speaker embedding -> clustering (NMESC + k-means). When I use my own speaker model, I found that diarization performance is even worse than with the open-source titanet-large model on my own test dataset. Note that in terms of speaker recognition performance, my speaker model is better than the provided titanet-large on my test set. However, diarization equipped with my speaker model does not perform well (a better EER does not yield a better DER?). Are there any clustering hyperparameters that need to be tuned for a different speaker model and test set (such as self.nme_mat_size, self.max_rp_threshold, self.sparse_search_volume, etc.)? Hope for your valuable suggestions, thanks!
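For readers unfamiliar with the parameters mentioned above: in NMESC-style clustering, `max_rp_threshold` bounds what fraction of each row of the affinity matrix may be kept when binarizing it, and `sparse_search_volume` controls how many candidate pruning values are actually evaluated. A minimal, hypothetical sketch of how such a search grid could be built (this is an illustration, not NeMo's actual implementation; the function name and exact rounding are my own assumptions):

```python
import math

def p_value_search_grid(mat_size: int, max_rp_threshold: float,
                        sparse_search_volume: int) -> list:
    """Hypothetical sketch: candidate neighbor counts p that an
    NMESC-style search would evaluate when binarizing an affinity
    matrix of size mat_size x mat_size.

    - max_rp_threshold caps p at floor(mat_size * max_rp_threshold)
    - sparse_search_volume limits how many candidates are tried
    """
    # Upper bound on the number of retained neighbors per row.
    max_n = max(1, math.floor(mat_size * max_rp_threshold))
    if sparse_search_volume >= max_n:
        # Search space is small enough to try every value of p.
        return list(range(1, max_n + 1))
    # Otherwise, sample sparse_search_volume roughly evenly spaced values.
    step = max_n / sparse_search_volume
    grid = sorted({max(1, round(step * i))
                   for i in range(1, sparse_search_volume + 1)})
    return grid

# Example: 120 segments, keep at most 25% of neighbors, try 10 candidates.
print(p_value_search_grid(120, 0.25, 10))  # [3, 6, 9, ..., 30]
```

Intuition for the original question: a larger `sparse_search_volume` makes the search finer but slower, while `max_rp_threshold` changes how aggressively the affinity matrix is pruned, and the best setting can indeed differ between embedding models whose score distributions differ, even if their EERs do not.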

github-actions[bot] commented 3 weeks ago

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 2 weeks ago

This issue was closed because it has been inactive for 7 days since being marked as stale.