Can I Fine-Tune the Diarization Model to Recognize a Specific Individual's Voice?

MahmoudAshraf97 / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

BSD 2-Clause "Simplified" License


Can I Fine-Tune the Diarization Model to Recognize a Specific Individual's Voice? #232

Closed. shivamtawari closed this issue 1 month ago

shivamtawari commented 1 month ago

Hi @MahmoudAshraf97

I'm curious to know if it's possible to customize the diarization output. Specifically, can we assign a custom name, such as 'Mr. XYZ', to dialogues spoken by a particular person, while the rest are labeled as 'Person 0', 'Person 1', etc.?

Thanks!

MahmoudAshraf97 commented 1 month ago

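It's doable, but not through fine-tuning. Instead, take the intermediate speaker embeddings generated by the MSDD model and compare them to reference embeddings that you generate from a known recording of XYZ. The diarized speaker whose embedding matches the reference is XYZ, and the rest keep their generic labels.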
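
A minimal sketch of that comparison step, not the repository's actual code. It assumes you have already extracted one embedding per diarized speaker and one reference embedding for XYZ (the extraction itself is not shown here). The names `cosine_similarity` and `name_speakers`, the toy 3-dimensional vectors, and the 0.7 threshold are all hypothetical and would need tuning on real data.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def name_speakers(
    speaker_embeddings: Dict[str, Sequence[float]],
    reference_embedding: Sequence[float],
    reference_name: str,
    threshold: float = 0.7,  # assumed value; tune on your own recordings
) -> Dict[str, str]:
    """Map each diarization label to a display name.

    The single best-matching speaker is renamed to `reference_name`, but only
    if its similarity reaches `threshold`. Everyone else keeps their label.
    """
    scores = {
        label: cosine_similarity(emb, reference_embedding)
        for label, emb in speaker_embeddings.items()
    }
    best_label = max(scores, key=scores.get) if scores else None
    return {
        label: reference_name
        if label == best_label and scores[label] >= threshold
        else label
        for label in speaker_embeddings
    }


# Toy vectors standing in for real speaker embeddings (hypothetical data).
speaker_embeddings = {
    "Person 0": [0.9, 0.1, 0.0],
    "Person 1": [0.0, 0.2, 0.95],
}
reference_embedding = [1.0, 0.0, 0.1]  # would come from a known recording of XYZ

names = name_speakers(speaker_embeddings, reference_embedding, "Mr. XYZ")
print(names)
```

If no speaker reaches the threshold, every label is left unchanged, which covers recordings where XYZ does not speak at all. Averaging several reference embeddings from different recordings of XYZ would likely make the match more robust.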