Problem with diarization.

MahmoudAshraf97 / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

BSD 2-Clause "Simplified" License


Problem with diarization. #257

Closed Reinmor closed 1 month ago

Reinmor commented 1 month ago

Hi.

First of all, thank you for your project! I adapted your previous version (Q1 2024) and have been using it successfully. There is one problem that I couldn't solve: the main audio language is Russian, but the speakers use a lot of English technical terminology.

In such cases there are problems with diarization. A sample audio file and the resulting text are available here: https://drive.google.com/drive/folders/1pZZffBS-9yMHvViZa4E94rxulh2CGjJe

MahmoudAshraf97 commented 1 month ago

Are you using the default whisper model?

Reinmor commented 1 month ago

Yes, I use large-v3.

MahmoudAshraf97 commented 1 month ago

Generally, diarization itself is independent of transcription, so the language or mixture of languages should not be a problem. Can you explain in detail what issues you are facing?
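To illustrate why the two stages can be treated separately, here is a minimal sketch, not the project's actual code. It assumes the transcriber yields word-level timestamps and the diarizer yields speaker turns as time ranges; the function name `assign_speakers` and the sample data are hypothetical. Only the timestamps connect the two outputs, so the language of the words plays no role in the speaker labels.

```python
def assign_speakers(words, turns):
    """Label each word with the speaker whose turn contains the word's midpoint.

    words: list of (text, start_sec, end_sec) from the transcriber (assumed shape).
    turns: list of (speaker, start_sec, end_sec) from the diarizer (assumed shape).
    """
    labeled = []
    for text, w_start, w_end in words:
        midpoint = (w_start + w_end) / 2
        # First turn whose half-open interval [start, end) covers the midpoint.
        speaker = next(
            (s for s, t_start, t_end in turns if t_start <= midpoint < t_end),
            "unknown",
        )
        labeled.append((speaker, text))
    return labeled


# Hypothetical mixed-language input: the text itself is never inspected.
words = [("привет", 0.0, 0.4), ("MT7621", 0.5, 1.1), ("yes", 2.1, 2.4)]
turns = [("Speaker 0", 0.0, 2.0), ("Speaker 1", 2.0, 3.0)]
labeled = assign_speakers(words, turns)
```

If the transcriber drops or garbles words, the speaker turns are unaffected, but the text attached to each turn will be wrong, which can look like a diarization problem.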

Reinmor commented 1 month ago

Instead of the MT7621 processor name, I get text like this: ‘MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT- MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT’. In addition, another part of the phrases is missing at this point. The link in the first post has a sample audio and the resulting text with incorrect diarization. I can send you the correct version of the diarization if you need it.

This error is rare. But it is always related to the fact that the dialogue mentions the names of various components consisting of letters and numbers.

MahmoudAshraf97 commented 1 month ago

The issue with the numbers might be caused by the suppress_numerals option, if you have it enabled. As for the diarization error, it comes mostly from the model itself, so unfortunately there is nothing I can do. NeMo will probably release a new model at the end of this month; we'll see if it performs better.

Reinmor commented 1 month ago

Hah) You are right. Changing suppress_numerals to false fixes the problem. It remains to be seen if this will worsen the overall results of diarization.

Thank you!
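For readers who hit the same symptom, here is a minimal sketch of why a numeral-suppression option can break names like MT7621. It assumes the option works by banning every vocabulary token that contains a digit or a symbol such as % or $ during decoding. The toy vocabulary, the character set, and the helper names are all hypothetical and are not taken from the project or from Whisper's real tokenizer.

```python
# Toy vocabulary: token string -> token id (hypothetical, not Whisper's real vocab).
TOY_VOCAB = {"MT": 0, "-": 1, "76": 2, "21": 3, " processor": 4, "7": 5, "%": 6}

# Characters that mark a token as "numeral or symbol" (assumed set).
NUMERAL_CHARS = "0123456789%$£"


def numeral_token_ids(vocab):
    """Return the sorted ids of tokens containing any numeral/symbol character."""
    return sorted(
        tid for tok, tid in vocab.items() if any(c in NUMERAL_CHARS for c in tok)
    )


def allowed_tokens(vocab, suppress_numerals):
    """Return the token strings the decoder may still emit."""
    banned = set(numeral_token_ids(vocab)) if suppress_numerals else set()
    return [tok for tok, tid in vocab.items() if tid not in banned]


def can_spell(text, tokens):
    """Check whether text can be written as a concatenation of the given tokens."""
    reachable = [True] + [False] * len(text)
    for i in range(len(text)):
        if reachable[i]:
            for tok in tokens:
                if text.startswith(tok, i):
                    reachable[i + len(tok)] = True
    return reachable[len(text)]


with_digits = allowed_tokens(TOY_VOCAB, suppress_numerals=False)
without_digits = allowed_tokens(TOY_VOCAB, suppress_numerals=True)
```

In this toy setup `can_spell("MT7621", with_digits)` holds, while `can_spell("MT7621", without_digits)` does not: once digit tokens are banned, only "MT" and "-" remain available for that stretch of audio, which is consistent with the repeated "MT-MT-MT" output reported above.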