Closed by Reinmor 1 month ago
Are you using the default whisper model?
Yes, I use large-v3.
Generally, diarization itself is independent of transcription, so the language or a mixture of languages should not be a problem. Can you explain in detail what issues you are facing?
Instead of the MT7621 processor name, I get text like this: ‘MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT-MT’. In addition, part of the surrounding speech is missing at that point. The link in the first post contains a sample audio file and the resulting text with the incorrect diarization. I can send you the correct version of the diarization if you need it.
This error is rare, but it always occurs when the dialogue mentions names of components that consist of letters and numbers.
The issue with the numbers might be caused by the suppress_numerals option, if you have it enabled.
As for the diarization error, it mostly comes from the model itself, so unfortunately there is nothing I can do. NeMo will probably release a new model at the end of this month; we'll see if it performs better.
Hah, you are right. Setting suppress_numerals to false fixes the problem. It remains to be seen whether this will worsen the overall diarization results.
Thank you!
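For anyone hitting the same symptom, here is a minimal sketch of why an option like suppress_numerals could turn "MT7621" into a run of "MT-". It assumes the option works by banning every vocabulary token that contains a digit during decoding. That mechanism, the function name find_numeral_tokens, and the toy vocabulary are illustrative assumptions, not code taken from the project.

```python
def find_numeral_tokens(vocab):
    """Return the ids of all tokens that contain at least one digit.

    Assumption: a numeral-suppression option passes ids like these to the
    decoder as a ban list, so such tokens can never be emitted.
    """
    return sorted(
        token_id
        for token, token_id in vocab.items()
        if any(ch.isdigit() for ch in token)
    )


# Hypothetical toy vocabulary; a real Whisper tokenizer has tens of
# thousands of entries.
TOY_VOCAB = {"MT": 0, "-": 1, "76": 2, "21": 3, " processor": 4, "7": 5}

suppressed = find_numeral_tokens(TOY_VOCAB)

# Tokens the decoder could still emit once the digit tokens are banned.
allowed = [tok for tok, tid in TOY_VOCAB.items() if tid not in suppressed]

print("suppressed ids:", suppressed)
print("still allowed:", allowed)
```

Under that assumption, the decoder can still emit "MT" and "-" but none of the digit pieces needed to spell "7621", which would be consistent with the repeated "MT-" output and with the fix of setting suppress_numerals to false.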
Hi.
First of all, thank you for your project! I adapted your previous version (Q1 2024) and have been using it successfully. There is one problem that I could not solve. The main audio language is Russian, but a lot of English technical terminology is used.
In such cases there are problems with diarization. A sample audio file and the resulting text are available at this link: https://drive.google.com/drive/folders/1pZZffBS-9yMHvViZa4E94rxulh2CGjJe