MahmoudAshraf97 / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
BSD 2-Clause "Simplified" License
2.44k stars 238 forks

fix(language): missing language parameter #149

Closed: agershun closed this 5 months ago

agershun commented 6 months ago

The problem: if the audio has very long ringing at the very beginning of the file, the language parameter is ignored and the language is detected automatically.

After analyzing the source code, I found that the language parameter is missing from the whisperx.load_model() call.

MahmoudAshraf97 commented 6 months ago

Hello, and thanks for the contribution. The language parameter is already used here: https://github.com/MahmoudAshraf97/whisper-diarization/blob/39572386eb4170fc16440b770666f23ccf9bdc80/transcription_helpers.py#L70

Passing it there has the same effect as passing it when loading the model. In both cases it only affects the tokenizer, so this PR has no effect on the final result. Please correct me if I'm missing something.
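A minimal sketch of the precedence the maintainer describes, assuming a simplified model of the pipeline: `resolve_language` and its `detect` callback are hypothetical names, not part of whisperx or this repo. A language passed at transcription time wins, then one fixed at load time, and automatic detection runs only when neither was supplied, so both routes end with the same tokenizer language.

```python
from typing import Callable, Optional


def resolve_language(
    load_language: Optional[str],
    call_language: Optional[str],
    detect: Callable[[], str],
) -> str:
    """Pick the tokenizer language for one transcription (hypothetical model).

    A language passed at call time wins, then one fixed at load time;
    the detect() callback runs only when neither was supplied.
    """
    if call_language is not None:
        return call_language
    if load_language is not None:
        return load_language
    return detect()


# Usage: count how often detection would run.
calls = []

def fake_detect() -> str:
    # Stand-in detector that records each call and always guesses "en".
    calls.append(1)
    return "en"

print(resolve_language("de", None, fake_detect))  # language given at load time
print(resolve_language(None, "de", fake_detect))  # language given at transcription time
print(len(calls))  # detection never ran
```

Either way the result is "de" and the detector is never called, which is the "same effect" described in the comment above.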

agershun commented 6 months ago

I tried the same file with and without this language parameter in whisperx.load_model().

You can see the difference in the logs:

[Log screenshots: runs with and without the language parameter]

Without the parameter, it does not take the language into account and tries to detect the language itself.

agershun commented 6 months ago

Or... do you mean that this message is only a warning shown at model loading time?

PS. Thank you very much for the program!

MahmoudAshraf97 commented 6 months ago

> Or... do you mean that this message is only a warning shown at model loading time?
>
> PS. Thank you very much for the program!

Exactly. Adding the language parameter while loading the model only hides this warning; it's useful only when running inference on multiple audio files, which isn't the case here.
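The exchange can be illustrated with a hypothetical toy pipeline. `ToyPipeline` is not the whisperx API, and its warning text is made up for illustration. When no language is given at load time, the constructor emits a warning once, yet a language passed to `transcribe()` still prevents detection. Fixing the language at load time mainly saves repeating it when one loaded model is reused for several files.

```python
import warnings
from typing import Optional


class ToyPipeline:
    """Hypothetical stand-in for a loaded model; not the whisperx API."""

    def __init__(self, language: Optional[str] = None) -> None:
        self.language = language
        self.detections = 0
        if language is None:
            # Emitted once, at load time, regardless of what transcribe() later receives.
            warnings.warn("no language given at load time; it will be detected per file")

    def transcribe(self, audio: str, language: Optional[str] = None) -> str:
        # A per-call language takes priority over the load-time one.
        language = language or self.language
        if language is None:
            self.detections += 1  # stand-in for real language detection
            language = "en"
        return language


# Usage: the warning appears, but detection never runs.
pipeline = ToyPipeline()
print(pipeline.transcribe("call.wav", language="de"), pipeline.detections)

# Reusing one model for many files: set the language once at load time.
batch = ToyPipeline(language="de")
print([batch.transcribe(f) for f in ["a.wav", "b.wav"]])
```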