Speaker identification

Const-me / Whisper

High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model

Mozilla Public License 2.0

Actually, this application (the Const-me inference) doesn't really have anything to do with any of that. What you see is the result of the Whisper models having been trained on 680,000 hours of audio paired with existing subtitles downloaded from the internet. The behaviour you describe is completely undefined. Speaker identification is not a feature of Whisper by any means; if you see anything pointing in that direction, it is pure accident.

If you need defined behaviour for speaker separation, you can try the diarize feature of the main.exe example. To identify speakers, you will need a model that has been trained for that purpose; Whisper was trained for general-purpose speech-to-text.
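
To illustrate the difference between separating and identifying speakers, here is a minimal sketch of channel-based separation. It assumes a stereo recording with each speaker on their own channel and labels every transcribed segment by the louder channel. Whether main.exe's diarize feature works exactly this way is an assumption; the function names, the `margin` threshold, and the segment format are hypothetical and not part of this project's API.

```python
def dominant_channel(left, right, margin=1.1):
    """Return 'left', 'right', or 'both' for one span of stereo samples.

    `margin` is an illustrative threshold: one channel must carry
    noticeably more energy than the other to be picked.
    """
    energy_left = sum(abs(s) for s in left)
    energy_right = sum(abs(s) for s in right)
    if energy_left > margin * energy_right:
        return "left"
    if energy_right > margin * energy_left:
        return "right"
    return "both"


def label_segments(left, right, segments, sample_rate=16000):
    """Attach a channel label to each (start_sec, end_sec, text) segment.

    `segments` stands in for the timestamps a speech-to-text model
    produces; the format here is hypothetical.
    """
    labeled = []
    for start, end, text in segments:
        i0 = int(start * sample_rate)
        i1 = int(end * sample_rate)
        labeled.append((dominant_channel(left[i0:i1], right[i0:i1]), text))
    return labeled
```

This only tells you which channel was active during a segment. It cannot tell you who was speaking, and it does nothing for mono recordings, which is why real speaker identification needs a dedicated model.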

Speaker identification #83