argmaxinc / WhisperKit

On-device Inference of Whisper Speech Recognition Models for Apple Silicon
https://takeargmax.com/blog/whisperkit
MIT License

Speaker Diarization #31

Open: fakerybakery opened this issue 4 months ago

fakerybakery commented 4 months ago

Hi, is speaker diarization planned (especially in real time)? Thanks!

ZachNagengast commented 4 months ago

For now, we're mainly focused on running the core Whisper models, which don't support diarization out of the box. If you're up for building a library for this, we'd be happy to point to your project.

There seem to be a few models with this capability at the moment. Here is the most popular thread on the subject in the openai/whisper repo: https://github.com/openai/whisper/discussions/264. If they settle on a specific approach to diarization, we will do our best to stay in parity with it, but in the meantime there is an open opportunity for another project to bring it to Swift. We'll keep this issue open until then.
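Since Whisper itself does not label speakers, one common way to combine it with a separate diarization model is a post-processing step: take the timestamped transcript segments and the diarization model's speaker turns, and give each segment the speaker whose turns overlap it the most. The sketch below shows only that merge step, under stated assumptions: `TranscriptSegment`, `SpeakerTurn`, `LabeledSegment`, and `assignSpeakers` are hypothetical names (not WhisperKit's API), timestamps are in seconds, and the speaker turns are assumed to come from some external diarization model.

```swift
import Foundation

// Hypothetical types for illustration only; these are not WhisperKit types.
struct TranscriptSegment {
    let start: Double   // seconds
    let end: Double     // seconds
    let text: String
}

struct SpeakerTurn {
    let start: Double   // seconds
    let end: Double     // seconds
    let speaker: String // label produced by an (assumed) external diarization model
}

struct LabeledSegment {
    let speaker: String?          // nil when no speaker turn overlaps the segment
    let segment: TranscriptSegment
}

/// Length in seconds of the intersection of [aStart, aEnd] and [bStart, bEnd]; 0 if disjoint.
func overlap(_ aStart: Double, _ aEnd: Double, _ bStart: Double, _ bEnd: Double) -> Double {
    max(0, min(aEnd, bEnd) - max(aStart, bStart))
}

/// Assigns each transcript segment the speaker whose turns overlap it the most in total.
func assignSpeakers(segments: [TranscriptSegment], turns: [SpeakerTurn]) -> [LabeledSegment] {
    segments.map { seg in
        // Sum overlap time per speaker, since one speaker may have several turns in a segment.
        var totals: [String: Double] = [:]
        for turn in turns {
            let o = overlap(seg.start, seg.end, turn.start, turn.end)
            if o > 0 { totals[turn.speaker, default: 0] += o }
        }
        // Largest total overlap wins; ties go to the alphabetically first label.
        let best = totals.max { a, b in
            a.value != b.value ? a.value < b.value : a.key > b.key
        }
        return LabeledSegment(speaker: best?.key, segment: seg)
    }
}

// Usage with made-up timestamps.
let segments = [
    TranscriptSegment(start: 0.0, end: 2.5, text: "Hi, is diarization planned?"),
    TranscriptSegment(start: 2.5, end: 6.0, text: "Not yet, but PRs are welcome."),
]
let turns = [
    SpeakerTurn(start: 0.0, end: 2.7, speaker: "SPEAKER_00"),
    SpeakerTurn(start: 2.7, end: 6.0, speaker: "SPEAKER_01"),
]
for labeled in assignSpeakers(segments: segments, turns: turns) {
    print("\(labeled.speaker ?? "unknown"): \(labeled.segment.text)")
}
// SPEAKER_00: Hi, is diarization planned?
// SPEAKER_01: Not yet, but PRs are welcome.
```

Summing overlap per speaker, rather than matching on segment start time alone, keeps the assignment stable when a speaker turn boundary falls slightly inside a Whisper segment. A real-time version would need to run this incrementally as new segments and turns arrive, which this sketch does not attempt.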