Experiment with adding speaker diarization

audreyfeldroy commented 9 months ago

Speaker diarization means annotating a transcript to show which words were spoken by which speaker.
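To make the idea concrete, here is a minimal sketch of the annotation step in plain Python. It assumes we already have word-level timestamps from a transcriber and speaker turns from some diarization tool. The `Word` and `Turn` classes, the function names, and the `SPEAKER_00`-style labels are all hypothetical, made up for illustration, and not taken from any particular library.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class Turn:
    speaker: str
    start: float  # seconds
    end: float    # seconds


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the overlap between two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_words(words, turns):
    """Pair each word with the speaker whose turn overlaps it the most."""
    labeled = []
    for word in words:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn in turns:
            shared = overlap(word.start, word.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labeled.append((best_speaker, word.text))
    return labeled


def format_transcript(labeled):
    """Merge consecutive words from the same speaker into one line each."""
    lines = []
    for speaker, text in labeled:
        if lines and lines[-1][0] == speaker:
            lines[-1][1].append(text)
        else:
            lines.append((speaker, [text]))
    return [f"{speaker}: {' '.join(texts)}" for speaker, texts in lines]


words = [Word("Hello", 0.0, 0.4), Word("there", 0.4, 0.8), Word("Hi", 1.0, 1.3)]
turns = [Turn("SPEAKER_00", 0.0, 0.9), Turn("SPEAKER_01", 0.9, 2.0)]
transcript = format_transcript(label_words(words, turns))
```

Words that overlap no turn at all get the placeholder label `UNKNOWN`. A real pipeline would probably need something smarter for overlapping speech.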

There are tools in Python that do this. It would be great to try them out and see if any would work for our project:

https://github.com/pyannote/pyannote-audio
https://github.com/espnet/espnet
Anything else anyone can find!

We may also have to implement our own speaker diarization, either here or in a separate repo that we use as a dependency. I attended a talk last night about how News UK did this with their own dynamic clustering of vectorized embeddings. They used the large Whisper model to transcribe their audio files, then applied their own diarization algorithm. I vaguely recall they used https://github.com/NVIDIA/NeMo for the auto-clustering.
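As a rough illustration of what clustering embeddings could look like when the number of speakers isn't known up front, here is a minimal sketch in plain Python. This is not News UK's algorithm and not what NeMo does. It is a simple greedy approach swapped in for illustration. It assumes each audio segment has already been turned into a speaker embedding vector by some model, and the function names and the `threshold` value of 0.75 are hypothetical choices that would need tuning.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_speakers(embeddings, threshold=0.75):
    """Greedily assign each embedding to the closest existing cluster,
    or start a new cluster when nothing is similar enough.

    Returns one integer cluster label per embedding, in input order.
    """
    centroids = []  # running mean embedding of each cluster
    counts = []     # number of embeddings in each cluster
    labels = []
    for emb in embeddings:
        sims = [cosine_similarity(emb, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            # Fold the new embedding into the cluster's running mean.
            n = counts[best]
            centroids[best] = [
                (c * n + x) / (n + 1) for c, x in zip(centroids[best], emb)
            ]
            counts[best] += 1
            labels.append(best)
        else:
            # No cluster is close enough: treat this as a new speaker.
            centroids.append(list(emb))
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels


# Toy 2-D "embeddings"; real speaker embeddings have hundreds of dimensions.
segments = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95], [1.0, 0.05]]
labels = cluster_speakers(segments)
```

The cluster labels could then be mapped to names like `SPEAKER_00` and lined up with the Whisper transcript by timestamp. A greedy pass like this is sensitive to segment order and to the threshold, so it's only a starting point for experimenting.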

Contributions welcome from anyone who wants to play with this!

heymanpreet commented 9 months ago

@audreyfeldroy Happy to experiment with this ticket if no one else is working on it. Thanks.

audreyfeldroy commented 9 months ago

This one is open for anyone looking for an issue to work on 🙂

Subramaniam-dot commented 8 months ago

Can I take up this issue?

crisishistory / HistoryAIToolkit

Experiment with adding speaker diarization #45