crisishistory / HistoryAIToolkit

AI toolkit for professional and amateur oral historians
GNU General Public License v3.0

Experiment with adding speaker diarization #45

Open audreyfeldroy opened 9 months ago

audreyfeldroy commented 9 months ago

Speaker diarization is where you annotate a transcript by noting which words were spoken by which speakers.
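To make the idea concrete, here is a minimal sketch of what a diarized transcript could look like as data. It assumes we already have word-level timestamps from a transcriber and speaker turns from some diarization step. The `Word` and `Turn` classes, the `SPEAKER_00`-style labels, and both function names are hypothetical and not part of this project's code.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class Turn:
    speaker: str
    start: float  # seconds
    end: float    # seconds


def label_words(words, turns):
    """Attach a speaker label to each word using the word's midpoint time."""
    labeled = []
    for w in words:
        mid = (w.start + w.end) / 2
        # Pick the first turn whose time span contains the midpoint.
        speaker = next(
            (t.speaker for t in turns if t.start <= mid < t.end),
            "UNKNOWN",
        )
        labeled.append((speaker, w.text))
    return labeled


def format_transcript(labeled):
    """Group consecutive words from the same speaker into one line."""
    lines = []
    for speaker, text in labeled:
        if lines and lines[-1][0] == speaker:
            lines[-1][1].append(text)
        else:
            lines.append((speaker, [text]))
    return "\n".join(f"{s}: {' '.join(ws)}" for s, ws in lines)
```

Matching on the midpoint is only one possible choice; words that straddle a speaker change may need a smarter overlap rule.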

There are tools in Python that do this. It would be great to try them out and see if any would work for our project:

It's possible we may also have to implement our own speaker diarization, either here or in a separate repo that we use as a dependency here. I attended a talk last night about how News UK did this with their own dynamic clustering of their vectorized embeddings. They used the large whisper model to transcribe their audio files, and then they implemented speaker diarization using their own algorithm. I vaguely recall they used https://github.com/NVIDIA/NeMo for the auto-clustering.
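As a starting point for experiments, here is a rough sketch of clustering speaker embeddings without fixing the number of speakers in advance. This is not News UK's algorithm and not NeMo's API; it is a simple greedy, threshold-based scheme in plain Python, and it assumes each audio segment has already been turned into an embedding vector by some model. The function names and the `threshold` default are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster_speakers(embeddings, threshold=0.8):
    """Greedily assign each segment embedding to a speaker cluster.

    A segment joins the most similar existing cluster if the similarity
    is at least `threshold`; otherwise it starts a new cluster.
    Returns one integer cluster label per embedding.
    """
    sums = []    # running vector sum per cluster (same direction as the mean)
    labels = []
    for emb in embeddings:
        best, best_sim = None, threshold
        for i, s in enumerate(sums):
            sim = cosine(emb, s)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            sums.append(list(emb))
            labels.append(len(sums) - 1)
        else:
            sums[best] = [c + e for c, e in zip(sums[best], emb)]
            labels.append(best)
    return labels
```

The result depends on segment order and on the threshold, so a real implementation would likely need tuning or a more robust clustering method.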

Contributions welcome from anyone who wants to play with this!

heymanpreet commented 9 months ago

@audreyfeldroy Happy to experiment with this ticket if no one else is working on it. Thanks.

audreyfeldroy commented 9 months ago

This one is open for anyone looking for an issue to work on 🙂

Subramaniam-dot commented 8 months ago

Can I take up this issue?