akashmjn / tinydiarize

Minimal extension of OpenAI's Whisper adding speaker diarization with special tokens
MIT License

Edit decoding to force sample a timestamp after every speaker turn #10

Closed by akashmjn 1 year ago

akashmjn commented 1 year ago

Forcing a timestamp token right after every speaker-turn token would likely be a small patch to the logit filtering applied during decoding.

Doing so makes transcripts more readable and gives every speaker turn a timestamp, which sets things up for downstream global diarization (clustering).
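
A rough sketch of what such a logit filter could look like, in plain Python so it runs on its own. This is not the patch that closed the issue: the function name and both token ids are hypothetical placeholders, and it assumes that timestamp tokens occupy the top of the vocabulary (every id at or above `timestamp_begin_id`). The idea is that whenever the last decoded token is the speaker-turn token, all non-timestamp logits are set to negative infinity, so the next sampled token has to be a timestamp.

```python
import math
from typing import List, Sequence

# Hypothetical token ids for illustration only; real values would come
# from the tokenizer.
SPEAKER_TURN_ID = 100
TIMESTAMP_BEGIN_ID = 200  # assumption: every id >= this is a timestamp token


def force_timestamp_after_speaker_turn(
    logits: List[List[float]],
    tokens: Sequence[Sequence[int]],
    speaker_turn_id: int = SPEAKER_TURN_ID,
    timestamp_begin_id: int = TIMESTAMP_BEGIN_ID,
) -> None:
    """Mask logits in place so a timestamp must follow a speaker turn.

    logits: one row of next-token logits per sequence in the batch.
    tokens: the tokens decoded so far for each sequence.
    """
    for row_logits, row_tokens in zip(logits, tokens):
        # Only act on sequences whose most recent token is a speaker turn.
        if row_tokens and row_tokens[-1] == speaker_turn_id:
            # Suppress every non-timestamp token for this step.
            for token_id in range(min(timestamp_begin_id, len(row_logits))):
                row_logits[token_id] = -math.inf


# Two sequences: the first just emitted a speaker turn, the second did not.
vocab_size = 205
batch_logits = [[0.0] * vocab_size, [0.0] * vocab_size]
batch_tokens = [[1, 2, SPEAKER_TURN_ID], [1, 2, 3]]
force_timestamp_after_speaker_turn(batch_logits, batch_tokens)
```

After the call, only timestamp ids remain finite in the first row, while the second row is left untouched. In the actual decoder the same masking would operate on tensors and sit alongside the other filters applied to the logits at each step.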