Edit decoding to force sample a timestamp after every speaker turn

akashmjn / tinydiarize

Minimal extension of OpenAI's Whisper adding speaker diarization with special tokens

MIT License


Status: Closed (akashmjn closed 1 year ago)

akashmjn commented 1 year ago

Forcing a timestamp token to be sampled after every speaker turn would likely be a small patch to the logit filtering applied during decoding.

Doing so makes for readable transcripts and sets things up for downstream global diarization (clustering).
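A rough sketch of what such a logit filter might look like is below. It is not the patch from this repo: the token ids (`SPEAKER_TURN_ID`, `TIMESTAMP_BEGIN`) and the function name are hypothetical, and plain Python lists stand in for the tensors a real decoder would use. It only assumes that timestamp tokens occupy the top of the vocabulary, so masking everything below `TIMESTAMP_BEGIN` leaves timestamps as the only candidates.

```python
import math
from typing import List

# Hypothetical token ids, chosen only for illustration.
SPEAKER_TURN_ID = 7    # stands in for the speaker-turn special token
TIMESTAMP_BEGIN = 10   # assumption: every id >= this is a timestamp token


def force_timestamp_after_turn(
    logits: List[List[float]], tokens: List[List[int]]
) -> None:
    """Mask logits in place so that a timestamp must follow a speaker turn.

    logits: one row of next-token scores per sequence in the batch.
    tokens: the tokens decoded so far for each sequence.
    """
    for row, seq in zip(logits, tokens):
        if seq and seq[-1] == SPEAKER_TURN_ID:
            # Suppress every non-timestamp token, so the next sampled
            # token can only be a timestamp.
            for i in range(min(TIMESTAMP_BEGIN, len(row))):
                row[i] = -math.inf


# Toy batch: the first sequence just emitted a speaker turn, the second did not.
logits = [[0.5] * 12, [0.5] * 12]
tokens = [[3, 4, SPEAKER_TURN_ID], [3, 4, 5]]
force_timestamp_after_turn(logits, tokens)
```

Only the first row is masked here; the second sequence is left untouched because its last token is not a speaker turn.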