ggerganov / whisper.cpp

Port of OpenAI's Whisper model in C/C++
MIT License
35.38k stars 3.61k forks

When DTW timestamps are enabled, defer new_segment_callback until after DTW compute step #2515

Closed: jettoblack closed this 2 days ago

jettoblack commented 5 days ago

I'm using the whisper.net wrapper for .NET, which uses new_segment_callback to get all results from whisper.cpp. I'm extending it to support DTW timestamps, which I find useful. However, new_segment_callback is called for segments before the DTW timestamps are computed, so the callback cannot see them.

This commit changes the behavior so that when dtw_token_timestamps is true, new_segment_callback calls are deferred until after the DTW timestamps are computed. The change has no effect unless dtw_token_timestamps is true AND new_segment_callback is not null.
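
To illustrate the intended ordering, here is a minimal, self-contained sketch. It is a toy model, not the actual whisper.cpp code: `sketch_params`, `sketch_decode`, and `event_log` are hypothetical names, and the assumption that the DTW step runs once per decode window after that window's segments are created is mine. Only `dtw_token_timestamps` and `new_segment_callback` are names taken from the description above, and the callback signature is simplified to a single `n_new` count.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the relevant parameters; NOT the real whisper.cpp struct.
struct sketch_params {
    bool dtw_token_timestamps = false;
    std::function<void(int n_new)> new_segment_callback; // empty means "not set"
};

// Records the order of events so the effect of the deferral is visible.
using event_log = std::vector<std::string>;

// Toy decode loop: windows[i] is the number of segments produced in window i.
// Assumption: the DTW step runs once per window, after its segments exist.
void sketch_decode(const sketch_params & params,
                   const std::vector<int> & windows,
                   event_log & log) {
    for (int n_segments : windows) {
        assert(n_segments >= 0);

        for (int s = 0; s < n_segments; ++s) {
            log.push_back("segment");
            // Without DTW, report each segment as soon as it is created.
            if (!params.dtw_token_timestamps && params.new_segment_callback) {
                params.new_segment_callback(1);
            }
        }

        if (params.dtw_token_timestamps) {
            log.push_back("dtw"); // placeholder for the DTW timestamp computation
            // Deferred path: report all of this window's segments at once,
            // now that their DTW timestamps would be available.
            if (params.new_segment_callback && n_segments > 0) {
                params.new_segment_callback(n_segments);
            }
        }
    }
}
```

In this model, one window with two segments and DTW enabled logs `segment, segment, dtw`, followed by a single callback with `n_new == 2`. With DTW disabled, the callback fires with `n_new == 1` right after each segment, as before.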

jettoblack commented 3 days ago

> I wonder if it makes sense to move the new_segment_callback to the end of the decode loop, regardless of whether DTW is on or off. What do you think? Do you see any problems with it?

I considered that, but I wanted to limit the scope of the change just in case. It might increase latency when doing streaming transcription with a small window? I'm not sure anyone would notice.

ggerganov commented 2 days ago

> It might increase latency when doing streaming transcription with a small window?

Hm, yes. I guess it is fine as proposed. Thanks 👍