argmaxinc / WhisperKit

On-device Speech Recognition for Apple Silicon
http://argmaxinc.com/blog/whisperkit
MIT License

Is it possible to add a TranscriptionSegment callback? #159

Open Josscii opened 5 months ago

Josscii commented 5 months ago

During file transcription, it would be more convenient to get a callback for each TranscriptionSegment; the TranscriptionProgress callback is not that helpful.

ldenoue commented 5 months ago

It would indeed be very useful to get TranscriptionSegment values, with the word timestamps when available. Currently the TranscriptionProgress text contains raw strings such as <|startoftranscript|><|pl|><|transcribe|><|0.00|> Jeżeli zastanawiajcie się, which aren't easy to parse.
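To illustrate the parsing burden, here is a minimal sketch of what a caller has to do today to pull timestamps and plain text out of such a string. The names ParsedProgress and parseProgressText are hypothetical and not part of WhisperKit; the sketch only assumes that special tokens have the form <|...|> and that timestamp tokens contain a decimal number, as in the example above.

```swift
import Foundation

/// Hypothetical helper type for illustration; not part of WhisperKit.
struct ParsedProgress {
    var timestamps: [Double]   // seconds, taken from tokens like <|0.00|>
    var text: String           // raw text with every <|...|> token removed
}

/// Splits a raw progress string into timestamp values and plain text.
func parseProgressText(_ raw: String) -> ParsedProgress {
    // Matches any special token of the form <|...|> and captures its contents.
    let regex = try! NSRegularExpression(pattern: "<\\|([^|]*)\\|>")
    let ns = raw as NSString
    let fullRange = NSRange(location: 0, length: ns.length)

    var timestamps: [Double] = []
    for match in regex.matches(in: raw, options: [], range: fullRange) {
        let inner = ns.substring(with: match.range(at: 1))
        // Treat a token as a timestamp only if it is made of digits and dots.
        let looksNumeric = !inner.isEmpty && inner.allSatisfy { "0123456789.".contains($0) }
        if looksNumeric, let seconds = Double(inner) {
            timestamps.append(seconds)
        }
    }

    // Remove every special token and trim the surrounding whitespace.
    let text = regex
        .stringByReplacingMatches(in: raw, options: [], range: fullRange, withTemplate: "")
        .trimmingCharacters(in: .whitespacesAndNewlines)

    return ParsedProgress(timestamps: timestamps, text: text)
}

let parsed = parseProgressText("<|startoftranscript|><|pl|><|transcribe|><|0.00|> Jeżeli zastanawiajcie się")
print(parsed.timestamps, parsed.text)
```

Even this small helper has to guess at the token format, which is the kind of work a TranscriptionSegment callback would spare callers.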

ZachNagengast commented 5 months ago

We're currently not building the segments before a window completes, but it may be possible to return one as soon as we see two timestamp tokens surrounding text come through. Would you prefer a separate callback for this, or a configurable parameter on the existing callback, e.g. callbackInterval: .token or callbackInterval: .segment?
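As a rough sketch of the idea, assuming decoded tokens arrive one string at a time: a segment is emitted when a timestamp token closes a run of text that was opened by an earlier timestamp token. Everything here (CallbackInterval, DraftSegment, SegmentAccumulator, drive) is a hypothetical stand-in for illustration and not existing WhisperKit API.

```swift
import Foundation

// Hypothetical option mirroring the callbackInterval idea above.
enum CallbackInterval { case token, segment }

// Hypothetical stand-in for a segment; not WhisperKit's TranscriptionSegment.
struct DraftSegment: Equatable {
    var start: Double
    var end: Double
    var text: String
}

struct SegmentAccumulator {
    var start: Double? = nil   // opening timestamp of the segment in progress
    var text = ""              // text collected since the opening timestamp

    /// Feeds one decoded token; returns a segment when a timestamp closes one.
    mutating func consume(_ token: String) -> DraftSegment? {
        if let t = Self.timestamp(from: token) {
            // Every timestamp also opens the next segment once this call returns.
            defer { start = t; text = "" }
            let trimmed = text.trimmingCharacters(in: .whitespaces)
            if let s = start, !trimmed.isEmpty {
                return DraftSegment(start: s, end: t, text: trimmed)
            }
            return nil
        }
        // Skip other special tokens such as <|transcribe|>.
        if token.hasPrefix("<|") && token.hasSuffix("|>") { return nil }
        // Collect text only after an opening timestamp has been seen.
        if start != nil { text += token }
        return nil
    }

    /// Parses tokens like <|2.00|> into seconds; returns nil for anything else.
    static func timestamp(from token: String) -> Double? {
        guard token.hasPrefix("<|"), token.hasSuffix("|>") else { return nil }
        let inner = String(token.dropFirst(2).dropLast(2))
        guard !inner.isEmpty, inner.allSatisfy({ "0123456789.".contains($0) }) else { return nil }
        return Double(inner)
    }
}

/// Drives the accumulator and calls back per token or per segment.
func drive(_ tokens: [String], interval: CallbackInterval,
           onToken: (String) -> Void, onSegment: (DraftSegment) -> Void) {
    var accumulator = SegmentAccumulator()
    for token in tokens {
        if interval == .token { onToken(token) }
        if let segment = accumulator.consume(token), interval == .segment {
            onSegment(segment)
        }
    }
}
```

With the .segment setting the caller would only hear about completed segments, while .token keeps the per-token behavior; a separate callback would instead expose both streams at once.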

Josscii commented 5 months ago

What's the relationship between the TranscriptionProgress callback and the TranscriptionSegment callback?

Would they be called at the same time? If so, they could be merged into one. If not, they should be separate.

ldenoue commented 5 months ago

I would prefer a separate callback that returns TranscriptionSegment structs.