Since we aren't using the transcribe.py module from Whisper, I was wondering whether we have explored obtaining timestamps for the transcription when contextual biasing is applied.
For instance, I am looking to produce transcription output like the following: "<|startoftranscript|><|ja|><|translate|><|0.00|> He has grave doubts whether Sir Frederick Layton's work is really Greek after all and<|6.24|><|6.24|> can discover in it but little of rocky Ithaca.<|9.44|><|endoftext|>"
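
To make the target concrete, here is a minimal sketch (plain Python, no Whisper calls) of how a decoded string like the one above could be split into timed segments. `parse_segments` is a hypothetical helper, not part of Whisper, and it assumes the timestamp tokens survive decoding as literal `<|t.tt|>` text.

```python
import re

# Timestamp tokens such as <|6.24|>; the capture group keeps the number when splitting.
TIMESTAMP = re.compile(r"<\|(\d+\.\d+)\|>")
# Any other special token, e.g. <|startoftranscript|>, <|ja|>, <|endoftext|>.
SPECIAL = re.compile(r"<\|[^|]*\|>")


def parse_segments(decoded):
    """Hypothetical helper: return a list of (start, end, text) tuples."""
    # With one capture group, split() yields [text, ts, text, ts, text, ...].
    parts = TIMESTAMP.split(decoded)
    segments = []
    for i in range(1, len(parts) - 2, 2):
        start = float(parts[i])
        end = float(parts[i + 2])
        text = SPECIAL.sub("", parts[i + 1]).strip()
        # Back-to-back timestamps (end of one segment, start of the next)
        # enclose an empty string, so skip those.
        if text:
            segments.append((start, end, text))
    return segments


example = (
    "<|startoftranscript|><|ja|><|translate|><|0.00|> He has grave doubts whether "
    "Sir Frederick Layton's work is really Greek after all and<|6.24|><|6.24|> can "
    "discover in it but little of rocky Ithaca.<|9.44|><|endoftext|>"
)

segments = parse_segments(example)
for start, end, text in segments:
    print(f"[{start:.2f} -> {end:.2f}] {text}")
```

This only covers the output format I am after; it does not address how to get the biased decoding path to emit the timestamp tokens in the first place, which is the actual question.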