Closed eschmidbauer closed 1 year ago
Hi,
Word-level timestamps are currently not possible. They usually require extensions to the model that are not implemented at this time.
Thank you for the amazing work on this! It would be great if word-level timestamps could be implemented in faster-whisper once the word-level-timestamps branch is merged to main in whisper.
I just checked the whisper repo and the word-level timestamp PR has been merged. It would be great indeed to have the same in faster-whisper.
Great work!
I just pushed an experimental branch implementing word-level timestamps! It would be great if you can test this early.
Note that I implemented exactly the same logic as openai/whisper. So if there is a strange result and openai/whisper has the same result, you should report the issue to openai/whisper and not here.
Here's how you can test this today:
pip install --force-reinstall "faster-whisper[conversion] @ https://github.com/guillaumekln/faster-whisper/archive/refs/heads/word-level-timestamps.tar.gz"
pip install --force-reinstall ctranslate2-3.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
The model should be converted again with the latest version of CTranslate2 as the configuration needs to be updated with additional information:
ct2-transformers-converter --model openai/whisper-large-v2 --output_dir whisper-large-v2-ct2 --copy_files tokenizer.json --quantization float16
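from faster_whisper import WhisperModel
model = WhisperModel("whisper-large-v2-ct2")  # directory produced by the conversion above
segments, _ = model.transcribe(audio_path, word_timestamps=True)
for segment in segments:
    print(segment.words)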
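To make the printed words easier to read, here is a minimal sketch of one way to format them. It assumes each entry in segment.words exposes start, end, word, and probability fields; the Word tuple and the format_words helper below are hypothetical stand-ins so the snippet runs without a model or audio file.

```python
from typing import Iterable, List, NamedTuple


class Word(NamedTuple):
    # Hypothetical stand-in for the entries of segment.words.
    start: float
    end: float
    word: str
    probability: float


def format_words(words: Iterable[Word]) -> List[str]:
    """Render each word as '[start -> end] word (p=probability)'."""
    return [
        f"[{w.start:.2f}s -> {w.end:.2f}s] {w.word.strip()} (p={w.probability:.2f})"
        for w in words
    ]


# Made-up sample values, for illustration only.
sample = [Word(0.0, 0.42, " Hello", 0.98), Word(0.42, 0.9, " world", 0.87)]
for line in format_words(sample):
    print(line)
```

With real output, the same helper could be called as format_words(segment.words) inside the loop, provided the field names match.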
just tested this with the tiny model and it worked! going to do more tests but this is great, thanks so much for sharing!
large-v2 seems to work too. Thanks again
When I tested word timestamps on a bunch of files, I saw this error in some corner cases:
File "/usr/local/lib/python3.10/site-packages/faster_whisper/transcribe.py", line 531, in add_word_timestamps
alignment = self.find_alignment(tokenizer, text_tokens, mel, num_frames)
File "/usr/local/lib/python3.10/site-packages/faster_whisper/transcribe.py", line 598, in find_alignment
start_times = jump_times[word_boundaries[:-1]]
IndexError: index 1 is out of bounds for axis 0 with size 1
Thank you for testing!
Can you confirm that the same file works without issue in openai/whisper? If yes, is it possible for you to share this input file?
@guillaumekln First of all, this is very nice!
I have a quick question about the probabilities. Do they indicate how likely it is that this word was spoken, or how likely it is that this word was spoken at this time in the segment?
I got to this point: https://github.com/SYSTRAN/faster-whisper/blob/d57c5b40b06e59ec44240d93485a95799548af50/faster_whisper/transcribe.py#L1733
Which calls align from CTranslate2
So I think the word_probabilities indicate how likely it is that the word was spoken at the specific time in the segment.
Do you have any idea how to get the probability that a specific word was spoken, regardless of its timing?
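One possible reading, offered as an assumption rather than a confirmed answer: if a word's probability is computed as the mean of the decoder probabilities of the tokens that make up the word (which is how the openai/whisper timing code appears to do it), then the value would reflect confidence in the text itself rather than in its timing. A minimal sketch of that averaging, with a hypothetical helper name and made-up numbers:

```python
from statistics import fmean
from typing import List, Sequence


def word_probabilities(
    token_probs: Sequence[float], word_boundaries: Sequence[int]
) -> List[float]:
    """Average per-token probabilities within each word.

    word_boundaries holds cumulative token offsets: [0, 2, 3] means
    word 0 spans tokens 0..1 and word 1 spans token 2.
    """
    return [
        fmean(token_probs[i:j])
        for i, j in zip(word_boundaries[:-1], word_boundaries[1:])
    ]


# Made-up token probabilities: two tokens for the first word, one for the second.
token_probs = [0.9, 0.7, 0.5]
boundaries = [0, 2, 3]
print(word_probabilities(token_probs, boundaries))
```

Whether faster-whisper computes its values exactly this way would need to be checked against the linked transcribe.py.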
Hi, I really appreciate you sharing this implementation. I found it to be very fast with accurate results. I do not see word-level timestamps in the result. Are word-level timestamps possible?