Vaibhavs10 / insanely-fast-whisper

Apache License 2.0

Word Timestamps #171

Closed: Matheusadler closed this issue 5 months ago

Matheusadler commented 5 months ago

Hey there!

When I use the openai/whisper-large-v2 model with the pipeline as follows:

outputs = pipe("filename.wav", chunk_length_s=30, batch_size=16, return_timestamps=True)

I get the timestamps of each chunk:

{'text': " When you were here before Couldn't look you in the eye You're just like an angel Your skin makes me cry You float like a feather Like a feather in a beautiful world I wish I was special You're so fucking special But I'm a creep (...)", 'chunks': [{'timestamp': (0.0, 27.0), 'text': " When you were here before Couldn't look you in the eye You're just like an angel"}, {'timestamp': (34.24, 41.24), 'text': ' Your skin makes me cry You float like a feather'}, {'timestamp': (47.0, 50.0), 'text': ' Like a feather in a beautiful world'}, {'timestamp': (53.0, 55.0), 'text': ' I wish I was special'}, {'timestamp': (58.0, 60.0), 'text': " You're so fucking special"}, {'timestamp': (62.0, 65.4), 'text': " But I'm a creep"},

In this pipeline, would it be possible to get the timestamps of each word?

mayunchao1994 commented 3 weeks ago

I also encountered the same problem. How did you solve it in the end? https://github.com/Vaibhavs10/insanely-fast-whisper/issues/231
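
For readers landing on this thread, here is a minimal sketch rather than a confirmed fix. It assumes `pipe` is a Hugging Face `transformers` automatic-speech-recognition pipeline. That pipeline accepts `return_timestamps="word"` for Whisper models, in which case each entry in `chunks` should cover a single word instead of a phrase. Whether this works with a given `batch_size` or library version has not been verified here. The helper name `chunks_to_rows` and the sample data (trimmed from the output quoted above) are illustrative only. The pipeline call is left as a comment so the snippet runs without downloading a model.

```python
from typing import Any, Dict, List, Optional, Tuple

# Assumed call for word-level timestamps (not executed here):
#   outputs = pipe("filename.wav", chunk_length_s=30, batch_size=16,
#                  return_timestamps="word")
# The result keeps the same shape: {"text": ..., "chunks": [...]}.

Row = Tuple[Optional[float], Optional[float], str]


def chunks_to_rows(outputs: Dict[str, Any]) -> List[Row]:
    """Flatten the pipeline's `chunks` list into (start, end, text) rows.

    Hypothetical helper: works the same for phrase-level and word-level
    chunks, because both use the {"timestamp": (start, end), "text": ...}
    layout shown in this issue.
    """
    rows: List[Row] = []
    for chunk in outputs.get("chunks", []):
        start, end = chunk["timestamp"]
        rows.append((start, end, chunk["text"].strip()))
    return rows


def fmt(t: Optional[float]) -> str:
    """Format a timestamp; the end time may be missing on a final chunk."""
    return "   ?   " if t is None else f"{t:7.2f}"


# Sample data trimmed from the output quoted in this issue.
sample = {
    "text": " I wish I was special But I'm a creep",
    "chunks": [
        {"timestamp": (53.0, 55.0), "text": " I wish I was special"},
        {"timestamp": (62.0, 65.4), "text": " But I'm a creep"},
    ],
}

for start, end, text in chunks_to_rows(sample):
    print(f"[{fmt(start)} -> {fmt(end)}] {text}")
```

With word-level output the loop would print one row per word. With `return_timestamps=True` it prints one row per phrase, as in the original post.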