Hey there!
When I use the openai/whisper-large-v2 model with the pipeline as follows:
outputs = pipe("filename.wav", chunk_length_s=30, batch_size=16, return_timestamps=True)
I get the timestamps of each chunk:
{'text': " When you were here before Couldn't look you in the eye You're just like an angel Your skin makes me cry You float like a feather Like a feather in a beautiful world I wish I was special You're so fucking special But I'm a creep (...)",
 'chunks': [{'timestamp': (0.0, 27.0),
   'text': " When you were here before Couldn't look you in the eye You're just like an angel"},
  {'timestamp': (34.24, 41.24),
   'text': ' Your skin makes me cry You float like a feather'},
  {'timestamp': (47.0, 50.0), 'text': ' Like a feather in a beautiful world'},
  {'timestamp': (53.0, 55.0), 'text': ' I wish I was special'},
  {'timestamp': (58.0, 60.0), 'text': " You're so fucking special"},
  {'timestamp': (62.0, 65.4), 'text': " But I'm a creep"},
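The pipeline returns a plain dict, so the chunk timestamps can be post-processed directly. Here is a minimal sketch, assuming `outputs` has the shape shown above. Only the first three chunks are copied in, and the helper names `chunk_rows` and `find_gaps` are hypothetical, not part of transformers. It prints each chunk's span and lists the stretches that no chunk covers, such as 27.0 to 34.24 s.

```python
from typing import List, Optional, Tuple

# First three chunks copied from the output above; the top-level "text" key is omitted.
outputs = {
    "chunks": [
        {"timestamp": (0.0, 27.0),
         "text": " When you were here before Couldn't look you in the eye You're just like an angel"},
        {"timestamp": (34.24, 41.24), "text": " Your skin makes me cry You float like a feather"},
        {"timestamp": (47.0, 50.0), "text": " Like a feather in a beautiful world"},
    ]
}

Row = Tuple[float, Optional[float], str]


def chunk_rows(result: dict) -> List[Row]:
    """Flatten the `chunks` list into (start, end, text) tuples (hypothetical helper)."""
    return [(c["timestamp"][0], c["timestamp"][1], c["text"].strip()) for c in result["chunks"]]


def find_gaps(rows: List[Row]) -> List[Tuple[float, float]]:
    """Return (end of one chunk, start of the next) wherever consecutive chunks don't touch."""
    gaps = []
    for (_, prev_end, _), (next_start, _, _) in zip(rows, rows[1:]):
        # An end time could be missing (None); skip such pairs rather than guess.
        if prev_end is not None and next_start > prev_end:
            gaps.append((prev_end, next_start))
    return gaps


rows = chunk_rows(outputs)
for start, end, text in rows:
    end_str = "?" if end is None else f"{end:.2f}"
    print(f"[{start:.2f} -> {end_str}] {text}")
print(find_gaps(rows))
```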
In this pipeline, would it be possible to get the timestamps of each word?
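Recent transformers releases document a `return_timestamps="word"` option for Whisper in the ASR pipeline. It estimates word times from the cross-attention weights using dynamic time warping. On a new enough version, `pipe("filename.wav", chunk_length_s=30, batch_size=16, return_timestamps="word")` is worth trying first. Check the docs for your installed version before relying on it.

Where that option isn't available, the sketch below is a crude fallback that spreads each chunk's span evenly across its words. This is naive linear interpolation, not real alignment. It assumes every word in a chunk takes the same time, so it can be badly off on long chunks such as the 27-second first one. `approximate_word_timestamps` is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple

Chunk = Dict[str, Any]
WordSpan = Tuple[str, float, float]


def approximate_word_timestamps(chunks: List[Chunk]) -> List[WordSpan]:
    """Spread each chunk's (start, end) span evenly over its whitespace-separated words.

    Naive linear interpolation, NOT real word alignment: every word in a chunk
    is assumed to take the same amount of time.
    """
    words: List[WordSpan] = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        tokens = chunk["text"].split()
        # Skip chunks with no end time (None) or no words instead of guessing.
        if end is None or not tokens:
            continue
        step = (end - start) / len(tokens)
        for i, word in enumerate(tokens):
            words.append((word, round(start + i * step, 2), round(start + (i + 1) * step, 2)))
    return words


# Two chunks copied from the output above.
chunks = [
    {"timestamp": (53.0, 55.0), "text": " I wish I was special"},
    {"timestamp": (62.0, 65.4), "text": " But I'm a creep"},
]

for word, w_start, w_end in approximate_word_timestamps(chunks):
    print(f"{w_start:6.2f} - {w_end:6.2f}  {word}")
```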