Jaymon / transcribe

Convert images or audio files to plain text on the command line
MIT License
32 stars 7 forks

Add individual word timestamps #4

Open kraftydevil opened 5 years ago

kraftydevil commented 5 years ago

It looks like speech currently returns some timestamp information for what look like sections.

Would you be able to add timestamps for individual words?

Something like this: https://cloud.google.com/speech-to-text/docs/async-time-offsets#speech-async-recognize-gcs-python

Jaymon commented 5 years ago

I think the ideal way to do this would be to add a Block class and a Word class that both extend str and each have a time property. block.time would hold the time of the block (equivalent to the current functionality), while iterating with for word in block.words would expose word.time, the time of each word in the block.

Then modify speech.Speech.__iter__ to return blocks instead of a tuple of (time, string). Each block would have a list of words, accessible via Block.words, containing the individual Word instances, each carrying the time of that word.
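A minimal sketch of the Block and Word idea described above, not code from the transcribe repository. It assumes times are plain floats in seconds and that a block's time defaults to the time of its first word; the constructor signatures are hypothetical.

```python
class Word(str):
    """A transcribed word that also carries the time it was spoken."""

    def __new__(cls, text, time):
        # str is immutable, so the extra attribute is attached in __new__
        instance = super().__new__(cls, text)
        instance.time = time
        return instance


class Block(str):
    """A run of words that reads as one string and keeps its Word instances."""

    def __new__(cls, words, time=None):
        words = list(words)
        instance = super().__new__(cls, " ".join(words))
        instance.words = words
        if time is None and words:
            # Assumption: a block starts when its first word starts
            time = words[0].time
        instance.time = time
        return instance


block = Block([Word("item", 0.0), Word("one", 0.4), Word("start", 0.9)])
print(block, block.time)
for word in block.words:
    print(word, word.time)
```

Because both classes extend str, existing code that treats a block as text would keep working, while new code can read block.time and word.time.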

I won't have time to add something like this for a bit. If you want to take a stab at it, I'm happy to code review the pull requests and answer any questions.

In the meantime, you can work around this with a custom script, something like this:

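from transcribe.speech import Speech  # assumed import path; adjust to your install

path_to_sound_file = "recording.wav"  # placeholder path to your audio file
s = Speech(path_to_sound_file, lang='en-US')
google_response = s.scan()  # raw Google Speech-to-Text response
for result in google_response.results:
    # Per-word entries are only present if the request enabled word time offsets
    for word_info in result.alternatives[0].words:
        print(word_info.word, word_info.start_time, word_info.end_time)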

This would give you access to the raw Google response.
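One wrinkle with the raw response: depending on the installed google-cloud-speech version, start_time and end_time may be datetime.timedelta objects (2.x) or protobuf Duration messages with seconds and nanos fields (older releases). A small helper along these lines could normalize both to float seconds. The name to_seconds is hypothetical and not part of transcribe or the Google client.

```python
from datetime import timedelta


def to_seconds(offset):
    """Convert a word time offset to float seconds.

    Accepts a datetime.timedelta or any object exposing integer
    seconds and nanos attributes (the protobuf Duration shape).
    """
    if isinstance(offset, timedelta):
        return offset.total_seconds()
    return offset.seconds + offset.nanos / 1e9


print(to_seconds(timedelta(seconds=1, milliseconds=500)))
```

With that in place, to_seconds(word_info.start_time) should give a number that is easy to compare or pass to an audio tool.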

kraftydevil commented 5 years ago

Sure thing - I'd love to take a stab at it.

Ultimately I want to search for keyword phrases in order to automate editing audio clips.

For example, I will record audio for a list of items, saying something like "Item 1 start", then talk about item 1, and when done, say "Item 1 end".

From there I'd use transcribe to get the timestamps. I could then use another program to create separate clips, one for each item.
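A rough sketch of how that keyword search might look once per-word timestamps are available. It assumes the words have already been collected into (text, start_seconds, end_seconds) tuples and that the spoken markers are transcribed literally (for example "1" rather than "one"); find_phrase and clip_range are hypothetical names, not part of transcribe.

```python
def find_phrase(words, phrase):
    """Return (start, end) times for each occurrence of phrase in words.

    words is a list of (text, start_seconds, end_seconds) tuples.
    Matching ignores case and trailing punctuation.
    """
    target = phrase.lower().split()
    if not target:
        return []
    texts = [text.lower().strip(".,!?") for text, _, _ in words]
    matches = []
    for i in range(len(texts) - len(target) + 1):
        if texts[i:i + len(target)] == target:
            last = i + len(target) - 1
            matches.append((words[i][1], words[last][2]))
    return matches


def clip_range(words, item):
    """Return (start, end) of the audio between '<item> start' and '<item> end'."""
    starts = find_phrase(words, f"{item} start")
    if not starts:
        return None
    clip_start = starts[0][1]  # begin right after the start marker is spoken
    for marker_start, _ in find_phrase(words, f"{item} end"):
        if marker_start >= clip_start:
            return (clip_start, marker_start)  # stop where the end marker begins
    return None


words = [
    ("Item", 0.0, 0.3), ("1", 0.3, 0.5), ("start", 0.5, 0.9),
    ("hello", 1.0, 1.4), ("world", 1.4, 1.8),
    ("item", 2.0, 2.3), ("1", 2.3, 2.5), ("end", 2.5, 2.9),
]
print(clip_range(words, "item 1"))
```

The returned pair could then be handed to whatever program does the cutting, so the markers themselves are left out of each clip.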

...

Hopefully I'll have time to work on this soon. I'll definitely have questions.