antiboredom / videogrep

automatic video supercuts with python
https://antiboredom.github.io/videogrep

Automatically refine word-level alignments from sentence-level alignments #106

Open ryanfb opened 2 years ago

ryanfb commented 2 years ago

First of all, thanks so much for all your work on this and for making it open source! It would be cool if fragment search could work from an existing SRT transcription without re-transcribing all of the audio in advance. One way to do this: use the existing sentence-level alignments to extract the audio ranges for sentences that match a search, transcribe just those ranges with vosk, then use the word-level timings from those transcriptions to extract the fragment-level audio.
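A rough sketch of the bookkeeping this would need, using only the standard library. It is not videogrep's code: `Cue`, `parse_srt`, `matching_ranges`, and `to_absolute` are hypothetical names, and the `padding` default is an arbitrary guess. The transcription step itself is left out. The sketch only assumes that a recognizer run on a clip returns word entries with `word`, `start`, and `end` keys measured from the start of the clip, which is the shape vosk reports when word output is enabled.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    """One SRT subtitle entry with times in seconds (hypothetical helper type)."""
    start: float
    end: float
    text: str


_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _seconds(stamp):
    """Convert an SRT timestamp such as 00:01:02,500 to seconds."""
    hours, minutes, secs, millis = (int(g) for g in _TIME.match(stamp.strip()).groups())
    return hours * 3600 + minutes * 60 + secs + millis / 1000


def parse_srt(srt_text):
    """Parse SRT text into a list of Cue objects, skipping malformed blocks."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        timing = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing is None:
            continue
        start, end = lines[timing].split("-->")
        text = " ".join(lines[timing + 1:]).strip()
        cues.append(Cue(_seconds(start), _seconds(end), text))
    return cues


def matching_ranges(cues, pattern, padding=0.25):
    """Return (clip_start, clip_end, cue) for every cue whose text matches.

    The padding (an assumed value) widens each range a little, since
    sentence-level subtitle timings are often slightly off.
    """
    rx = re.compile(pattern, re.IGNORECASE)
    return [
        (max(0.0, cue.start - padding), cue.end + padding, cue)
        for cue in cues
        if rx.search(cue.text)
    ]


def to_absolute(words, clip_start):
    """Shift clip-relative word timings back onto the full video's timeline."""
    return [
        {**w, "start": w["start"] + clip_start, "end": w["end"] + clip_start}
        for w in words
    ]
```

Each `(clip_start, clip_end)` pair would be cut out of the audio and transcribed on its own, and `to_absolute` would then map the resulting word timings back to positions in the original file, so only the matching sentences are ever re-transcribed.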

antiboredom commented 2 years ago

That's an interesting idea - I'd definitely be open to experimenting with it... Alignment might also work here. https://github.com/alphacep/vosk-api/pull/756

cmprmsd commented 6 months ago

It would also be beneficial to rely on the words sourced from the subtitle file. That way the detection quality could be improved a lot, right? I tried to implement @ryanfb's suggestion back in 2021 with pocketsphinx but the results weren't promising. :cry:
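One way this might be approached, as a sketch only: vosk's `KaldiRecognizer` accepts an optional third argument, a JSON list of phrases, that restricts recognition to that vocabulary. As far as I know this only works with models that support dynamic vocabulary (the small ones), and nobody in this thread has tried it. `grammar_from_cues` below is a hypothetical helper that builds such a list from the subtitle text of the matching sentences.

```python
import json
import re


def grammar_from_cues(cue_texts):
    """Build a JSON word list from subtitle text, for use as a recognizer grammar.

    Words are lowercased and de-duplicated in order of first appearance.
    "[unk]" is appended so audio that matches none of the words can still be
    reported as unknown instead of being forced onto a subtitle word.
    """
    words = []
    seen = set()
    for text in cue_texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in seen:
                seen.add(word)
                words.append(word)
    return json.dumps(words + ["[unk]"])


# Assumed usage with vosk (not executed here):
#   rec = KaldiRecognizer(model, 16000, grammar_from_cues(["Hello there, world."]))
#   rec.SetWords(True)
```

The regex tokenizer is English-only and drops digits, so subtitles in other languages or with numbers would need a different word splitter.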