antiboredom / videogrep

automatic video supercuts with python
https://antiboredom.github.io/videogrep

(Enhancement?) Include pre/post silence in word detection #120

Open smithee77 opened 1 year ago

smithee77 commented 1 year ago

Hi, it could be cool if the tool could somehow include the silence preceding a word when the word is extracted (in the "segment" option). This could be implemented as an extra option, maybe. Of course this is not an issue, just a suggestion.

As usual, great repo!

antiboredom commented 1 year ago

Great idea! I'll definitely consider adding it (although I'm a bit behind on new features at the moment). There also might be a nice way to make a little extra python script that does this...
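The core of such a script is just arithmetic on word timestamps: extend each match backwards to the previous word's end and forwards to the next word's start. A minimal sketch, with no videogrep dependency; `pad_with_silence` and the toy timestamps are my own names, but the `{'word', 'start', 'end'}` dict shape matches what `parse_transcript` yields per word:

```python
def pad_with_silence(words, query, max_pad=5.0):
    """Return (start, end) spans for each `query` match, expanded into
    the silent gaps before and after it (each capped at max_pad seconds)."""
    spans = []
    for i, w in enumerate(words):
        if w['word'] != query:
            continue
        # silence before: the gap since the previous word ended
        start = w['start']
        if i > 0:
            gap = w['start'] - words[i - 1]['end']
            start = w['start'] - min(gap, max_pad)
        # silence after: the gap until the next word starts
        end = w['end']
        if i + 1 < len(words):
            gap = words[i + 1]['start'] - w['end']
            end = w['end'] + min(gap, max_pad)
        spans.append((start, end))
    return spans

# toy transcript: "so ... retirement ... yes"
words = [
    {'word': 'so', 'start': 0.0, 'end': 0.4},
    {'word': 'retirement', 'start': 2.4, 'end': 3.1},
    {'word': 'yes', 'start': 4.9, 'end': 5.2},
]
print(pad_with_silence(words, 'retirement'))
```

The resulting spans could then be fed to `create_supercut` as `{'start': ..., 'end': ..., 'file': ...}` dicts, as in the examples.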

cmprmsd commented 8 months ago

I also ran into this and came up with a modified version of @antiboredom's examples.

This might help you get what you want. word2["word"] is the word you're looking for.

```python
import sys
from videogrep import parse_transcript, create_supercut

# the min and max duration of silences to extract
min_duration = 0.2
max_duration = 5.0

# value to trim off the end of each clip
adjuster = 0.05

filenames = sys.argv[1:]

words_with_silences = []
for filename in filenames:
    timestamps = parse_transcript(filename)

    # this uses the words, if available
    words = []
    for sentence in timestamps:
        words += sentence['words']

    # note: the original example used zip(words[:-2], words[1:]),
    # which I think skips the last entries
    for word1, word2, word3 in zip(words, words[1:], words[2:]):
        if word2['word'] != "retirement":
            continue
        first_start = word1['end']
        first_end = word2['start']  # - adjuster
        first_silence = first_end - first_start

        second_start = word2['end']
        second_end = word3['start'] - adjuster
        second_silence = second_end - second_start

        if (min_duration < first_silence < max_duration) and (min_duration < second_silence < max_duration):
            print(f'The word {word2["word"]} was surrounded by {first_silence} and {second_silence} of silence.')
            words_with_silences.append({'start': first_start, 'end': second_end, 'file': filename})

create_supercut(words_with_silences, 'words_with_silences.mp4')
```

This yields for example:

The word retirement was surrounded by 2.009999999999309 and 1.8100000000004002 of silence.

and the cut was excellent :smile:

@antiboredom how would I create the split files, as is done with the export_clips flag? edit: nvm -> videogrep.export_individual_clips(words_with_silences, 'words_with_silences.mp4')
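For anyone adapting the snippet above: the matching logic can be pulled into a small function so the target word isn't hardcoded. This is my own refactor, not part of videogrep; `find_surrounded` is a hypothetical name, and the word-dict shape is the same `{'word', 'start', 'end'}` layout used throughout this thread:

```python
def find_surrounded(words, query, min_dur=0.2, max_dur=5.0, adjuster=0.05):
    """Return clip spans for `query` occurrences flanked by silences
    whose durations fall between min_dur and max_dur seconds."""
    clips = []
    # walk (previous, current, next) word triples, as in the snippet above
    for prev, cur, nxt in zip(words, words[1:], words[2:]):
        if cur['word'] != query:
            continue
        silence_before = cur['start'] - prev['end']
        silence_after = (nxt['start'] - adjuster) - cur['end']
        if min_dur < silence_before < max_dur and min_dur < silence_after < max_dur:
            clips.append({'start': prev['end'], 'end': nxt['start'] - adjuster})
    return clips

# toy word list: one match with ~1.0s silence before and ~0.95s after
words = [
    {'word': 'a', 'start': 0.0, 'end': 0.5},
    {'word': 'retirement', 'start': 1.5, 'end': 2.0},
    {'word': 'b', 'start': 3.0, 'end': 3.4},
]
print(find_surrounded(words, 'retirement'))
```

Each returned dict only needs a `'file'` key added before being handed to `create_supercut` or `export_individual_clips`.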