antiboredom / videogrep

automatic video supercuts with python
https://antiboredom.github.io/videogrep

Could you add support for reading other JSON (time data) format in Videogrep? #129

Closed Dean-Corso closed 9 months ago

Dean-Corso commented 9 months ago

Hello guys,

First, I want to say thank you for this nice tool, which I found a while ago. I have some questions and hope to get some answers.

Over time I have been using videogrep to transcribe the audio in my videos, to get subtitles out, and to make supercuts. All in all it works well so far, but I found some issues where you can maybe help. You said we need to use the VOSK engine, and optionally we can use any of the VOSK speech models. I tried this and it works, but unfortunately I'm not really happy with the final subtitle results from the VOSK models (sorry to say that). The larger models work fine, no question, but I also tried some other speech models and engines and found a much better tool called whisper-standalone-win, which includes OpenAI's Whisper and Faster-Whisper. I tested it and I get almost perfect subtitles out. Just try it out.

Question: When using videogrep I get a subtitle file (SRT) and a JSON file (with all the detailed timings), and it seems I always need both files if I want to search with videogrep without redoing the transcription step. I understand that you need the JSON file with the time codes for every word when searching for fragments, but for sentence search the SRT file alone should be enough, shouldn't it? It already includes the time code for every sentence. Just asking.

So the situation is that I need the SRT and JSON files to make any cuts with videogrep. With another speech engine and model, like whisper-standalone-win, I can also output an SRT file and a JSON file (and some more formats), but the problem is that I cannot use the JSON file created by whisper-standalone-win with videogrep because the two are structured differently. Below is a short example.

https://github.com/antiboredom/videogrep/assets/88513086/a660d5d6-f68d-4ed4-8a5f-da9a57ec2a43

SRT by Videogrep using vosk-model-en-us-0.42-gigaspeech

```
1
00:00:01.140 --> 00:00:03.510
welcome to curl seven not eighty eight

2
00:00:03.510 --> 00:00:05.820
one this is february twenty twenty twenty

3
00:00:05.820 --> 00:00:09.390
three i did the previous release video

4
00:00:09.390 --> 00:00:12.510
just days ago but here we are again
```

JSON by Videogrep

```
[
  {"content": "welcome to curl seven not eighty eight", "start": 1.14, "end": 3.51, "words": [
    {"conf": 1.0, "end": 1.62, "start": 1.14, "word": "welcome"},
    {"conf": 1.0, "end": 1.8, "start": 1.62, "word": "to"},
    {"conf": 1.0, "end": 2.22, "start": 1.83, "word": "curl"},
    {"conf": 1.0, "end": 2.7, "start": 2.25, "word": "seven"},
    {"conf": 1.0, "end": 2.91, "start": 2.7, "word": "not"},
    {"conf": 1.0, "end": 3.15, "start": 2.94, "word": "eighty"},
    {"conf": 1.0, "end": 3.51, "start": 3.15, "word": "eight"}
  ]},
  {"content": "one this is february twenty twenty twenty", "start": 3.51, "end": 5.82, "words": [
    {"conf": 1.0, "end": 3.9, "start": 3.51, "word": "one"},
    {"conf": 1.0, "end": 4.14, "start": 3.93, "word": "this"},
    {"conf": 1.0, "end": 4.23, "start": 4.14, "word": "is"},
    {"conf": 1.0, "end": 4.71, "start": 4.23, "word": "february"},
    {"conf": 1.0, "end": 5.359878, "start": 4.71, "word": "twenty"},
    {"conf": 1.0, "end": 5.64, "start": 5.410122, "word": "twenty"},
    {"conf": 1.0, "end": 5.82, "start": 5.64, "word": "twenty"}
  ]},
  {"content": "three i did the previous release video", "start": 5.82, "end": 9.39, "words": [
    {"conf": 1.0, "end": 6.12, "start": 5.82, "word": "three"},
    {"conf": 1.0, "end": 6.93, "start": 6.69, "word": "i"},
    {"conf": 1.0, "end": 7.23, "start": 6.96, "word": "did"},
    {"conf": 1.0, "end": 7.35, "start": 7.23, "word": "the"},
    {"conf": 1.0, "end": 7.86, "start": 7.35, "word": "previous"},
    {"conf": 1.0, "end": 8.34, "start": 7.86, "word": "release"},
    {"conf": 1.0, "end": 9.39, "start": 8.85, "word": "video"}
  ]},
  {"content": "just days ago but here we are again", "start": 9.39, "end": 12.51, "words": [
    {"conf": 1.0, "end": 9.78, "start": 9.39, "word": "just"},
    {"conf": 1.0, "end": 10.29, "start": 9.84, "word": "days"},
    {"conf": 1.0, "end": 10.71, "start": 10.32, "word": "ago"},
    {"conf": 1.0, "end": 10.92, "start": 10.71, "word": "but"},
    {"conf": 1.0, "end": 11.43, "start": 10.92, "word": "here"},
    {"conf": 1.0, "end": 11.64, "start": 11.43, "word": "we"},
    {"conf": 1.0, "end": 11.91, "start": 11.67, "word": "are"},
    {"conf": 1.0, "end": 12.51, "start": 11.94, "word": "again"}
  ]}
]
```

SRT by whisper-standalone-win faster-whisper-medium model

```
1
00:00:00,250 --> 00:00:12,410
Welcome to Curl 7.88.1. This is February 20, 2023. I did the previous release video just days ago, but here we are again.
```

JSON by whisper-standalone-win

```
{
  "segments": [
    {
      "id": 1,
      "seek": 1256,
      "start": 0.25,
      "end": 12.41,
      "text": " Welcome to Curl 7.88.1. This is February 20, 2023. I did the previous release video just days ago, but here we are again.",
      "tokens": [50364, 4027, 281, 7907, 75, 1614, 13, 16919, 13, 16, 13, 639, 307, 8711, 945, 11, 44377, 13, 286, 630, 264, 3894, 4374, 960, 445, 1708, 2057, 11, 457, 510, 321, 366, 797, 13, 50964],
      "temperature": 0.0,
      "avg_logprob": -0.2605107095506456,
      "compression_ratio": 1.1,
      "no_speech_prob": 0.22200044989585876,
      "words": [
        {"start": 0.25, "end": 1.61, "word": " Welcome", "probability": 0.8963063955307007},
        {"start": 1.61, "end": 1.93, "word": " to", "probability": 0.995418906211853},
        {"start": 1.93, "end": 2.47, "word": " Curl", "probability": 0.6149767711758614},
        {"start": 2.47, "end": 2.79, "word": " 7", "probability": 0.9386059641838074},
        {"start": 2.79, "end": 3.33, "word": ".88", "probability": 0.9816429018974304},
        {"start": 3.33, "end": 3.89, "word": ".1.", "probability": 0.997430831193924},
        {"start": 4.13, "end": 4.13, "word": " This", "probability": 0.9773041605949402},
        {"start": 4.13, "end": 4.31, "word": " is", "probability": 0.9914793372154236},
        {"start": 4.31, "end": 4.75, "word": " February", "probability": 0.9693385362625122},
        {"start": 4.75, "end": 5.43, "word": " 20,", "probability": 0.6420905590057373},
        {"start": 5.53, "end": 6.01, "word": " 2023.", "probability": 0.9931846261024475},
        {"start": 6.77, "end": 7.01, "word": " I", "probability": 0.9944361448287964},
        {"start": 7.01, "end": 7.23, "word": " did", "probability": 0.973505437374115},
        {"start": 7.23, "end": 7.51, "word": " the", "probability": 0.9892951846122742},
        {"start": 7.51, "end": 7.89, "word": " previous", "probability": 0.9932243227958679},
        {"start": 7.89, "end": 8.57, "word": " release", "probability": 0.9570808410644531},
        {"start": 8.57, "end": 9.41, "word": " video", "probability": 0.9355015754699707},
        {"start": 9.41, "end": 9.91, "word": " just", "probability": 0.9749361276626587},
        {"start": 9.91, "end": 10.25, "word": " days", "probability": 0.9939530491828918},
        {"start": 10.25, "end": 10.67, "word": " ago,", "probability": 0.9994761347770691},
        {"start": 10.71, "end": 11.05, "word": " but", "probability": 0.9976321458816528},
        {"start": 11.05, "end": 11.45, "word": " here", "probability": 0.9961627721786499},
        {"start": 11.45, "end": 11.71, "word": " we", "probability": 0.9984422326087952},
        {"start": 11.71, "end": 11.99, "word": " are", "probability": 0.9977152347564697},
        {"start": 11.99, "end": 12.41, "word": " again.", "probability": 0.9830819964408875}
      ]
    }
  ]
}
```

So as you can see, the two JSON files differ because they use different formats, and Videogrep cannot read the one from Whisper. How can I make Videogrep accept that JSON time code format as well? Could you add support for it to Videogrep? It would be nice if you could do that.

Question: When you release a new version, could you possibly also release compiled executables (Windows x64)? I always have problems with that.

PS: Or is there maybe a way to convert the JSON file from Whisper into the JSON format that Videogrep writes? Thank you.

antiboredom commented 9 months ago

Hi! The easiest way to deal with this is to reformat the Whisper JSON file into the format that videogrep is expecting. Then just make sure that the file has the same name as your video file, with the extension .json rather than .mp4. You don't need an SRT as well, just the JSON file. Something like this Python script should do the trick:

import json

WHISPER_FILENAME = "wj.json"
OUTPUT_NAME = "videogrepfile.json"

with open(WHISPER_FILENAME) as f:
    data = json.load(f)

out = []

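# Copy each Whisper segment into the layout videogrep reads.
for s in data["segments"]:
    item = {
        "content": s['text'],  # Whisper stores the segment text under "text"
        "start": s['start'],
        "end": s['end'],
        "words": s['words']
    }
    out.append(item)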

with open(OUTPUT_NAME, 'w') as f:
    json.dump(out, f)
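
A slightly more defensive variant of the same conversion, as a sketch only. It assumes the Whisper JSON looks like the sample posted in this issue. It strips the leading space that Whisper puts in front of each word and copies Whisper's `probability` into a `conf` key, to mirror the JSON that videogrep writes; whether videogrep actually needs `conf` is an assumption, not something confirmed in this thread. The helper names `whisper_to_videogrep` and `convert_file` are made up for illustration.

```python
import json
from pathlib import Path


def whisper_to_videogrep(whisper_data):
    """Convert a Whisper-style transcript dict into a videogrep-style list."""
    out = []
    for seg in whisper_data["segments"]:
        words = [
            {
                # Whisper prefixes most words with a space; drop it.
                "word": w["word"].strip(),
                "start": w["start"],
                "end": w["end"],
                # Assumption: reuse Whisper's "probability" as "conf".
                "conf": w.get("probability", 1.0),
            }
            for w in seg.get("words", [])
        ]
        out.append({
            "content": seg["text"].strip(),
            "start": seg["start"],
            "end": seg["end"],
            "words": words,
        })
    return out


def convert_file(whisper_json, video_file):
    """Write the converted transcript next to the video, as <video name>.json."""
    target = Path(video_file).with_suffix(".json")
    with open(whisper_json, encoding="utf-8") as f:
        data = json.load(f)
    with open(target, "w", encoding="utf-8") as f:
        json.dump(whisper_to_videogrep(data), f)
    return target
```

For example, `convert_file("wj.json", "myvideo.mp4")` (a hypothetical video name) would write `myvideo.json` beside the video, which matches the naming rule described above.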