Open wong251440 opened 7 months ago
We are using the gpt-4-turbo model, @wong251440. Yes, it creates a new, shorter, and better version of the video. Imagine we have a 2-hour video and want to make a short video (the best version) from it.
Okay, I'm quite curious about this. Isn't the output of GPT an abstractive summary? Shouldn't the sentences in the short transcript it generates be different from the original transcript?
Then how do you use the generated output to align back to the original video timestamps and then synthesize the resulting video?
Hi @wong251440
To extract the transcription from the video we use the Whisper model, which is one of the most advanced open-source models for this task (speech-to-text). It also gives us the exact timestamps of the utterances in the video.
We use the Python libraries moviepy and OpenCV for video editing, driven by the transcript processed by gpt-4-turbo (the LLM).
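The reply above does not spell out how the LLM output is mapped back to timestamps, so here is only a rough sketch of one way it could work, not necessarily what this repo does. It assumes the LLM is asked to pick which transcript segments to keep (an extractive selection, returning segment ids) rather than to write new sentences. The segment dicts mimic the "id", "start", "end", "text" fields that Whisper returns in its "segments" list; the function name merge_selected_segments, the max_gap parameter, and the sample data are all hypothetical.

```python
from typing import Dict, List, Tuple


def merge_selected_segments(
    segments: List[Dict],
    selected_ids: List[int],
    max_gap: float = 0.5,
) -> List[Tuple[float, float]]:
    """Turn the kept transcript segments into merged (start, end) clip ranges.

    segments: Whisper-style dicts with "id", "start", "end" (seconds).
    selected_ids: ids the LLM chose to keep (assumed, hypothetical output).
    max_gap: kept segments closer together than this are joined into one clip.
    """
    keep = set(selected_ids)
    chosen = sorted(
        (s for s in segments if s["id"] in keep), key=lambda s: s["start"]
    )
    ranges: List[Tuple[float, float]] = []
    for seg in chosen:
        if ranges and seg["start"] - ranges[-1][1] <= max_gap:
            # Close enough to the previous clip: extend it instead of cutting.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], seg["end"]))
        else:
            ranges.append((seg["start"], seg["end"]))
    return ranges


# Made-up transcript segments, for illustration only.
segments = [
    {"id": 0, "start": 0.0, "end": 4.0, "text": "Intro"},
    {"id": 1, "start": 4.0, "end": 9.5, "text": "Key point one"},
    {"id": 2, "start": 9.5, "end": 20.0, "text": "Tangent"},
    {"id": 3, "start": 20.0, "end": 26.0, "text": "Key point two"},
]

clip_ranges = merge_selected_segments(segments, selected_ids=[0, 1, 3])
print(clip_ranges)  # [(0.0, 9.5), (20.0, 26.0)]
```

Each (start, end) range could then be cut from the source video and the pieces joined, for example with VideoFileClip.subclip and concatenate_videoclips in moviepy 1.x. If the LLM instead rewrites sentences (a truly abstractive summary), some extra matching step between the new text and the original segments would be needed, which this sketch does not cover.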
What model does this repo use to condense the original video transcript and then create a short script?