BlueBash / Autogen_Video_Refinement

An application designed to condense lengthy videos into concise, informative clips. Ideal for editors who need to efficiently sift through hours of footage to create compelling short films or highlight reels.

What model does this repo use? #1

Open wong251440 opened 7 months ago

wong251440 commented 7 months ago

What model does this repo use to condense the original video transcript, and then create a short script??

vishal-bluebash commented 7 months ago

We are using the gpt-4-turbo model @wong251440. Yes, it creates a new, shorter, and better version of the video. Imagine we have a 2-hour-long video and we want to make a short video (the best version).

wong251440 commented 7 months ago

> We are using the gpt-4-turbo model @wong251440. Yes, it creates a new, shorter, and better version of the video. Imagine we have a 2-hour-long video and we want to make a short video (the best version).

Okay, I'm quite curious about this. Isn't the output of GPT an abstractive summary? Shouldn't the sentences in the short transcript it generates be different from the original transcript?

Then how do you use the generated output to align back to the original video timestamps and then synthesize the result video?

prince-bluebash commented 7 months ago

Hi @wong251440

To extract the transcription from the video we use the Whisper model, one of the most advanced open-source models for speech-to-text. It also provides the exact timestamps of the utterances in the video.
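One workable approach (a sketch, not necessarily this repo's exact method) is to have the LLM select sentences verbatim from the transcript rather than paraphrase, then match those sentences back to Whisper-style segments (dicts with `start`, `end`, and `text` keys) to recover timestamps. All function names and the sample segments below are hypothetical:

```python
# Hypothetical sketch: align sentences the LLM kept back to
# Whisper-style segments ({"start", "end", "text"} dicts).
import re

def _norm(text):
    """Lowercase and strip punctuation so minor rewording still matches."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()

def align_to_timestamps(segments, kept_sentences):
    """Return (start, end) pairs for segments whose text the LLM kept."""
    kept = {_norm(s) for s in kept_sentences}
    return [(seg["start"], seg["end"]) for seg in segments
            if _norm(seg["text"]) in kept]

# Example with Whisper-like output (timestamps in seconds):
segments = [
    {"start": 0.0, "end": 4.2, "text": "Welcome to the talk."},
    {"start": 4.2, "end": 9.8, "text": "Today we cover video summarisation."},
    {"start": 9.8, "end": 15.1, "text": "Let me start with a joke."},
]
clips = align_to_timestamps(segments, ["Today we cover video summarisation."])
print(clips)  # [(4.2, 9.8)]
```

This only works if the LLM is prompted to quote the transcript exactly; fuzzier matching (e.g. edit distance) would be needed if it paraphrases.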

We use the Python libraries moviepy and OpenCV for video editing, driven by the transcript processed by gpt-4-turbo (the LLM).
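Given a list of kept (start, end) timestamps, the editing step can be sketched roughly as follows. The interval-merging helper below is runnable; the moviepy calls are only outlined in comments since they need an installed moviepy and a real input file, and the file names are hypothetical:

```python
def merge_intervals(intervals, gap=0.5):
    """Merge timestamp intervals that overlap or sit within `gap` seconds,
    so the final cut doesn't stutter between adjacent kept segments."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] <= gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

print(merge_intervals([(4.2, 9.8), (10.1, 15.0), (40.0, 44.5)]))
# [(4.2, 15.0), (40.0, 44.5)]

# Cutting and concatenating with moviepy (sketch only):
#
#   from moviepy.editor import VideoFileClip, concatenate_videoclips
#   video = VideoFileClip("input.mp4")  # hypothetical path
#   clips = [video.subclip(s, e) for s, e in merge_intervals(keep)]
#   concatenate_videoclips(clips).write_videofile("short.mp4")
```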
