M5 - BLIP and T5 model integration

JuanS286 / gif_auto_audio_description

A project that aims to generate audio descriptions of GIFs for visually impaired people.


M5 - BLIP and T5 model integration #27

Closed: JuanS286 closed this issue 2 weeks ago

JuanS286 commented 2 weeks ago

Create a model that trains both T5 and BLIP, with BLIP trained by backpropagating the T5 cost function.
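A minimal sketch of the gradient path, assuming the two stages are connected by a continuous representation. If BLIP handed T5 discrete caption text, the token sampling/argmax step would block gradients. End-to-end training would therefore need something like BLIP's hidden states fed to T5 (for example through the `inputs_embeds` argument of `T5ForConditionalGeneration` in Hugging Face Transformers). In the toy below, each model is replaced by a single hypothetical weight (`w_blip`, `w_t5`), and a squared-error loss stands in for the T5 cost function. This only shows how the chain rule carries the T5 loss back into the BLIP parameters.

```python
from typing import List, Tuple

# Hypothetical toy stand-ins: one scalar weight per stage instead of real BLIP/T5.
def forward(w_blip: float, w_t5: float, x: float) -> Tuple[float, float]:
    h = w_blip * x      # "BLIP" stage: continuous representation of a frame
    y_hat = w_t5 * h    # "T5" stage: maps that representation to the output
    return h, y_hat

def train_step(w_blip: float, w_t5: float, x: float, y: float,
               lr: float = 0.05) -> Tuple[float, float, float]:
    h, y_hat = forward(w_blip, w_t5, x)
    loss = (y_hat - y) ** 2          # stand-in for the T5 cost function
    d_y_hat = 2.0 * (y_hat - y)      # dL/dy_hat
    grad_t5 = d_y_hat * h            # dL/dw_t5
    d_h = d_y_hat * w_t5             # gradient arriving at BLIP's output
    grad_blip = d_h * x              # dL/dw_blip via the chain rule
    # Both stages are updated from the same (T5-side) loss.
    return w_blip - lr * grad_blip, w_t5 - lr * grad_t5, loss

w_blip, w_t5 = 0.5, 0.5
losses: List[float] = []
for _ in range(200):
    w_blip, w_t5, loss = train_step(w_blip, w_t5, x=1.0, y=2.0)
    losses.append(loss)
print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.2e}, w_blip {w_blip:.4f}")
```

In a real implementation an autograd framework would compute these gradients, with both models' parameters in one optimizer. The hand-written derivatives here only make the path from the T5 loss back to BLIP explicit.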

JuanS286 commented 2 weeks ago

A model was defined that extracts a set of frames from each GIF and passes them to BLIP, which generates one caption per frame. The captions belonging to each GIF are then passed to T5, which aggregates them and generates the final output.
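The frames → per-frame captions → T5 aggregation flow can be sketched as below. This is an illustrative outline, not the project's code. The helper names `sample_frame_indices` and `describe_gif`, the evenly spaced frame sampling, the `"; "` separator and the `"summarize: "` prefix (T5's conventional summarization task prefix) are all assumptions. The captioner and aggregator are plain callables, so real models could be wrapped and plugged in, such as Hugging Face's `BlipForConditionalGeneration` and `T5ForConditionalGeneration` together with their processor/tokenizer. The usage at the end uses trivial stand-ins.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")

def sample_frame_indices(n_frames: int, k: int) -> List[int]:
    """Return up to k evenly spaced frame indices: the first, and the last when k > 1."""
    if n_frames <= 0 or k <= 0:
        return []
    k = min(k, n_frames)
    if k == 1:
        return [0]
    # Integer arithmetic keeps the last index exactly n_frames - 1.
    return [i * (n_frames - 1) // (k - 1) for i in range(k)]

def describe_gif(
    frames: Sequence[Frame],
    captioner: Callable[[Frame], str],   # hypothetical BLIP wrapper: frame -> caption
    aggregator: Callable[[str], str],    # hypothetical T5 wrapper: text -> description
    k: int = 8,
    prefix: str = "summarize: ",
) -> Tuple[str, List[str]]:
    idx = sample_frame_indices(len(frames), k)
    captions = [captioner(frames[i]) for i in idx]   # one caption per sampled frame
    t5_input = prefix + "; ".join(captions)          # all captions of this GIF in one input
    return aggregator(t5_input), captions

# Usage with stand-ins: frames are ints, the "captioner" labels them,
# and the "aggregator" returns its input unchanged.
description, captions = describe_gif(list(range(10)), lambda f: f"frame {f}", lambda s: s, k=4)
print(captions)
print(description)
```

Keeping the captioner and aggregator as parameters separates frame sampling and caption grouping from model loading. Each stage can then be swapped or tested on its own.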