JuanS286 / gif_auto_audio_description

A project that aims to generate audio descriptions of GIFs for visually impaired people

M2 - Model Building 3 #11

Closed by GanzB02 1 month ago

JuanS286 commented 1 month ago

I generated the first version of the model. It uses BLIP to extract a caption for each frame; those frame captions are then fed to a T5 model, which is trained to combine them into one final caption. The results are then measured with several metrics, including BLEU, ROUGE, METEOR and BERT_F1.
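
As a rough illustration of the caption-merging step described above, here is a minimal sketch of how per-frame captions might be joined into a single text input for a seq2seq model such as T5. It assumes the captions have already been produced by BLIP. The helper name `build_t5_input`, the `summarize: ` prefix, and the ` . ` separator are hypothetical choices for illustration, not taken from the repository.

```python
from typing import List


def build_t5_input(frame_captions: List[str], prefix: str = "summarize: ") -> str:
    """Merge per-frame captions into one text prompt for a seq2seq model.

    Hypothetical helper: the prefix and separator are assumptions,
    not the project's actual preprocessing.
    """
    merged: List[str] = []
    for caption in frame_captions:
        # Collapse repeated whitespace and lowercase for a stable comparison.
        cleaned = " ".join(caption.split()).lower()
        # Skip empty captions and consecutive duplicates, which are common
        # when neighbouring GIF frames look almost identical.
        if cleaned and (not merged or merged[-1] != cleaned):
            merged.append(cleaned)
    return prefix + " . ".join(merged)


if __name__ == "__main__":
    captions = [
        "a dog running on grass",
        "a dog running on grass",
        "a dog jumping over a log",
    ]
    print(build_t5_input(captions))
```

Dropping consecutive duplicates keeps the input short, which may matter because T5 inputs are truncated at a fixed token length; whether that helps the final caption quality would need to be checked against the metrics.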

GanzB02 commented 1 month ago

Model building was completed, but training failed. I used a 3D CNN ResNet to extract features and get an idea of the resulting feature vectors.

Next step: include a GRU in the pipeline and normalise the vectors and tensors to improve accuracy.
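
One possible reading of "normalise" here is scaling each extracted feature vector to unit L2 norm before it is passed on to the recurrent layer. The sketch below shows that idea with plain Python lists; the function names and the `eps` guard are hypothetical, and the real pipeline would more likely apply the same operation to framework tensors.

```python
import math
from typing import List


def l2_normalise(vector: List[float], eps: float = 1e-12) -> List[float]:
    """Scale a feature vector to unit L2 norm.

    Hypothetical helper: eps guards against division by zero
    when the vector is all zeros.
    """
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / max(norm, eps) for x in vector]


def normalise_sequence(frames: List[List[float]]) -> List[List[float]]:
    """Normalise every per-frame feature vector in a sequence independently."""
    return [l2_normalise(frame) for frame in frames]


if __name__ == "__main__":
    features = [[3.0, 4.0], [0.0, 0.0], [1.0, 1.0]]
    for row in normalise_sequence(features):
        print(row)
```

Normalising per frame keeps every time step on the same scale, so no single frame dominates the recurrent state purely because of its magnitude. Other schemes, such as standardising each feature dimension across the dataset, are equally plausible.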

Closing this as completed for the 3D CNN model building. Evaluation and training on Transformers will be handled in the next task.