JuanS286 / gif_auto_audio_description

A project that aims to generate audio descriptions of GIFs for visually impaired people

M2 - Model Building 3 #11

Closed GanzB02 closed 1 month ago

JuanS286 commented 1 month ago

I generated the first version of the model. It uses BLIP to caption individual frames; those frame captions are then used to train a T5 model to combine them into one final caption. The result is then evaluated with BLEU, ROUGE, METEOR, and BERT_F1.
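
As a rough illustration of the caption-merging step described above, here is a minimal sketch in plain Python. It is not the project's code: the helper names `build_t5_input` and `unigram_f1`, the `"summarize: "` prefix, and the duplicate-dropping rule are all assumptions made for this example. The overlap score is a simplified unigram F1 in the spirit of ROUGE-1, not the BLEU, ROUGE, METEOR, or BERT_F1 implementations the comment refers to.

```python
from collections import Counter


def build_t5_input(frame_captions, prefix="summarize: "):
    """Join per-frame captions into one source string for a seq2seq model.

    Hypothetical helper: the prefix and the rule of dropping consecutive
    duplicate captions are assumptions, not taken from the project.
    """
    merged = []
    for caption in frame_captions:
        caption = caption.strip().lower()
        # Neighbouring GIF frames often get the same caption; keep one copy.
        if caption and (not merged or merged[-1] != caption):
            merged.append(caption)
    return prefix + ". ".join(merged)


def unigram_f1(candidate, reference):
    """Simplified unigram-overlap F1 between two whitespace-tokenised strings."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

In a real pipeline the string returned by `build_t5_input` would be tokenised and passed to the T5 model, and the generated caption would be scored against a reference caption with the established metric libraries rather than this hand-rolled score.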

GanzB02 commented 1 month ago

Model building was completed, but training failed. I used a 3D CNN (ResNet) to extract features and get a sense of the resulting feature vectors.

A custom layer was added on top of the pretrained weights to extract features to feed into a GRU/LSTM
Added visuals showing how the model detects motion

Next step: include the GRU in the pipeline and normalise the feature vectors and tensors to improve accuracy
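
The thread does not say which normalisation is intended. One common choice for feature vectors is L2 normalisation, which rescales each vector to unit length before it is fed to a recurrent layer. The sketch below assumes that choice; the function name `l2_normalize` and the `eps` guard are illustrative only.

```python
import math


def l2_normalize(vector, eps=1e-12):
    """Scale a feature vector to unit Euclidean length.

    eps guards against division by zero for an all-zero vector
    (an assumption for this sketch, mirroring common practice).
    """
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / max(norm, eps) for x in vector]
```

With real 3D CNN features the same operation would normally be applied per frame or per clip on tensors inside the deep learning framework, rather than on Python lists.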

Closing this as completed for the 3D CNN model building. Evaluation and training on Transformers will be part of the next task.