Closed by GanzB02 1 month ago
Model building was completed, but training failed. I used a 3D CNN ResNet to extract features and get an idea of what the resulting feature vectors look like.
Next step: include a GRU in the pipeline and normalise the feature vectors and tensors to get better accuracy.
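For reference, a minimal sketch of the normalisation part of that step, assuming the 3D CNN output is a sequence of per-clip feature vectors and that L2 (unit-length) normalisation is what is wanted. The function names `l2_normalise` and `normalise_sequence` are hypothetical and not part of the existing pipeline, and the GRU itself is not shown here.

```python
import math


def l2_normalise(vector, eps=1e-12):
    """Scale a feature vector to unit L2 norm (assumed normalisation scheme)."""
    norm = math.sqrt(sum(x * x for x in vector))
    # eps guards against division by zero for an all-zero vector.
    return [x / max(norm, eps) for x in vector]


def normalise_sequence(features):
    """Normalise every vector in a sequence of per-clip features."""
    return [l2_normalise(v) for v in features]


if __name__ == "__main__":
    # Toy stand-in for extracted features: 2 clips, 2 dimensions each.
    clips = [[3.0, 4.0], [0.0, 0.0]]
    print(normalise_sequence(clips))
```

Normalising each vector before it reaches a recurrent layer keeps the input scale consistent across clips, which is one common reason to do this. Whether it actually improves accuracy here still has to be checked.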
Closing this as completed for the 3D CNN model building. Evaluation and training on Transformers will be handled in the next task.
I generated the first version of the model. It uses BLIP to extract a caption for each frame; those frame captions go to a T5 model, which is trained to combine them and generate one final caption. Then metrics are measured, including BLEU, ROUGE, METEOR and BERT_F1.
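To illustrate what the n-gram based metrics measure, here is a minimal sketch of clipped n-gram precision, the building block of BLEU. It is not the full BLEU score (no brevity penalty, no geometric mean over n-gram orders) and not the evaluation code used for this model. Whitespace tokenisation and the function names `ngrams` and `clipped_precision` are assumptions made for the example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n=1):
    """Fraction of candidate n-grams that also occur in the reference.

    Each candidate n-gram is credited at most as many times as it
    appears in the reference (the "clipping" used by BLEU).
    """
    cand = Counter(ngrams(candidate.split(), n))
    ref = Counter(ngrams(reference.split(), n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


if __name__ == "__main__":
    generated = "a man rides a bike"
    target = "a man is riding a bike"
    print(clipped_precision(generated, target, n=1))  # 0.8
    print(clipped_precision(generated, target, n=2))  # 0.5
```

In this toy example, four of the five generated unigrams and two of the four generated bigrams appear in the target caption. For reported numbers, an established implementation of each metric is the safer choice.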