Sejong-VLI / V2T-Action-Graph-JKSUCIS-2023

The implementation of a paper entitled "Action Knowledge for Video Captioning with Graph Neural Networks" (JKSUCIS 2023).
MIT License

Extracted features #2

Closed Jack2Lu closed 1 year ago

Jack2Lu commented 1 year ago

Hello, I am interested in this work and excited to see its strong performance. However, the code includes many scripts for extracting the features the model needs, and I'm worried that the features are so large that extracting them will take a lot of time. Training a CLIP4Clip model is also time-consuming. Could you please share the pretrained CLIP4Clip model or the extracted features? Thank you very much.

fadzaka12 commented 1 year ago

Hi. As you mentioned, the features for our teacher model on MSRVTT are quite large, approximately 270 GB, so it would be difficult for us to host all of these features and model checkpoints. We will update the README if we decide to share them in the future. In the meantime, you can follow the steps in the repository to generate each feature yourself. Fine-tuning the CLIP4Clip model is actually quick, and its features on MSRVTT are only around 300 MB.
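For anyone following the extraction steps, the general pattern the scripts use is to run each video through an encoder and cache the result on disk so extraction only has to happen once. Below is a minimal, hypothetical sketch of that caching loop: the `extract_features` function here is a stub standing in for a real visual encoder (e.g. CLIP4Clip's frame encoder), and the 512-dimensional size is just an assumption based on CLIP ViT-B/32; the actual scripts, names, and dimensions in this repo may differ.

```python
# Hypothetical sketch of a feature-extraction caching loop; not the repo's actual code.
import os
import tempfile
import numpy as np

FEATURE_DIM = 512  # assumed CLIP ViT-B/32 embedding size; real value depends on the model

def extract_features(video_id: str, num_frames: int = 8) -> np.ndarray:
    """Stub for a real model forward pass: returns one feature vector per sampled frame."""
    rng = np.random.default_rng(abs(hash(video_id)) % (2**32))
    return rng.standard_normal((num_frames, FEATURE_DIM)).astype(np.float32)

def cache_features(video_ids, out_dir):
    """Extract features for each video and save them as <video_id>.npy, skipping existing files."""
    os.makedirs(out_dir, exist_ok=True)
    for vid in video_ids:
        path = os.path.join(out_dir, f"{vid}.npy")
        if not os.path.exists(path):  # resume-friendly: skip videos already processed
            np.save(path, extract_features(vid))
    return out_dir

out = cache_features(["video0", "video1"], os.path.join(tempfile.gettempdir(), "msrvtt_feats"))
feats = np.load(os.path.join(out, "video0.npy"))
print(feats.shape)  # (8, 512)
```

Caching one `.npy` per video like this also makes it easy to estimate total disk usage up front (frames x feature dim x 4 bytes per float32, times the number of videos) before committing to a full extraction run.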