Jack2Lu closed this issue 1 year ago
Hi. As you mentioned, the features for our teacher model on MSRVTT are quite large, approximately 270 GB, so it would be difficult for us to host all of these features and model checkpoints. We will update the README if we decide to share them in the future. In the meantime, you can follow the steps to generate each feature yourself. Fine-tuning the CLIP4Clip model is actually quick, and the CLIP4Clip features on MSRVTT are only around 300 MB.
Hello, I am interested in this work and excited to see its excellent performance. The code includes many scripts to extract the features the model needs, but I'm worried that the features are so large that extracting them may take a long time. Training a CLIP4Clip model is also time-consuming. Could you please share the pretrained CLIP4Clip model or the extracted features? Thank you very much.