farewellthree / STAN

Official PyTorch implementation of the paper "Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring"
Apache License 2.0

About the weights of original CLIP layer #8

Open LLFabiann opened 1 year ago

LLFabiann commented 1 year ago

Are the weights of the original CLIP layers kept frozen throughout the whole training process?

farewellthree commented 1 year ago

No, the original CLIP layers are open (trainable). Our ablation shows that freezing CLIP leads to inferior results, although it still clearly outperforms the baseline. To the best of my knowledge, no video-text retrieval models freeze CLIP, so we have to open CLIP for a fair comparison.
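
For readers unfamiliar with the distinction, here is a minimal PyTorch sketch of "frozen" versus "open" backbone layers. It is not taken from the STAN code base: `clip_backbone` and `temporal_branch` are hypothetical stand-ins (small linear layers) for the pretrained CLIP layers and the added temporal modules, and the learning rate is an arbitrary illustrative value.

```python
import torch
from torch import nn


def set_trainable(module: nn.Module, trainable: bool) -> None:
    """Turn gradient updates on or off for every parameter in `module`."""
    for param in module.parameters():
        param.requires_grad = trainable


def count_trainable(module: nn.Module) -> int:
    """Number of scalar parameters that will receive gradient updates."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


# Hypothetical stand-ins, NOT the real STAN or CLIP modules.
clip_backbone = nn.Linear(8, 8)    # plays the role of the pretrained CLIP layers
temporal_branch = nn.Linear(8, 4)  # plays the role of the newly added modules
model = nn.Sequential(clip_backbone, temporal_branch)

# "Frozen" setting: only the new branch is updated.
set_trainable(clip_backbone, False)
frozen_count = count_trainable(model)

# "Open" setting (the one described in the reply above): everything is updated.
set_trainable(clip_backbone, True)
open_count = count_trainable(model)

# Pass only trainable parameters to the optimizer (lr is an assumed value).
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable_params, lr=1e-5)

print(frozen_count, open_count)
```

In the frozen setting only the stand-in temporal branch contributes trainable parameters; in the open setting the backbone's parameters are fine-tuned as well.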