Open qinghuannn opened 1 year ago
I printed the weights of the "mit" head in the released models (k400_B_32_16_rel.pth, k400_L_8_rel.pth, k400_B_16_16_rel.pth) and found that the released models trained on Kinetics-400 do not match the ILA code. The class MultiframeIntegrationTransformer defines a classify_head, but the "mit" head in these released checkpoints has no classify_head weights.
Then I printed the weights of the "mit" head in the released SSv2 model (ssv2_B_16_rel.pth). That checkpoint does contain classify_head weights.
These results indicate that none of the released Kinetics-400 checkpoints match the code. The authors should check all released models trained on Kinetics-400. @daiqi1989 @Francis-Rings
>>> model = torch.load("./data/extractor/k400_B_32_16_rel.pth")
>>> [x for x in model["model"] if x.startswith("mit")]
['mit.positional_embedding', 'mit.resblocks.0.attn.in_proj_weight', 'mit.resblocks.0.attn.in_proj_bias', 'mit.resblocks.0.attn.out_proj.weight', 'mit.resblocks.0.attn.out_proj.bias', 'mit.resblocks.0.ln_1.weight', 'mit.resblocks.0.ln_1.bias', 'mit.resblocks.0.mlp.c_fc.weight', 'mit.resblocks.0.mlp.c_fc.bias', 'mit.resblocks.0.mlp.c_proj.weight', 'mit.resblocks.0.mlp.c_proj.bias', 'mit.resblocks.0.ln_2.weight', 'mit.resblocks.0.ln_2.bias']
>>> model = torch.load("./data/extractor/k400_L_8_rel.pth")
>>> [x for x in model["model"] if x.startswith("mit")]
['mit.positional_embedding', 'mit.resblocks.0.attn.in_proj_weight', 'mit.resblocks.0.attn.in_proj_bias', 'mit.resblocks.0.attn.out_proj.weight', 'mit.resblocks.0.attn.out_proj.bias', 'mit.resblocks.0.ln_1.weight', 'mit.resblocks.0.ln_1.bias', 'mit.resblocks.0.mlp.c_fc.weight', 'mit.resblocks.0.mlp.c_fc.bias', 'mit.resblocks.0.mlp.c_proj.weight', 'mit.resblocks.0.mlp.c_proj.bias', 'mit.resblocks.0.ln_2.weight', 'mit.resblocks.0.ln_2.bias']
>>> model = torch.load("./data/extractor/ssv2_B_16_rel.pth")
>>> [x for x in model["model"] if x.startswith("mit")]
['mit.positional_embedding', 'mit.resblocks.0.attn.in_proj_weight', 'mit.resblocks.0.attn.in_proj_bias', 'mit.resblocks.0.attn.out_proj.weight', 'mit.resblocks.0.attn.out_proj.bias', 'mit.resblocks.0.ln_1.weight', 'mit.resblocks.0.ln_1.bias', 'mit.resblocks.0.mlp.c_fc.weight', 'mit.resblocks.0.mlp.c_fc.bias', 'mit.resblocks.0.mlp.c_proj.weight', 'mit.resblocks.0.mlp.c_proj.bias', 'mit.resblocks.0.ln_2.weight', 'mit.resblocks.0.ln_2.bias', 'mit.classify_head.0.weight', 'mit.classify_head.0.bias', 'mit.classify_head.2.weight', 'mit.classify_head.2.bias']
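The comparison above can be condensed into a small helper. This is only a minimal sketch and not part of the ILA repository: `group_mit_keys` and `has_classify_head` are hypothetical names, and the helper works on any iterable of parameter names, for example the keys of the `model["model"]` dictionary loaded above. The key lists in the example are abbreviated copies of the printed output.

```python
from collections import defaultdict


def group_mit_keys(param_names, prefix="mit."):
    """Group parameter names under `prefix` by their first sub-module name."""
    groups = defaultdict(list)
    for name in param_names:
        if name.startswith(prefix):
            # "mit.classify_head.0.weight" -> "classify_head"
            submodule = name[len(prefix):].split(".", 1)[0]
            groups[submodule].append(name)
    return dict(groups)


def has_classify_head(param_names):
    """Return True if any 'mit.classify_head.*' parameter is present."""
    return "classify_head" in group_mit_keys(param_names)


# Abbreviated key lists taken from the output above.
k400_keys = ["mit.positional_embedding", "mit.resblocks.0.attn.in_proj_weight"]
ssv2_keys = k400_keys + ["mit.classify_head.0.weight", "mit.classify_head.0.bias"]

print(has_classify_head(k400_keys))  # False
print(has_classify_head(ssv2_keys))  # True
```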
As described in our paper, we mainly use cross-entropy loss when the model is trained on Something-Something v2, whereas on K400 we train the model with a contrastive learning loss with the help of the prompt branch. Therefore, our model needs classify_head on Something-Something v2.
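To see which side of the mismatch a given checkpoint falls on, the parameter names a model expects can be compared with the names stored in the checkpoint. The following is a minimal sketch under stated assumptions: `compare_keys` is a hypothetical helper, and the key lists are abbreviated stand-ins for `model.state_dict().keys()` and the checkpoint's `"model"` keys. PyTorch's `load_state_dict(strict=False)` reports the same two lists as `missing_keys` and `unexpected_keys`.

```python
def compare_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) parameter names, both sorted.

    missing:    expected by the model but absent from the checkpoint
    unexpected: present in the checkpoint but unknown to the model
    """
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Illustrative, abbreviated key lists (assumptions, not full state dicts).
code_with_head = ["mit.positional_embedding", "mit.classify_head.0.weight"]
k400_checkpoint = ["mit.positional_embedding"]

missing, unexpected = compare_keys(code_with_head, k400_checkpoint)
print(missing)     # ['mit.classify_head.0.weight']
print(unexpected)  # []
```

An empty result in both lists means the checkpoint and the model definition agree on parameter names; it does not check tensor shapes.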
Could you provide a simple few-line script that does something like the following:
model = CLIPViP("pretrain_clipvip_base_32.pt")
text_features = model.encode_text("This is a very cute cat")
video_features = model.encode_video("vid_file.mp4")
cosine(text_features, video_features)
I wish to get the video features for a batch of mp4 files with different lengths. @Francis-Rings
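One common way to handle clips of different lengths is to sample a fixed number of frame indices from each video, so that every clip yields an input of the same shape before it is passed to a video encoder. The sketch below illustrates only that step and the final similarity computation. It does not use the CLIP-ViP or ILA API: `sample_frame_indices` and `cosine` are hypothetical helpers, and the default of 8 frames is an illustrative assumption.

```python
import math


def sample_frame_indices(num_frames, num_samples=8):
    """Pick `num_samples` evenly spaced frame indices from a clip.

    Each index is the centre of one of `num_samples` equal segments,
    so clips shorter than `num_samples` simply repeat frames.
    """
    if num_frames <= 0:
        raise ValueError("video has no frames")
    return [
        min(num_frames - 1, int((i + 0.5) * num_frames / num_samples))
        for i in range(num_samples)
    ]


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


print(sample_frame_indices(16))  # [1, 3, 5, 7, 9, 11, 13, 15]
print(sample_frame_indices(4))   # [0, 0, 1, 1, 2, 2, 3, 3]
```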
If possible, could we connect on WeChat or QQ? I would like to ask how to use ILA to extract features.
The code has been updated to fix the bug. The master branch is for K400, while the SSV2 branch is for SSV2. The checkpoints should now match the models.
Thank you for your help!
Hi, thanks for your nice work! I need to use ILA to extract video features for downstream tasks, but I ran into problems when I tried to load the released checkpoints. It is the same problem as #1. Could you please help me solve it?