Syliz517 / CLIP-ReID

Official implementation for "CLIP-ReID: Exploiting Vision-Language Model for Image Re-identification without Concrete Text Labels" (AAAI 2023)
MIT License
234 stars 39 forks source link

How to fix KeyError: 'cv_embed'? #43

Open tinybitdesu opened 1 month ago

tinybitdesu commented 1 month ago

When testing, I faced a problem that calls cv_embed is lost, but I printed the key and I think cv_embed is existing. the code and the result is listed, thanks for any help!

**def load_param(self, trained_path):
    param_dict = torch.load(trained_path)
    for i in param_dict:
        print("Keys in param_dict:", param_dict.keys())
        self.state_dict()[i.replace('module.', '')].copy_(param_dict[i])
    print('Loading pretrained model from {}'.format(trained_path))**

"Argument 'interpolation' of type int is deprecated since 0.13 and will be removed in 0.15. " => Market1501 loaded Dataset statistics:

subset | # ids | # images | # cameras

train | 751 | 12936 | 6 query | 750 | 3368 | 6 gallery | 751 | 15913 | 6

Resized position embedding: %s to %s torch.Size([197, 768]) torch.Size([129, 768]) Position embedding resize to height:16 width: 8 Keys in param_dict: odict_keys(['cv_embed', 'classifier.weight', 'classifier_proj.weight', 'bottleneck.weight', 'bottleneck.bias', 'bottleneck.running_mean', 'bottleneck.running_var', 'bottleneck.num_batches_tracked', 'bottleneck_proj.weight', 'bottleneck_proj.bias', 'bottleneck_proj.running_mean', 'bottleneck_proj.running_var', 'bottleneck_proj.num_batches_tracked', 'image_encoder.class_embedding', 'image_encoder.positional_embedding', 'image_encoder.proj', 'image_encoder.conv1.weight', 'image_encoder.ln_pre.weight', 'image_encoder.ln_pre.bias', 'image_encoder.transformer.resblocks.0.attn.in_proj_weight', 'image_encoder.transformer.resblocks.0.attn.in_proj_bias', 'image_encoder.transformer.resblocks.0.attn.out_proj.weight', 'image_encoder.transformer.resblocks.0.attn.out_proj.bias', 'image_encoder.transformer.resblocks.0.ln_1.weight', 'image_encoder.transformer.resblocks.0.ln_1.bias', 'image_encoder.transformer.resblocks.0.mlp.c_fc.weight', 'image_encoder.transformer.resblocks.0.mlp.c_fc.bias', 'image_encoder.transformer.resblocks.0.mlp.c_proj.weight', 'image_encoder.transformer.resblocks.0.mlp.c_proj.bias', 'image_encoder.transformer.resblocks.0.ln_2.weight', 'image_encoder.transformer.resblocks.0.ln_2.bias', 'image_encoder.transformer.resblocks.1.attn.in_proj_weight', 'image_encoder.transformer.resblocks.1.attn.in_proj_bias', 'image_encoder.transformer.resblocks.1.attn.out_proj.weight', 'image_encoder.transformer.resblocks.1.attn.out_proj.bias', 'image_encoder.transformer.resblocks.1.ln_1.weight', 'image_encoder.transformer.resblocks.1.ln_1.bias', 'image_encoder.transformer.resblocks.1.mlp.c_fc.weight', 'image_encoder.transformer.resblocks.1.mlp.c_fc.bias', 'image_encoder.transformer.resblocks.1.mlp.c_proj.weight', 'image_encoder.transformer.resblocks.1.mlp.c_proj.bias', 'image_encoder.transformer.resblocks.1.ln_2.weight', 'image_encoder.transformer.resblocks.1.ln_2.bias', 'image_encoder.transformer.resblocks.2.attn.in_proj_weight', 'image_encoder.transformer.resblocks.2.attn.in_proj_bias', 'image_encoder.transformer.resblocks.2.attn.out_proj.weight', 'image_encoder.transformer.resblocks.2.attn.out_proj.bias', 'image_encoder.transformer.resblocks.2.ln_1.weight', 'image_encoder.transformer.resblocks.2.ln_1.bias', 'image_encoder.transformer.resblocks.2.mlp.c_fc.weight', 'image_encoder.transformer.resblocks.2.mlp.c_fc.bias', 'image_encoder.transformer.resblocks.2.mlp.c_proj.weight', 'image_encoder.transformer.resblocks.2.mlp.c_proj.bias', 'image_encoder.transformer.resblocks.2.ln_2.weight', 'image_encoder.transformer.resblocks.2.ln_2.bias', 'image_encoder.transformer.resblocks.3.attn.in_proj_weight', 'image_encoder.transformer.resblocks.3.attn.in_proj_bias', 'image_encoder.transformer.resblocks.3.attn.out_proj.weight', 'image_encoder.transformer.resblocks.3.attn.out_proj.bias', 'image_encoder.transformer.resblocks.3.ln_1.weight', 'image_encoder.transformer.resblocks.3.ln_1.bias', 'image_encoder.transformer.resblocks.3.mlp.c_fc.weight', 'image_encoder.transformer.resblocks.3.mlp.c_fc.bias', 'image_encoder.transformer.resblocks.3.mlp.c_proj.weight', 'image_encoder.transformer.resblocks.3.mlp.c_proj.bias', 'image_encoder.transformer.resblocks.3.ln_2.weight', 'image_encoder.transformer.resblocks.3.ln_2.bias', 'image_encoder.transformer.resblocks.4.attn.in_proj_weight', 'image_encoder.transformer.resblocks.4.attn.in_proj_bias', 'image_encoder.transformer.resblocks.4.attn.out_proj.weight', 'image_encoder.transformer.resblocks.4.attn.out_proj.bias', 'image_encoder.transformer.resblocks.4.ln_1.weight', 'image_encoder.transformer.resblocks.4.ln_1.bias', 'image_encoder.transformer.resblocks.4.mlp.c_fc.weight', 'image_encoder.transformer.resblocks.4.mlp.c_fc.bias', 'image_encoder.transformer.resblocks.4.mlp.c_proj.weight', 'image_encoder.transformer.resblocks.4.mlp.c_proj.bias', 'image_encoder.transformer.resblocks.4.ln_2.weight', 'image_encoder.transformer.resblocks.4.ln_2.bias', 'image_encoder.transformer.resblocks.5.attn.in_proj_weight', 'image_encoder.transformer.resblocks.5.attn.in_proj_bias', 'image_encoder.transformer.resblocks.5.attn.out_proj.weight', 'image_encoder.transformer.resblocks.5.attn.out_proj.bias', 'image_encoder.transformer.resblocks.5.ln_1.weight', 'image_encoder.transformer.resblocks.5.ln_1.bias', 'image_encoder.transformer.resblocks.5.mlp.c_fc.weight', 'image_encoder.transformer.resblocks.5.mlp.c_fc.bias', 'image_encoder.transformer.resblocks.5.mlp.c_proj.weight', 'image_encoder.transformer.resblocks.5.mlp.c_proj.bias', 'image_encoder.transformer.resblocks.5.ln_2.weight', 'image_encoder.transformer.resblocks.5.ln_2.bias', 'image_encoder.transformer.resblocks.6.attn.in_proj_weight', 'image_encoder.transformer.resblocks.6.attn.in_proj_bias', 'image_encoder.transformer.resblocks.6.attn.out_proj.weight', 'image_encoder.transformer.resblocks.6.attn.out_proj.bias', 'image_encoder.transformer.resblocks.6.ln_1.weight', 'image_encoder.transformer.resblocks.6.ln_1.bias', 'image_encoder.transformer.resblocks.6.mlp.c_fc.weight', 'image_encoder.transformer.resblocks.6.mlp.c_fc.bias', 'image_encoder.transformer.resblocks.6.mlp.c_proj.weight', 'image_encoder.transformer.resblocks.6.mlp.c_proj.bias', 'image_encoder.transformer.resblocks.6.ln_2.weight', 'image_encoder.transformer.resblocks.6.ln_2.bias', 'image_encoder.transformer.resblocks.7.attn.in_proj_weight', 'image_encoder.transformer.resblocks.7.attn.in_proj_bias', 'image_encoder.transformer.resblocks.7.attn.out_proj.weight', 'image_encoder.transformer.resblocks.7.attn.out_proj.bias', 'image_encoder.transformer.resblocks.7.ln_1.weight', 'image_encoder.transformer.resblocks.7.ln_1.bias', 'image_encoder.transformer.resblocks.7.mlp.c_fc.weight', 'image_encoder.transformer.resblocks.7.mlp.c_fc.bias', 'image_encoder.transformer.resblocks.7.mlp.c_proj.weight', 'image_encoder.transformer.resblocks.7.mlp.c_proj.bias', 'image_encoder.transformer.resblocks.7.ln_2.weight', 'image_encoder.transformer.resblocks.7.ln_2.bias', 'image_encoder.transformer.resblocks.8.attn.in_proj_weight', 'image_encoder.transformer.resblocks.8.attn.in_proj_bias', 'image_encoder.transformer.resblocks.8.attn.out_proj.weight', 'image_encoder.transformer.resblocks.8.attn.out_proj.bias', 'image_encoder.transformer.resblocks.8.ln_1.weight', 'image_encoder.transformer.resblocks.8.ln_1.bias', 'image_encoder.transformer.resblocks.8.mlp.c_fc.weight', 'image_encoder.transformer.resblocks.8.mlp.c_fc.bias', 'image_encoder.transformer.resblocks.8.mlp.c_proj.weight', 'image_encoder.transformer.resblocks.8.mlp.c_proj.bias', 'image_encoder.transformer.resblocks.8.ln_2.weight', 'image_encoder.transformer.resblocks.8.ln_2.bias', 'image_encoder.transformer.resblocks.9.attn.in_proj_weight', 'image_encoder.transformer.resblocks.9.attn.in_proj_bias', 'image_encoder.transformer.resblocks.9.attn.out_proj.weight', 'image_encoder.transformer.resblocks.9.attn.out_proj.bias', 'image_encoder.transformer.resblocks.9.ln_1.weight', 'image_encoder.transformer.resblocks.9.ln_1.bias', 'image_encoder.transformer.resblocks.9.mlp.c_fc.weight', 'image_encoder.transformer.resblocks.9.mlp.c_fc.bias', 'image_encoder.transformer.resblocks.9.mlp.c_proj.weight', 'image_encoder.transformer.resblocks.9.mlp.c_proj.bias', 'image_encoder.transformer.resblocks.9.ln_2.weight', 'image_encoder.transformer.resblocks.9.ln_2.bias', 'image_encoder.transformer.resblocks.10.attn.in_proj_weight', 'image_encoder.transformer.resblocks.10.attn.in_proj_bias', 'image_encoder.transformer.resblocks.10.attn.out_proj.weight', 'image_encoder.transformer.resblocks.10.attn.out_proj.bias', 'image_encoder.transformer.resblocks.10.ln_1.weight', 'image_encoder.transformer.resblocks.10.ln_1.bias', 'image_encoder.transformer.resblocks.10.mlp.c_fc.weight', 'image_encoder.transformer.resblocks.10.mlp.c_fc.bias', 'image_encoder.transformer.resblocks.10.mlp.c_proj.weight', 'image_encoder.transformer.resblocks.10.mlp.c_proj.bias', 'image_encoder.transformer.resblocks.10.ln_2.weight', 'image_encoder.transformer.resblocks.10.ln_2.bias', 'image_encoder.transformer.resblocks.11.attn.in_proj_weight', 'image_encoder.transformer.resblocks.11.attn.in_proj_bias', 'image_encoder.transformer.resblocks.11.attn.out_proj.weight', 'image_encoder.transformer.resblocks.11.attn.out_proj.bias', 'image_encoder.transformer.resblocks.11.ln_1.weight', 'image_encoder.transformer.resblocks.11.ln_1.bias', 'image_encoder.transformer.resblocks.11.mlp.c_fc.weight', 'image_encoder.transformer.resblocks.11.mlp.c_fc.bias', 'image_encoder.transformer.resblocks.11.mlp.c_proj.weight', 'image_encoder.transformer.resblocks.11.mlp.c_proj.bias', 'image_encoder.transformer.resblocks.11.ln_2.weight', 'image_encoder.transformer.resblocks.11.ln_2.bias', 'image_encoder.ln_post.weight', 'image_encoder.ln_post.bias', 'prompt_learner.cls_ctx', 'prompt_learner.token_prefix', 'prompt_learner.token_suffix', 'text_encoder.positional_embedding', 'text_encoder.text_projection', 'text_encoder.transformer.resblocks.0.attn.in_proj_weight', 'text_encoder.transformer.resblocks.0.attn.in_proj_bias', 'text_encoder.transformer.resblocks.0.attn.out_proj.weight', 'text_encoder.transformer.resblocks.0.attn.out_proj.bias', 'text_encoder.transformer.resblocks.0.ln_1.weight', 'text_encoder.transformer.resblocks.0.ln_1.bias', 'text_encoder.transformer.resblocks.0.mlp.c_fc.weight', 'text_encoder.transformer.resblocks.0.mlp.c_fc.bias', 'text_encoder.transformer.resblocks.0.mlp.c_proj.weight', 'text_encoder.transformer.resblocks.0.mlp.c_proj.bias', 'text_encoder.transformer.resblocks.0.ln_2.weight', 'text_encoder.transformer.resblocks.0.ln_2.bias', 'text_encoder.transformer.resblocks.1.attn.in_proj_weight', 'text_encoder.transformer.resblocks.1.attn.in_proj_bias', 'text_encoder.transformer.resblocks.1.attn.out_proj.weight', 'text_encoder.transformer.resblocks.1.attn.out_proj.bias', 'text_encoder.transformer.resblocks.1.ln_1.weight', 'text_encoder.transformer.resblocks.1.ln_1.bias', 'text_encoder.transformer.resblocks.1.mlp.c_fc.weight', 'text_encoder.transformer.resblocks.1.mlp.c_fc.bias', 'text_encoder.transformer.resblocks.1.mlp.c_proj.weight', 'text_encoder.transformer.resblocks.1.mlp.c_proj.bias', 'text_encoder.transformer.resblocks.1.ln_2.weight', 'text_encoder.transformer.resblocks.1.ln_2.bias', 'text_encoder.transformer.resblocks.2.attn.in_proj_weight', 'text_encoder.transformer.resblocks.2.attn.in_proj_bias', 'text_encoder.transformer.resblocks.2.attn.out_proj.weight', 'text_encoder.transformer.resblocks.2.attn.out_proj.bias', 'text_encoder.transformer.resblocks.2.ln_1.weight', 'text_encoder.transformer.resblocks.2.ln_1.bias', 'text_encoder.transformer.resblocks.2.mlp.c_fc.weight', 'text_encoder.transformer.resblocks.2.mlp.c_fc.bias', 'text_encoder.transformer.resblocks.2.mlp.c_proj.weight', 'text_encoder.transformer.resblocks.2.mlp.c_proj.bias', 'text_encoder.transformer.resblocks.2.ln_2.weight', 'text_encoder.transformer.resblocks.2.ln_2.bias', 'text_encoder.transformer.resblocks.3.attn.in_proj_weight', 'text_encoder.transformer.resblocks.3.attn.in_proj_bias', 'text_encoder.transformer.resblocks.3.attn.out_proj.weight', 'text_encoder.transformer.resblocks.3.attn.out_proj.bias', 'text_encoder.transformer.resblocks.3.ln_1.weight', 'text_encoder.transformer.resblocks.3.ln_1.bias', 'text_encoder.transformer.resblocks.3.mlp.c_fc.weight', 'text_encoder.transformer.resblocks.3.mlp.c_fc.bias', 'text_encoder.transformer.resblocks.3.mlp.c_proj.weight', 'text_encoder.transformer.resblocks.3.mlp.c_proj.bias', 'text_encoder.transformer.resblocks.3.ln_2.weight', 'text_encoder.transformer.resblocks.3.ln_2.bias', 'text_encoder.transformer.resblocks.4.attn.in_proj_weight', 'text_encoder.transformer.resblocks.4.attn.in_proj_bias', 'text_encoder.transformer.resblocks.4.attn.out_proj.weight', 'text_encoder.transformer.resblocks.4.attn.out_proj.bias', 'text_encoder.transformer.resblocks.4.ln_1.weight', 'text_encoder.transformer.resblocks.4.ln_1.bias', 'text_encoder.transformer.resblocks.4.mlp.c_fc.weight', 'text_encoder.transformer.resblocks.4.mlp.c_fc.bias', 'text_encoder.transformer.resblocks.4.mlp.c_proj.weight', 'text_encoder.transformer.resblocks.4.mlp.c_proj.bias', 'text_encoder.transformer.resblocks.4.ln_2.weight', 'text_encoder.transformer.resblocks.4.ln_2.bias', 'text_encoder.transformer.resblocks.5.attn.in_proj_weight', 'text_encoder.transformer.resblocks.5.attn.in_proj_bias', 'text_encoder.transformer.resblocks.5.attn.out_proj.weight', 'text_encoder.transformer.resblocks.5.attn.out_proj.bias', 'text_encoder.transformer.resblocks.5.ln_1.weight', 'text_encoder.transformer.resblocks.5.ln_1.bias', 'text_encoder.transformer.resblocks.5.mlp.c_fc.weight', 'text_encoder.transformer.resblocks.5.mlp.c_fc.bias', 'text_encoder.transformer.resblocks.5.mlp.c_proj.weight', 'text_encoder.transformer.resblocks.5.mlp.c_proj.bias', 'text_encoder.transformer.resblocks.5.ln_2.weight', 'text_encoder.transformer.resblocks.5.ln_2.bias', 'text_encoder.transformer.resblocks.6.attn.in_proj_weight', 'text_encoder.transformer.resblocks.6.attn.in_proj_bias', 'text_encoder.transformer.resblocks.6.attn.out_proj.weight', 'text_encoder.transformer.resblocks.6.attn.out_proj.bias', 'text_encoder.transformer.resblocks.6.ln_1.weight', 'text_encoder.transformer.resblocks.6.ln_1.bias', 'text_encoder.transformer.resblocks.6.mlp.c_fc.weight', 'text_encoder.transformer.resblocks.6.mlp.c_fc.bias', 'text_encoder.transformer.resblocks.6.mlp.c_proj.weight', 'text_encoder.transformer.resblocks.6.mlp.c_proj.bias', 'text_encoder.transformer.resblocks.6.ln_2.weight', 'text_encoder.transformer.resblocks.6.ln_2.bias', 'text_encoder.transformer.resblocks.7.attn.in_proj_weight', 'text_encoder.transformer.resblocks.7.attn.in_proj_bias', 'text_encoder.transformer.resblocks.7.attn.out_proj.weight', 'text_encoder.transformer.resblocks.7.attn.out_proj.bias', 'text_encoder.transformer.resblocks.7.ln_1.weight', 'text_encoder.transformer.resblocks.7.ln_1.bias', 'text_encoder.transformer.resblocks.7.mlp.c_fc.weight', 'text_encoder.transformer.resblocks.7.mlp.c_fc.bias', 'text_encoder.transformer.resblocks.7.mlp.c_proj.weight', 'text_encoder.transformer.resblocks.7.mlp.c_proj.bias', 'text_encoder.transformer.resblocks.7.ln_2.weight', 'text_encoder.transformer.resblocks.7.ln_2.bias', 'text_encoder.transformer.resblocks.8.attn.in_proj_weight', 'text_encoder.transformer.resblocks.8.attn.in_proj_bias', 'text_encoder.transformer.resblocks.8.attn.out_proj.weight', 'text_encoder.transformer.resblocks.8.attn.out_proj.bias', 'text_encoder.transformer.resblocks.8.ln_1.weight', 'text_encoder.transformer.resblocks.8.ln_1.bias', 'text_encoder.transformer.resblocks.8.mlp.c_fc.weight', 'text_encoder.transformer.resblocks.8.mlp.c_fc.bias', 'text_encoder.transformer.resblocks.8.mlp.c_proj.weight', 'text_encoder.transformer.resblocks.8.mlp.c_proj.bias', 'text_encoder.transformer.resblocks.8.ln_2.weight', 'text_encoder.transformer.resblocks.8.ln_2.bias', 'text_encoder.transformer.resblocks.9.attn.in_proj_weight', 'text_encoder.transformer.resblocks.9.attn.in_proj_bias', 'text_encoder.transformer.resblocks.9.attn.out_proj.weight', 'text_encoder.transformer.resblocks.9.attn.out_proj.bias', 'text_encoder.transformer.resblocks.9.ln_1.weight', 'text_encoder.transformer.resblocks.9.ln_1.bias', 'text_encoder.transformer.resblocks.9.mlp.c_fc.weight', 'text_encoder.transformer.resblocks.9.mlp.c_fc.bias', 'text_encoder.transformer.resblocks.9.mlp.c_proj.weight', 'text_encoder.transformer.resblocks.9.mlp.c_proj.bias', 'text_encoder.transformer.resblocks.9.ln_2.weight', 'text_encoder.transformer.resblocks.9.ln_2.bias', 'text_encoder.transformer.resblocks.10.attn.in_proj_weight', 'text_encoder.transformer.resblocks.10.attn.in_proj_bias', 'text_encoder.transformer.resblocks.10.attn.out_proj.weight', 'text_encoder.transformer.resblocks.10.attn.out_proj.bias', 'text_encoder.transformer.resblocks.10.ln_1.weight', 'text_encoder.transformer.resblocks.10.ln_1.bias', 'text_encoder.transformer.resblocks.10.mlp.c_fc.weight', 'text_encoder.transformer.resblocks.10.mlp.c_fc.bias', 'text_encoder.transformer.resblocks.10.mlp.c_proj.weight', 'text_encoder.transformer.resblocks.10.mlp.c_proj.bias', 'text_encoder.transformer.resblocks.10.ln_2.weight', 'text_encoder.transformer.resblocks.10.ln_2.bias', 'text_encoder.transformer.resblocks.11.attn.in_proj_weight', 'text_encoder.transformer.resblocks.11.attn.in_proj_bias', 'text_encoder.transformer.resblocks.11.attn.out_proj.weight', 'text_encoder.transformer.resblocks.11.attn.out_proj.bias', 'text_encoder.transformer.resblocks.11.ln_1.weight', 'text_encoder.transformer.resblocks.11.ln_1.bias', 'text_encoder.transformer.resblocks.11.mlp.c_fc.weight', 'text_encoder.transformer.resblocks.11.mlp.c_fc.bias', 'text_encoder.transformer.resblocks.11.mlp.c_proj.weight', 'text_encoder.transformer.resblocks.11.mlp.c_proj.bias', 'text_encoder.transformer.resblocks.11.ln_2.weight', 'text_encoder.transformer.resblocks.11.ln_2.bias', 'text_encoder.ln_final.weight', 'text_encoder.ln_final.bias']) Traceback (most recent call last): File "test_clipreid.py", line 44, in model.load_param(cfg.TEST.WEIGHT) File "/home/disk/hgk/CLIP-ReID-master/model/make_model_clipreid.py", line 160, in load_param self.statedict()[i.replace('module.', '')].copy(param_dict[i]) KeyError: 'cv_embed'

zerobest commented 2 weeks ago

you can change weight ,please choose ViT-CLIP-ReID

tinybitdesu commented 2 weeks ago

you can change weight ,please choose ViT-CLIP-ReID

Thanks! I had already changed my model to ViT-CLIP-ReID following Readme, and I got the result. But I am facing new bugs, in model\clip\VisionTransformer function, if I want to change x shape, x = x + self.positional_embedding.to(x.dtype) will throw an exception that shapes not match. But when I change self.positional_embedding`s shape, it seems model will throw another exception that self.positional_embedding need to be(129, 768). Could you give some advice to debug these exceptions? thanks for any help! Best wishes!