bytedance / fc-clip

[NeurIPS 2023] This repo contains the code for our paper Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP
Apache License 2.0

An error occurred. #34

Open Yuhuoo opened 1 month ago

Yuhuoo commented 1 month ago

I ran into an error when using demo.py:

```
(fcclip) ga@test-4U-GPU-Server:~/code/fc-clip$ python demo/demo.py --input 000741.jpg 000860.jpg --opts MODEL.WEIGHTS fcclip_cocopan.pth
[07/15 15:09:43 detectron2]: Arguments: Namespace(confidence_threshold=0.5, config_file='configs/coco/panoptic-segmentation/fcclip/fcclip_convnext_large_eval_ade20k.yaml', input=['000741.jpg', '000860.jpg'], opts=['MODEL.WEIGHTS', 'fcclip_cocopan.pth'], output=None, video_input=None, webcam=False)
[07/15 15:10:11 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from fcclip_cocopan.pth ...
[07/15 15:10:11 fvcore.common.checkpoint]: [Checkpointer] Loading from fcclip_cocopan.pth ...
WARNING [07/15 15:10:11 fvcore.common.checkpoint]: Some model parameters or buffers are not found in the checkpoint:
backbone.clip_model.ln_final.{bias, weight}
backbone.clip_model.token_embedding.weight
backbone.clip_model.transformer.resblocks.0.attn.out_proj.{bias, weight} backbone.clip_model.transformer.resblocks.0.attn.{in_proj_bias, in_proj_weight} backbone.clip_model.transformer.resblocks.0.ln_1.{bias, weight} backbone.clip_model.transformer.resblocks.0.ln_2.{bias, weight} backbone.clip_model.transformer.resblocks.0.mlp.c_fc.{bias, weight} backbone.clip_model.transformer.resblocks.0.mlp.c_proj.{bias, weight}
backbone.clip_model.transformer.resblocks.1.attn.out_proj.{bias, weight} backbone.clip_model.transformer.resblocks.1.attn.{in_proj_bias, in_proj_weight} backbone.clip_model.transformer.resblocks.1.ln_1.{bias, weight} backbone.clip_model.transformer.resblocks.1.ln_2.{bias, weight} backbone.clip_model.transformer.resblocks.1.mlp.c_fc.{bias, weight} backbone.clip_model.transformer.resblocks.1.mlp.c_proj.{bias, weight}
backbone.clip_model.transformer.resblocks.10.attn.out_proj.{bias, weight} backbone.clip_model.transformer.resblocks.10.attn.{in_proj_bias, in_proj_weight} backbone.clip_model.transformer.resblocks.10.ln_1.{bias, weight} backbone.clip_model.transformer.resblocks.10.ln_2.{bias, weight} backbone.clip_model.transformer.resblocks.10.mlp.c_fc.{bias, weight} backbone.clip_model.transformer.resblocks.10.mlp.c_proj.{bias, weight}
backbone.clip_model.transformer.resblocks.11.attn.out_proj.{bias, weight} backbone.clip_model.transformer.resblocks.11.attn.{in_proj_bias, in_proj_weight} backbone.clip_model.transformer.resblocks.11.ln_1.{bias, weight} backbone.clip_model.transformer.resblocks.11.ln_2.{bias, weight} backbone.clip_model.transformer.resblocks.11.mlp.c_fc.{bias, weight} backbone.clip_model.transformer.resblocks.11.mlp.c_proj.{bias, weight}
backbone.clip_model.transformer.resblocks.12.attn.out_proj.{bias, weight} backbone.clip_model.transformer.resblocks.12.attn.{in_proj_bias, in_proj_weight} backbone.clip_model.transformer.resblocks.12.ln_1.{bias, weight} backbone.clip_model.transformer.resblocks.12.ln_2.{bias, weight} backbone.clip_model.transformer.resblocks.12.mlp.c_fc.{bias, weight} backbone.clip_model.transformer.resblocks.12.mlp.c_proj.{bias, weight}
backbone.clip_model.transformer.resblocks.13.attn.out_proj.{bias, weight} backbone.clip_model.transformer.resblocks.13.attn.{in_proj_bias, in_proj_weight} backbone.clip_model.transformer.resblocks.13.ln_1.{bias, weight} backbone.clip_model.transformer.resblocks.13.ln_2.{bias, weight} backbone.clip_model.transformer.resblocks.13.mlp.c_fc.{bias, weight} backbone.clip_model.transformer.resblocks.13.mlp.c_proj.{bias, weight}
backbone.clip_model.transformer.resblocks.14.attn.out_proj.{bias, weight} backbone.clip_model.transformer.resblocks.14.attn.{in_proj_bias, in_proj_weight} backbone.clip_model.transformer.resblocks.14.ln_1.{bias, weight} backbone.clip_model.transformer.resblocks.14.ln_2.{bias, weight} backbone.clip_model.transformer.resblocks.14.mlp.c_fc.{bias, weight} backbone.clip_model.transformer.resblocks.14.mlp.c_proj.{bias, weight}
backbone.clip_model.transformer.resblocks.15.attn.out_proj.{bias, weight} backbone.clip_model.transformer.resblocks.15.attn.{in_proj_bias, in_proj_weight} backbone.clip_model.transformer.resblocks.15.ln_1.{bias, weight} backbone.clip_model.transformer.resblocks.15.ln_2.{bias, weight} backbone.clip_model.transformer.resblocks.15.mlp.c_fc.{bias, weight} backbone.clip_model.transformer.resblocks.15.mlp.c_proj.{bias, weight}
backbone.clip_model.transformer.resblocks.2.attn.out_proj.{bias, weight} backbone.clip_model.transformer.resblocks.2.attn.{in_proj_bias, in_proj_weight} backbone.clip_model.transformer.resblocks.2.ln_1.{bias, weight} backbone.clip_model.transformer.resblocks.2.ln_2.{bias, weight} backbone.clip_model.transformer.resblocks.2.mlp.c_fc.{bias, weight} backbone.clip_model.transformer.resblocks.2.mlp.c_proj.{bias, weight}
backbone.clip_model.transformer.resblocks.3.attn.out_proj.{bias, weight} backbone.clip_model.transformer.resblocks.3.attn.{in_proj_bias, in_proj_weight} backbone.clip_model.transformer.resblocks.3.ln_1.{bias, weight} backbone.clip_model.transformer.resblocks.3.ln_2.{bias, weight} backbone.clip_model.transformer.resblocks.3.mlp.c_fc.{bias, weight} backbone.clip_model.transformer.resblocks.3.mlp.c_proj.{bias, weight}
backbone.clip_model.transformer.resblocks.4.attn.out_proj.{bias, weight} backbone.clip_model.transformer.resblocks.4.attn.{in_proj_bias, in_proj_weight} backbone.clip_model.transformer.resblocks.4.ln_1.{bias, weight} backbone.clip_model.transformer.resblocks.4.ln_2.{bias, weight} backbone.clip_model.transformer.resblocks.4.mlp.c_fc.{bias, weight} backbone.clip_model.transformer.resblocks.4.mlp.c_proj.{bias, weight}
backbone.clip_model.transformer.resblocks.5.attn.out_proj.{bias, weight} backbone.clip_model.transformer.resblocks.5.attn.{in_proj_bias, in_proj_weight} backbone.clip_model.transformer.resblocks.5.ln_1.{bias, weight} backbone.clip_model.transformer.resblocks.5.ln_2.{bias, weight} backbone.clip_model.transformer.resblocks.5.mlp.c_fc.{bias, weight} backbone.clip_model.transformer.resblocks.5.mlp.c_proj.{bias, weight}
backbone.clip_model.transformer.resblocks.6.attn.out_proj.{bias, weight} backbone.clip_model.transformer.resblocks.6.attn.{in_proj_bias, in_proj_weight} backbone.clip_model.transformer.resblocks.6.ln_1.{bias, weight} backbone.clip_model.transformer.resblocks.6.ln_2.{bias, weight} backbone.clip_model.transformer.resblocks.6.mlp.c_fc.{bias, weight} backbone.clip_model.transformer.resblocks.6.mlp.c_proj.{bias, weight}
backbone.clip_model.transformer.resblocks.7.attn.out_proj.{bias, weight} backbone.clip_model.transformer.resblocks.7.attn.{in_proj_bias, in_proj_weight} backbone.clip_model.transformer.resblocks.7.ln_1.{bias, weight} backbone.clip_model.transformer.resblocks.7.ln_2.{bias, weight} backbone.clip_model.transformer.resblocks.7.mlp.c_fc.{bias, weight} backbone.clip_model.transformer.resblocks.7.mlp.c_proj.{bias, weight}
backbone.clip_model.transformer.resblocks.8.attn.out_proj.{bias, weight} backbone.clip_model.transformer.resblocks.8.attn.{in_proj_bias, in_proj_weight} backbone.clip_model.transformer.resblocks.8.ln_1.{bias, weight} backbone.clip_model.transformer.resblocks.8.ln_2.{bias, weight} backbone.clip_model.transformer.resblocks.8.mlp.c_fc.{bias, weight} backbone.clip_model.transformer.resblocks.8.mlp.c_proj.{bias, weight}
backbone.clip_model.transformer.resblocks.9.attn.out_proj.{bias, weight} backbone.clip_model.transformer.resblocks.9.attn.{in_proj_bias, in_proj_weight} backbone.clip_model.transformer.resblocks.9.ln_1.{bias, weight} backbone.clip_model.transformer.resblocks.9.ln_2.{bias, weight} backbone.clip_model.transformer.resblocks.9.mlp.c_fc.{bias, weight} backbone.clip_model.transformer.resblocks.9.mlp.c_proj.{bias, weight}
backbone.clip_model.visual.head.mlp.fc1.{bias, weight} backbone.clip_model.visual.head.mlp.fc2.weight backbone.clip_model.visual.trunk.head.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.0.blocks.0.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.0.blocks.0.gamma backbone.clip_model.visual.trunk.stages.0.blocks.0.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.0.blocks.0.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.0.blocks.0.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.0.blocks.1.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.0.blocks.1.gamma backbone.clip_model.visual.trunk.stages.0.blocks.1.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.0.blocks.1.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.0.blocks.1.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.0.blocks.2.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.0.blocks.2.gamma backbone.clip_model.visual.trunk.stages.0.blocks.2.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.0.blocks.2.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.0.blocks.2.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.1.blocks.0.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.1.blocks.0.gamma backbone.clip_model.visual.trunk.stages.1.blocks.0.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.1.blocks.0.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.1.blocks.0.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.1.blocks.1.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.1.blocks.1.gamma backbone.clip_model.visual.trunk.stages.1.blocks.1.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.1.blocks.1.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.1.blocks.1.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.1.blocks.2.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.1.blocks.2.gamma backbone.clip_model.visual.trunk.stages.1.blocks.2.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.1.blocks.2.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.1.blocks.2.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.1.downsample.0.{bias, weight} backbone.clip_model.visual.trunk.stages.1.downsample.1.{bias, weight}
backbone.clip_model.visual.trunk.stages.2.blocks.0.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.0.gamma backbone.clip_model.visual.trunk.stages.2.blocks.0.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.0.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.0.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.2.blocks.1.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.1.gamma backbone.clip_model.visual.trunk.stages.2.blocks.1.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.1.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.1.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.2.blocks.10.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.10.gamma backbone.clip_model.visual.trunk.stages.2.blocks.10.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.10.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.10.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.2.blocks.11.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.11.gamma backbone.clip_model.visual.trunk.stages.2.blocks.11.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.11.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.11.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.2.blocks.12.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.12.gamma backbone.clip_model.visual.trunk.stages.2.blocks.12.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.12.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.12.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.2.blocks.13.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.13.gamma backbone.clip_model.visual.trunk.stages.2.blocks.13.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.13.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.13.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.2.blocks.14.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.14.gamma backbone.clip_model.visual.trunk.stages.2.blocks.14.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.14.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.14.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.2.blocks.15.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.15.gamma backbone.clip_model.visual.trunk.stages.2.blocks.15.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.15.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.15.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.2.blocks.16.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.16.gamma backbone.clip_model.visual.trunk.stages.2.blocks.16.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.16.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.16.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.2.blocks.17.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.17.gamma backbone.clip_model.visual.trunk.stages.2.blocks.17.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.17.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.17.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.2.blocks.18.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.18.gamma backbone.clip_model.visual.trunk.stages.2.blocks.18.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.18.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.18.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.2.blocks.19.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.19.gamma backbone.clip_model.visual.trunk.stages.2.blocks.19.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.19.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.19.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.2.blocks.2.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.2.gamma backbone.clip_model.visual.trunk.stages.2.blocks.2.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.2.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.2.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.2.blocks.20.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.20.gamma backbone.clip_model.visual.trunk.stages.2.blocks.20.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.20.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.20.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.2.blocks.21.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.21.gamma backbone.clip_model.visual.trunk.stages.2.blocks.21.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.21.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.21.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.2.blocks.22.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.22.gamma backbone.clip_model.visual.trunk.stages.2.blocks.22.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.22.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.22.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.2.blocks.23.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.23.gamma backbone.clip_model.visual.trunk.stages.2.blocks.23.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.23.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.23.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.2.blocks.24.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.24.gamma backbone.clip_model.visual.trunk.stages.2.blocks.24.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.24.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.24.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.2.blocks.25.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.25.gamma backbone.clip_model.visual.trunk.stages.2.blocks.25.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.25.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.25.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.2.blocks.26.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.26.gamma backbone.clip_model.visual.trunk.stages.2.blocks.26.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.26.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.26.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.2.blocks.3.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.3.gamma backbone.clip_model.visual.trunk.stages.2.blocks.3.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.3.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.3.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.2.blocks.4.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.4.gamma backbone.clip_model.visual.trunk.stages.2.blocks.4.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.4.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.4.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.2.blocks.5.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.5.gamma backbone.clip_model.visual.trunk.stages.2.blocks.5.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.5.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.5.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.2.blocks.6.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.6.gamma backbone.clip_model.visual.trunk.stages.2.blocks.6.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.6.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.6.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.2.blocks.7.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.7.gamma backbone.clip_model.visual.trunk.stages.2.blocks.7.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.7.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.7.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.2.blocks.8.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.8.gamma backbone.clip_model.visual.trunk.stages.2.blocks.8.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.8.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.8.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.2.blocks.9.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.9.gamma backbone.clip_model.visual.trunk.stages.2.blocks.9.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.9.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.2.blocks.9.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.2.downsample.0.{bias, weight} backbone.clip_model.visual.trunk.stages.2.downsample.1.{bias, weight}
backbone.clip_model.visual.trunk.stages.3.blocks.0.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.3.blocks.0.gamma backbone.clip_model.visual.trunk.stages.3.blocks.0.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.3.blocks.0.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.3.blocks.0.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.3.blocks.1.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.3.blocks.1.gamma backbone.clip_model.visual.trunk.stages.3.blocks.1.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.3.blocks.1.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.3.blocks.1.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.3.blocks.2.conv_dw.{bias, weight} backbone.clip_model.visual.trunk.stages.3.blocks.2.gamma backbone.clip_model.visual.trunk.stages.3.blocks.2.mlp.fc1.{bias, weight} backbone.clip_model.visual.trunk.stages.3.blocks.2.mlp.fc2.{bias, weight} backbone.clip_model.visual.trunk.stages.3.blocks.2.norm.{bias, weight}
backbone.clip_model.visual.trunk.stages.3.downsample.0.{bias, weight} backbone.clip_model.visual.trunk.stages.3.downsample.1.{bias, weight}
backbone.clip_model.visual.trunk.stem.0.{bias, weight} backbone.clip_model.visual.trunk.stem.1.{bias, weight}
backbone.clip_model.{logit_scale, positional_embedding, text_projection}
Traceback (most recent call last):
  File "demo/demo.py", line 124, in <module>
    predictions, visualized_output = demo.run_on_image(img)
  File "/data/Gaoao/code/fc-clip/demo/predictor.py", line 163, in run_on_image
    predictions = self.predictor(image)
  File "/data/Gaoao/code/fc-clip/detectron2/detectron2/engine/defaults.py", line 319, in __call__
    predictions = self.model([inputs])[0]
  File "/home/ga/software/anaconda3/envs/fcclip/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ga/software/anaconda3/envs/fcclip/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/Gaoao/code/fc-clip/demo/../fcclip/fcclip.py", line 324, in forward
    text_classifier, num_templates = self.get_text_classifier()
  File "/data/Gaoao/code/fc-clip/demo/../fcclip/fcclip.py", line 208, in get_text_classifier
    text_classifier.append(self.backbone.get_text_classifier(self.test_class_names[idx:idx+bs], self.device).detach())
  File "/data/Gaoao/code/fc-clip/demo/../fcclip/modeling/backbone/clip.py", line 211, in get_text_classifier
    text_features = self.encode_text(text_tokens, normalize=False)
  File "/data/Gaoao/code/fc-clip/demo/../fcclip/modeling/backbone/clip.py", line 95, in encode_text
    x = self.clip_model.transformer(x, attn_mask=self.clip_model.attn_mask)
  File "/home/ga/software/anaconda3/envs/fcclip/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ga/software/anaconda3/envs/fcclip/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ga/software/anaconda3/envs/fcclip/lib/python3.8/site-packages/open_clip/transformer.py", line 363, in forward
    x = r(x, attn_mask=attn_mask)
  File "/home/ga/software/anaconda3/envs/fcclip/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ga/software/anaconda3/envs/fcclip/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ga/software/anaconda3/envs/fcclip/lib/python3.8/site-packages/open_clip/transformer.py", line 263, in forward
    x = q_x + self.ls_1(self.attention(q_x=self.ln_1(q_x), k_x=k_x, v_x=v_x, attn_mask=attn_mask))
  File "/home/ga/software/anaconda3/envs/fcclip/lib/python3.8/site-packages/open_clip/transformer.py", line 250, in attention
    return self.attn(
  File "/home/ga/software/anaconda3/envs/fcclip/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ga/software/anaconda3/envs/fcclip/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ga/software/anaconda3/envs/fcclip/lib/python3.8/site-packages/torch/nn/modules/activation.py", line 1241, in forward
    attn_output, attn_output_weights = F.multi_head_attention_forward(
  File "/home/ga/software/anaconda3/envs/fcclip/lib/python3.8/site-packages/torch/nn/functional.py", line 5318, in multi_head_attention_forward
    raise RuntimeError(f"The shape of the 2D attn_mask is {attn_mask.shape}, but should be {correct_2d_size}.")
RuntimeError: The shape of the 2D attn_mask is torch.Size([77, 77]), but should be (128, 128).
```
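The traceback shows `encode_text` in `fcclip/modeling/backbone/clip.py` passing `clip_model.attn_mask` (sized for CLIP's 77-token text context) into an open_clip transformer that is receiving a 128-token sequence, i.e. the installed open_clip build and the tokenized input disagree on context length. The mismatch can be checked outside the repo with a minimal sketch; the model tag `convnext_large_d_320` below is an assumption based on the ConvNeXt-Large backbone named in the config, not necessarily the exact tag FC-CLIP loads:

```python
# Standalone check: does the tokenizer's context length agree with the
# model's causal attn_mask in the installed open_clip build?
# (model tag is an assumption, see the note above)
import open_clip

print("open_clip version:", open_clip.__version__)

model = open_clip.create_model("convnext_large_d_320")
tokenizer = open_clip.get_tokenizer("convnext_large_d_320")

tokens = tokenizer(["a photo of a cat"])  # shape: [batch, context_length]
print("tokenized context length:", tokens.shape[1])
print("attn_mask shape:", tuple(model.attn_mask.shape))

# If tokens.shape[1] != model.attn_mask.shape[0], the same RuntimeError
# will be raised inside F.multi_head_attention_forward at inference time.
```

If the two numbers disagree, the environment's open_clip version is the likely culprit rather than the checkpoint itself.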

ruiming46zrm commented 1 month ago

I faced the same problem. Can anyone explain how to solve this?

abhishekaich27 commented 1 month ago

It's due to a specific version of the open-clip-torch package. This solution should solve your problem: https://github.com/TencentARC/MotionCtrl/issues/31#issuecomment-2229964459
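For anyone landing here later: the linked comment resolves the error by reinstalling a pinned, older open-clip-torch into the environment. A hedged sketch of that workflow follows; the version number is an illustrative assumption, not a confirmed pin, so take the real one from the linked comment or FC-CLIP's requirements file:

```python
# Hedged helper: force-reinstall a pinned open-clip-torch, then re-run the
# context-length check from the earlier snippet in a fresh interpreter.
# The default version below is an ASSUMPTION for illustration only.
import subprocess
import sys

def pin_open_clip(version: str = "2.16.0") -> None:
    """Force-reinstall open-clip-torch at the given version via pip."""
    subprocess.check_call([
        sys.executable, "-m", "pip", "install",
        "--force-reinstall", f"open-clip-torch=={version}",
    ])

if __name__ == "__main__":
    pin_open_clip()
```

After reinstalling, re-run demo.py in a fresh shell so the replacement open_clip is actually picked up.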