bytedance / fc-clip

[NeurIPS 2023] This repo contains the code for our paper Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP
Apache License 2.0

RuntimeError: The shape of the 2D attn_mask is torch.Size([77, 77]), but should be (128, 128). #36

Open vamsikrishna7909 opened 2 months ago

vamsikrishna7909 commented 2 months ago

Hi bytedance,

I was trying to reproduce the Cityscapes evaluation results from the paper (test-only, Table 2; screenshot of the table attached). I have completed the necessary setup.

When I try to run:

```shell
python train_net.py \
  --config-file configs/coco/panoptic-segmentation/fcclip/fcclip_convnext_large_eval_cityscapes.yaml \
  --eval-only MODEL.WEIGHTS FC-CLIP_ConvNeXt-Large/fcclip_cocopan.pth
```

Below are the logs I got:

```
[08/11 02:18:13 fcclip.data.datasets.register_cityscapes_panoptic]: 3 cities found in '/home/jovyan/Desktop/shared/fc_clip_vamsi/detectron2/datasets/cityscapes/leftImg8bit/val'.
[08/11 02:18:13 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(1024, 1024), max_size=2560, sample_style='choice')]
[08/11 02:18:13 d2.data.common]: Serializing the dataset using: <class 'detectron2.data.common._TorchSerializedList'>
[08/11 02:18:13 d2.data.common]: Serializing 500 elements to byte tensors and concatenating them all ...
[08/11 02:18:13 d2.data.common]: Serialized dataset takes 0.81 MiB
[08/11 02:18:13 d2.evaluation.evaluator]: Start inference on 500 batches
[08/11 02:18:13 d2.evaluation.cityscapes_evaluation]: Writing cityscapes results to temporary directory /tmp/cityscapes_eval_fnp4wh2e ...
[08/11 02:18:13 d2.evaluation.cityscapes_evaluation]: Writing cityscapes results to temporary directory /tmp/cityscapes_eval_e0osk9d3 ...
Traceback (most recent call last):
  File "train_net.py", line 340, in <module>
    launch(
  File "/home/jovyan/shared/fc_clip_vamsi/detectron2/detectron2/engine/launch.py", line 84, in launch
    main_func(*args)
  File "train_net.py", line 325, in main
    res = Trainer.test(cfg, model)
  File "/home/jovyan/shared/fc_clip_vamsi/detectron2/detectron2/engine/defaults.py", line 621, in test
    results_i = inference_on_dataset(model, data_loader, evaluator)
  File "/home/jovyan/shared/fc_clip_vamsi/detectron2/detectron2/evaluation/evaluator.py", line 165, in inference_on_dataset
    outputs = model(inputs)
  File "/opt/conda/envs/fcclip/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jovyan/shared/fc_clip_vamsi/fc-clip/fcclip/fcclip.py", line 324, in forward
    text_classifier, num_templates = self.get_text_classifier()
  File "/home/jovyan/shared/fc_clip_vamsi/fc-clip/fcclip/fcclip.py", line 208, in get_text_classifier
    text_classifier.append(self.backbone.get_text_classifier(self.test_class_names[idx:idx+bs], self.device).detach())
  File "/home/jovyan/shared/fc_clip_vamsi/fc-clip/fcclip/modeling/backbone/clip.py", line 211, in get_text_classifier
    text_features = self.encode_text(text_tokens, normalize=False)
  File "/home/jovyan/shared/fc_clip_vamsi/fc-clip/fcclip/modeling/backbone/clip.py", line 95, in encode_text
    x = self.clip_model.transformer(x, attn_mask=self.clip_model.attn_mask)
  File "/opt/conda/envs/fcclip/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/fcclip/lib/python3.8/site-packages/open_clip/transformer.py", line 363, in forward
    x = r(x, attn_mask=attn_mask)
  File "/opt/conda/envs/fcclip/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/fcclip/lib/python3.8/site-packages/open_clip/transformer.py", line 263, in forward
    x = q_x + self.ls_1(self.attention(q_x=self.ln_1(q_x), k_x=k_x, v_x=v_x, attn_mask=attn_mask))
  File "/opt/conda/envs/fcclip/lib/python3.8/site-packages/open_clip/transformer.py", line 250, in attention
    return self.attn(
  File "/opt/conda/envs/fcclip/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/fcclip/lib/python3.8/site-packages/torch/nn/modules/activation.py", line 1031, in forward
    attn_output, attn_output_weights = F.multi_head_attention_forward(
  File "/opt/conda/envs/fcclip/lib/python3.8/site-packages/torch/nn/functional.py", line 4992, in multi_head_attention_forward
    raise RuntimeError(f"The shape of the 2D attn_mask is {attn_mask.shape}, but should be {correct_2d_size}.")
RuntimeError: The shape of the 2D attn_mask is torch.Size([77, 77]), but should be (128, 128).
```
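For context, here is a minimal stdlib-only sketch (a hypothetical helper, not FC-CLIP or torch code) of the check that raises this error: `torch.nn.functional.multi_head_attention_forward` requires a 2D `attn_mask` of shape `(L, L)` where `L` is the token sequence length, and here the tokenizer produced 128-token sequences while the model's cached causal mask was built for CLIP's 77-token text context.

```python
def check_attn_mask(mask_shape, seq_len):
    """Mimics the 2D attn_mask shape validation inside
    torch's multi_head_attention_forward: a 2D mask must be
    (seq_len, seq_len) for a sequence of seq_len tokens."""
    expected = (seq_len, seq_len)
    if mask_shape != expected:
        raise RuntimeError(
            f"The shape of the 2D attn_mask is {mask_shape}, "
            f"but should be {expected}."
        )

# The failing combination from the traceback: a 77x77 causal mask
# (CLIP's default 77-token text context) applied to 128-token input.
try:
    check_attn_mask((77, 77), 128)
except RuntimeError as e:
    print(e)

check_attn_mask((77, 77), 77)  # matching context length passes silently
```

So the mismatch is between the tokenizer's context length and the model's cached mask, which is why pinning the open_clip version (so both come from the same model config) resolves it.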

Please let me know if you need more info

Thanks!

ruiming46zrm commented 2 months ago

`pip install open-clip-torch==2.24.0`

vamsikrishna7909 commented 2 months ago

hi @ruiming46zrm , thanks for the reply.

`pip install open-clip-torch==2.24.0` didn't solve the issue; I got another error after installing it:

```
[08/11 03:23:43 fcclip.data.datasets.register_cityscapes_panoptic]: 3 cities found in '/home/jovyan/Desktop/shared/fc_clip_vamsi/detectron2/datasets/cityscapes/leftImg8bit/val'.
[08/11 03:23:43 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(1024, 1024), max_size=2560, sample_style='choice')]
[08/11 03:23:43 d2.data.common]: Serializing the dataset using: <class 'detectron2.data.common._TorchSerializedList'>
[08/11 03:23:43 d2.data.common]: Serializing 500 elements to byte tensors and concatenating them all ...
[08/11 03:23:43 d2.data.common]: Serialized dataset takes 0.81 MiB
[08/11 03:23:43 d2.evaluation.evaluator]: Start inference on 500 batches
[08/11 03:23:43 d2.evaluation.cityscapes_evaluation]: Writing cityscapes results to temporary directory /tmp/cityscapes_eval_f5ytfymv ...
[08/11 03:23:43 d2.evaluation.cityscapes_evaluation]: Writing cityscapes results to temporary directory /tmp/cityscapes_eval_hman420u ...
/opt/conda/envs/fcclip/lib/python3.8/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at /tmp/pip-req-build-1_ic8ial/aten/src/ATen/native/BinaryOps.cpp:467.)
  return torch.floor_divide(self, other)
Traceback (most recent call last):
  File "train_net.py", line 340, in <module>
    launch(
  File "/home/jovyan/shared/fc_clip_vamsi/detectron2/detectron2/engine/launch.py", line 84, in launch
    main_func(*args)
  File "train_net.py", line 325, in main
    res = Trainer.test(cfg, model)
  File "/home/jovyan/shared/fc_clip_vamsi/detectron2/detectron2/engine/defaults.py", line 621, in test
    results_i = inference_on_dataset(model, data_loader, evaluator)
  File "/home/jovyan/shared/fc_clip_vamsi/detectron2/detectron2/evaluation/evaluator.py", line 172, in inference_on_dataset
    evaluator.process(inputs, outputs)
  File "/home/jovyan/shared/fc_clip_vamsi/detectron2/detectron2/evaluation/evaluator.py", line 88, in process
    evaluator.process(inputs, outputs)
  File "/home/jovyan/shared/fc_clip_vamsi/detectron2/detectron2/evaluation/cityscapes_evaluation.py", line 75, in process
    class_id = name2label[classes].id
KeyError: 'car,cars'
```

ruiming46zrm commented 2 months ago

That's a separate issue; you can debug it and inspect your key names. I think the attn_mask issue is solved.

vamsikrishna7909 commented 2 months ago

Thanks! Those are the default keys. To bypass the issue, I just assigned 0 to class_id for now.
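For anyone hitting the same KeyError: FC-CLIP's open-vocabulary class names can be comma-separated synonym lists (e.g. `"car,cars"`), while the Cityscapes `name2label` table only knows the canonical class names, so looking up the full string fails. Rather than hardcoding `class_id = 0`, a less lossy workaround is to try each synonym in turn. This is a hypothetical sketch, not the repo's code; the stub `name2label` below stands in for `cityscapesscripts.helpers.labels.name2label`, assuming it maps canonical names to label objects with an `.id` field.

```python
from collections import namedtuple

Label = namedtuple("Label", ["name", "id"])

# Stub for cityscapesscripts' name2label (canonical names only).
# The real ids shown here match the Cityscapes label definitions.
name2label = {"car": Label("car", 26), "person": Label("person", 24)}

def lookup_class_id(name2label, classes):
    """Resolve an FC-CLIP class string to a Cityscapes label id.

    FC-CLIP may pass comma-separated synonyms like "car,cars";
    try each synonym until one matches a canonical Cityscapes name.
    """
    for name in classes.split(","):
        label = name2label.get(name.strip())
        if label is not None:
            return label.id
    raise KeyError(classes)

print(lookup_class_id(name2label, "car,cars"))  # 26
print(lookup_class_id(name2label, "person"))    # 24
```

Applied at the failing line in `cityscapes_evaluation.py`, this would map `"car,cars"` to the `car` label instead of raising, while still surfacing genuinely unknown class names.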