czczup / ViT-Adapter

[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions
https://arxiv.org/abs/2205.08534
Apache License 2.0

ms_deform_attn AssertionError when the input image width exceeds 2048 during segmentation inference #44

Closed zhangsiqiGit closed 1 year ago

zhangsiqiGit commented 2 years ago

Hello, and many thanks for open-sourcing the algorithm and pretrained models; the results are excellent!

However, when running inference on a single Cityscapes-format image, I found that whenever the image width is > 2048, the run fails with (input_spatial_shapes[:, 0] * input_spatial_shapes[:, 1]).sum() != Len_in. What causes this, and how can it be fixed?

The error log follows:

CUDA_VISIBLE_DEVICES=0 python3 image_demo.py configs/cityscapes/mask2former_beit_adapter_large_896_80k_cityscapes_ss.py released/mask2former_beit_adapter_large_896_80k_mapillary.pth.tar data/6.jpg

```
/opt/tiger/algo/ViT-Adapter-main/segmentation/mmseg_custom/models/losses/cross_entropy_loss.py:231: UserWarning: Default avg_non_ignore is False, if you would like to ignore the certain label and average loss over non-ignore labels, which is the same with PyTorch official cross_entropy, set avg_non_ignore=True.
load checkpoint from local path: released/mask2former_beit_adapter_large_896_80k_mapillary.pth.tar
The model and loaded state dict do not match exactly
```

missing keys in source state_dict: backbone.blocks.0.attn.relative_position_index, backbone.blocks.1.attn.relative_position_index, backbone.blocks.2.attn.relative_position_index, backbone.blocks.3.attn.relative_position_index, backbone.blocks.4.attn.relative_position_index, backbone.blocks.5.attn.relative_position_index, backbone.blocks.6.attn.relative_position_index, backbone.blocks.7.attn.relative_position_index, backbone.blocks.8.attn.relative_position_index, backbone.blocks.9.attn.relative_position_index, backbone.blocks.10.attn.relative_position_index, backbone.blocks.11.attn.relative_position_index, backbone.blocks.12.attn.relative_position_index, backbone.blocks.13.attn.relative_position_index, backbone.blocks.14.attn.relative_position_index, backbone.blocks.15.attn.relative_position_index, backbone.blocks.16.attn.relative_position_index, backbone.blocks.17.attn.relative_position_index, backbone.blocks.18.attn.relative_position_index, backbone.blocks.19.attn.relative_position_index, backbone.blocks.20.attn.relative_position_index, backbone.blocks.21.attn.relative_position_index, backbone.blocks.22.attn.relative_position_index, backbone.blocks.23.attn.relative_position_index

```
test_cfg mode: slide
/opt/tiger/user_envs/vit-adapter/lib/python3.7/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
input_spatial_shapes sum: tensor(11452, device='cuda:0') Len_in: 11648
Traceback (most recent call last):
  File "image_demo.py", line 59, in <module>
    main()
  File "image_demo.py", line 45, in main
    result = inference_segmentor(model, args.img)
  File "/opt/tiger/user_envs/vit-adapter/lib/python3.7/site-packages/mmseg/apis/inference.py", line 98, in inference_segmentor
    result = model(return_loss=False, rescale=True, **data)
  File "/opt/tiger/user_envs/vit-adapter/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/tiger/user_envs/vit-adapter/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 98, in new_func
    return old_func(*args, **kwargs)
  File "/opt/tiger/user_envs/vit-adapter/lib/python3.7/site-packages/mmseg/models/segmentors/base.py", line 110, in forward
    return self.forward_test(img, img_metas, **kwargs)
  File "/opt/tiger/user_envs/vit-adapter/lib/python3.7/site-packages/mmseg/models/segmentors/base.py", line 92, in forward_test
    return self.simple_test(imgs[0], img_metas[0], **kwargs)
  File "/opt/tiger/algo/ViT-Adapter-main/segmentation/mmseg_custom/models/segmentors/encoder_decoder_mask2former.py", line 258, in simple_test
    seg_logit = self.inference(img, img_meta, rescale)
  File "/opt/tiger/algo/ViT-Adapter-main/segmentation/mmseg_custom/models/segmentors/encoder_decoder_mask2former.py", line 241, in inference
    seg_logit = self.slide_inference(img, img_meta, rescale)
  File "/opt/tiger/algo/ViT-Adapter-main/segmentation/mmseg_custom/models/segmentors/encoder_decoder_mask2former.py", line 180, in slide_inference
    crop_seg_logit = self.encode_decode(crop_img, img_meta)
  File "/opt/tiger/algo/ViT-Adapter-main/segmentation/mmseg_custom/models/segmentors/encoder_decoder_mask2former.py", line 73, in encode_decode
    x = self.extract_feat(img)
  File "/opt/tiger/algo/ViT-Adapter-main/segmentation/mmseg_custom/models/segmentors/encoder_decoder_mask2former.py", line 65, in extract_feat
    x = self.backbone(img)
  File "/opt/tiger/user_envs/vit-adapter/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/tiger/algo/ViT-Adapter-main/segmentation/mmseg_custom/models/backbones/beit_adapter.py", line 116, in forward
    deform_inputs1, deform_inputs2, H, W)
  File "/opt/tiger/user_envs/vit-adapter/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/tiger/algo/ViT-Adapter-main/segmentation/mmseg_custom/models/backbones/adapter_modules.py", line 219, in forward
    level_start_index=deform_inputs1[2])
  File "/opt/tiger/user_envs/vit-adapter/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/tiger/algo/ViT-Adapter-main/segmentation/mmseg_custom/models/backbones/adapter_modules.py", line 150, in forward
    query = _inner_forward(query, feat)
  File "/opt/tiger/algo/ViT-Adapter-main/segmentation/mmseg_custom/models/backbones/adapter_modules.py", line 144, in _inner_forward
    level_start_index, None)
  File "/opt/tiger/user_envs/vit-adapter/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/tiger/algo/ViT-Adapter-main/segmentation/ops/modules/ms_deform_attn.py", line 103, in forward
    input_spatial_shapes[:, 1]).sum() == Len_in
AssertionError
```

czczup commented 2 years ago

Hi, thanks for your interest.

https://github.com/czczup/ViT-Adapter/blob/main/segmentation/configs/_base_/datasets/cityscapes_896x896.py

Increase the 2048 on line 21 of this file and the error will no longer occur:

img_scale=(9999, 1024),
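
In context, that value sits in the standard mmseg test pipeline of the config. A sketch of what the edited section might look like (the surrounding keys are assumed, not copied from the repo):

```python
# configs/_base_/datasets/cityscapes_896x896.py (sketch; img_norm_cfg is
# assumed to be defined earlier in the config, as in standard mmseg configs)
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        # was: img_scale=(2048, 1024) -- the long-side cap of 2048 shrinks
        # wide images so their short side drops below 1024
        img_scale=(9999, 1024),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ]),
]
```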
zhangsiqiGit commented 2 years ago

Well done, that fixed it! One more question: will increasing this value have any impact on accuracy?

czczup commented 2 years ago

img_scale=(2048, 1024) means the image is resized so that its short side equals 1024, with the long side capped at 2048. If the long side would exceed 2048, the image is scaled down until the long side is 2048, and the short side then ends up below 1024. This model needs the short side to be 1024, so changing the setting to img_scale=(9999, 1024) should have essentially no impact on accuracy.
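
A small sketch of that keep-ratio resize rule (mirroring mmcv's rescale behavior as described above, not the library code itself):

```python
def rescale_factor(h: int, w: int, img_scale=(2048, 1024)) -> float:
    max_long, max_short = max(img_scale), min(img_scale)
    # Scale the short side up to max_short, but never let the
    # long side exceed max_long.
    return min(max_long / max(h, w), max_short / min(h, w))

print(rescale_factor(1024, 2048))                # 1.0 -> short side 1024
print(rescale_factor(1024, 4096))                # 0.5 -> short side only 512
print(rescale_factor(1024, 4096, (9999, 1024)))  # 1.0 -> short side stays 1024
```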