czczup / ViT-Adapter

[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions
https://arxiv.org/abs/2205.08534
Apache License 2.0

Model for real-time segmentation #34

Status: Open. LongPML opened this issue 2 years ago

LongPML commented 2 years ago

Hello @czczup, @whai362, @duanduanduanyuchen,

I would like to use ViT-Adapter for real-time semantic segmentation of driving scenes. Is there a model that can be used for this task while still achieving high performance?

I tried Mask R-CNN with a ViT-Adapter-T backbone initialized from the DeiT-T pretrained model, but I got this error:

Here is my notebook link.

```
2022-08-19 07:04:35,726 - mmdet - WARNING - unexpected key in source state_dict: cls_token, norm.weight, norm.bias, head.weight, head.bias

missing keys in source state_dict: blocks.7.gamma1, blocks.2.gamma2, blocks.7.gamma2, blocks.2.gamma1, blocks.6.gamma1, blocks.8.gamma1, blocks.9.gamma2, blocks.9.gamma1, blocks.3.gamma2, blocks.10.gamma2, blocks.4.gamma2, blocks.6.gamma2, blocks.1.gamma2, blocks.4.gamma1, blocks.3.gamma1, blocks.0.gamma1, blocks.11.gamma1, blocks.8.gamma2, blocks.0.gamma2, blocks.10.gamma1, blocks.5.gamma2, blocks.5.gamma1, blocks.11.gamma2, blocks.1.gamma1

load checkpoint from local path: checkpoint/mask_rcnn_deit_adapter_tiny_fpn_3x_coco.pth.tar
/usr/local/lib/python3.7/dist-packages/mmdet/apis/inference.py:50: UserWarning: Class names are not saved in the checkpoint's meta data, use COCO classes by default.
  warnings.warn('Class names are not saved in the checkpoint\'s '
/usr/local/lib/python3.7/dist-packages/mmdet/datasets/utils.py:70: UserWarning: "ImageToTensor" pipeline is replaced by "DefaultFormatBundle" for batch inference. It is recommended to manually replace it in the test data pipeline in your config file.
  'data pipeline in your config file.', UserWarning)
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:3658: UserWarning: The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and now uses scale_factor directly, instead of relying on the computed output size. If you wish to restore the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details.
  "The default behavior for interpolate/upsample with float scale_factor changed "
```

czczup commented 2 years ago

Hi, your notebook looks good, and all the steps are correct. This output is only a warning, not an error. The missing `blocks.*.gamma1` / `blocks.*.gamma2` keys come from the LayerScale used in our model, which is not part of the original DeiT pre-trained weights. The output of `image_demo.py` is saved as a file in `demo/`; you can check whether this file has been generated.
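To make the explanation above concrete, here is a minimal sketch of how a non-strict checkpoint loader sorts keys into "missing" and "unexpected". It compares two *hypothetical, abbreviated* key lists: one modeled on a DeiT-T classification checkpoint and one on a backbone whose blocks add LayerScale parameters. Only `cls_token`, `norm.*`, `head.*` and `blocks.*.gamma1/gamma2` are taken from the log in this issue; the other key names and the helper functions are assumptions for illustration, and real checkpoints contain many more entries.

```python
from typing import Iterable, List, Tuple

NUM_BLOCKS = 12  # the log lists gamma keys for blocks 0-11


def classification_checkpoint_keys(num_blocks: int = NUM_BLOCKS) -> List[str]:
    """Hypothetical, abbreviated keys of a DeiT-T-style classification checkpoint."""
    keys = ["cls_token", "pos_embed", "patch_embed.proj.weight", "patch_embed.proj.bias"]
    for i in range(num_blocks):
        keys += [f"blocks.{i}.attn.qkv.weight", f"blocks.{i}.mlp.fc1.weight"]
    # Classification-only parameters: final norm and linear head.
    keys += ["norm.weight", "norm.bias", "head.weight", "head.bias"]
    return keys


def layerscale_backbone_keys(num_blocks: int = NUM_BLOCKS) -> List[str]:
    """Hypothetical, abbreviated keys of a backbone whose blocks use LayerScale."""
    keys = ["pos_embed", "patch_embed.proj.weight", "patch_embed.proj.bias"]
    for i in range(num_blocks):
        keys += [
            f"blocks.{i}.attn.qkv.weight",
            f"blocks.{i}.mlp.fc1.weight",
            f"blocks.{i}.gamma1",  # LayerScale vector, absent from DeiT weights
            f"blocks.{i}.gamma2",  # LayerScale vector, absent from DeiT weights
        ]
    return keys


def compare_keys(
    source_keys: Iterable[str], model_keys: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) the way a non-strict loader reports them."""
    source, model = set(source_keys), set(model_keys)
    missing = sorted(model - source)      # expected by the model, not in the checkpoint
    unexpected = sorted(source - model)   # in the checkpoint, not used by the model
    return missing, unexpected


if __name__ == "__main__":
    missing, unexpected = compare_keys(
        classification_checkpoint_keys(), layerscale_backbone_keys()
    )
    # prints: unexpected: cls_token, head.bias, head.weight, norm.bias, norm.weight
    print("unexpected:", ", ".join(unexpected))
    # prints: missing: 24 keys, all LayerScale: True
    print("missing:", len(missing), "keys, all LayerScale:",
          all(k.endswith((".gamma1", ".gamma2")) for k in missing))
```

In PyTorch itself, `module.load_state_dict(state_dict, strict=False)` returns the same two lists as the `missing_keys` and `unexpected_keys` fields of its result. Missing parameters keep the values the model was initialized with, and unexpected ones are simply not loaded, which is why this pattern is reported as a warning rather than an error.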

LongPML commented 2 years ago

Thanks for your response.

I want to use ViT-Adapter for real-time semantic segmentation of driving scenes with about 10 classes. Could you recommend some suitable models for fine-tuning?