Picsart-AI-Research / SeMask-Segmentation

[NIVT Workshop @ ICCV 2023] SeMask: Semantically Masked Transformers for Semantic Segmentation
https://arxiv.org/abs/2112.12782

one working example #9

Closed · an99990 closed this issue 2 years ago

an99990 commented 2 years ago

My output image is always the same as the input image. Would it be possible to have one working example with the demo?

I tried these:

python demo.py --config-file ../configs/coco/panoptic-segmentation/maskformer2_R101_bs16_50ep.yaml --input ../images/person_bike.jpg --opts MODEL.WEIGHTS ../R-101.pkl

python demo.py --config-file ../configs/ade20k/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_160k.yaml --input ../images/person_bike.jpg --opts MODEL.WEIGHTS ../semask_large_mask2former_ade20k\ \(1\).pth

python demo.py --config-file ../configs/coco/instance-segmentation/maskformer2_R101_bs16_50ep.yaml --input ../images/person_bike.jpg --opts MODEL.WEIGHTS ../R-101.pkl

With the last command I got this original image:

image

thank you

praeclarumjj3 commented 2 years ago

Hi @an99990, with your last command, you visualize the results for the instance segmentation task, so only the foreground objects are predicted. The demo script is based on the detectron2 library, which overlays the predictions on the original image, so the output image won't be exactly the same as the input; it will have the predictions overlaid on top. You can take a deeper look into this with this link: Visualizer Class in Detectron2.
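For reference, here is a minimal sketch of what the demo effectively does with the Visualizer (simplified; it assumes the SeMask/Mask2Former config keys have already been registered into the cfg, and all paths are placeholders):

```python
# Minimal sketch of how the demo overlays predictions on the input image
# (simplified; assumes the SeMask/Mask2Former-specific config keys are already
# added to cfg via the project's config helpers, and all paths are placeholders).
import cv2
from detectron2.config import get_cfg
from detectron2.data import MetadataCatalog
from detectron2.engine import DefaultPredictor
from detectron2.utils.visualizer import Visualizer

cfg = get_cfg()
cfg.merge_from_file("path/to/semantic-segmentation/config.yaml")
cfg.MODEL.WEIGHTS = "path/to/checkpoint.pth"

predictor = DefaultPredictor(cfg)
img = cv2.imread("../images/person_bike.jpg")   # BGR
outputs = predictor(img)

metadata = MetadataCatalog.get(cfg.DATASETS.TEST[0])
vis = Visualizer(img[:, :, ::-1], metadata)      # Visualizer expects RGB

if "sem_seg" in outputs:
    # semantic segmentation: draw a per-pixel class map over the whole image
    result = vis.draw_sem_seg(outputs["sem_seg"].argmax(dim=0).to("cpu"))
else:
    # instance segmentation: only the detected foreground objects are drawn
    result = vis.draw_instance_predictions(outputs["instances"].to("cpu"))

cv2.imwrite("output.jpg", result.get_image()[:, :, ::-1])
```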

Also, SeMask is designed mainly for semantic segmentation, so I request you to test only with the semantic-segmentation config files for SeMask.

I would also like to point out that you are trying to run the demo script with an R-101.pkl model, which I suppose contains parameters only for the backbone, so the predictions will not be accurate.

So, please try testing a semantic-segmentation model by changing the config path in the command given on this page to one corresponding to the semantic segmentation task.
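If you are unsure what a weights file actually contains, a quick check like the sketch below (paths are placeholders) prints the top-level parameter prefixes; a backbone-only file such as R-101.pkl or an ImageNet-pretrained Swin checkpoint will show no sem_seg_head keys:

```python
# Inspect which modules a checkpoint covers before running the demo.
# Paths are placeholders; detectron2 .pkl files are pickled dicts, .pth files load with torch.load.
import pickle
import torch

def load_state_dict(path):
    if path.endswith(".pkl"):
        with open(path, "rb") as f:
            data = pickle.load(f, encoding="latin1")
    else:
        data = torch.load(path, map_location="cpu")
    # training checkpoints usually nest the weights under a "model" key
    return data["model"] if isinstance(data, dict) and "model" in data else data

state = load_state_dict("../R-101.pkl")
print(sorted({k.split(".")[0] for k in state}))
# A full SeMask checkpoint lists 'backbone', 'sem_seg_head', ...;
# a backbone-only file has no 'sem_seg_head' entries at all.
```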

an99990 commented 2 years ago

Thank you @praeclarumjj3 for your quick answer! Here are the commands I ran:

wget https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth

python tools/convert-pretrained-swin-model-to-d2.py swin_large_patch4_window12_384_22k.pth swin_large_patch4_window12_384_22k.pkl

python demo.py --config-file ../configs/ade20k/semantic-segmentation/semask_swin/maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml --input ../images/person_bike.jpg --opts MODEL.WEIGHTS ../swin_large_patch4_window12_384_22k.pkl

Log:

python demo.py --config-file ../configs/ade20k/semantic-segmentation/semask_swin/maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml --input ../images/person_bike.jpg --opts MODEL.WEIGHTS ../swin_large_patch4_window12_384_22k.pkl
[02/09 13:02:26 detectron2]: Arguments: Namespace(confidence_threshold=0.5, config_file='../configs/ade20k/semantic-segmentation/semask_swin/maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml', input=['../images/person_bike.jpg'], opts=['MODEL.WEIGHTS', '../swin_large_patch4_window12_384_22k.pkl'], output=None, video_input=None, webcam=False)
WARNING [02/09 13:02:26 fvcore.common.config]: Loading config ../configs/ade20k/semantic-segmentation/semask_swin/maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml with yaml.unsafe_load. Your machine may be at risk if the file contains malicious content.
WARNING [02/09 13:02:26 fvcore.common.config]: Loading config ../configs/ade20k/semantic-segmentation/semask_swin/../Base-ADE20K-SemanticSegmentation.yaml with yaml.unsafe_load. Your machine may be at risk if the file contains malicious content.
/opt/conda/lib/python3.8/site-packages/torch/functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:2156.) return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
[02/09 13:02:30 fvcore.common.checkpoint]: [Checkpointer] Loading from ../swin_large_patch4_window12_384_22k.pkl ...
[02/09 13:02:30 fvcore.common.checkpoint]: Reading a file from 'third_party'
WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.pixel_decoder.adapter_1.norm.bias in model is torch.Size([256]).
WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired.
WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.pixel_decoder.adapter_1.norm.weight in model is torch.Size([256]).
WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired.
WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.pixel_decoder.layer_1.norm.bias in model is torch.Size([256]).
WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired.
WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.pixel_decoder.layer_1.norm.weight in model is torch.Size([256]).
WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired.
WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.0.norm.bias in model is torch.Size([256]).
WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired.
WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.0.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.1.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.1.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.2.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.2.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.3.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.3.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.4.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.4.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.5.norm.bias in model is torch.Size([256]). 
WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.5.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.6.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.6.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.7.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.7.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.8.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.8.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.0.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.0.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. 
WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.1.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.1.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.2.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.2.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.3.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.3.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.4.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.4.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.5.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.5.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. 
WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.6.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.6.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.7.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.7.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.8.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.8.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.0.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.0.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.1.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.1.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. 
Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.2.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.2.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.3.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.3.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.4.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.4.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.5.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.5.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.6.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. 
WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.6.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.7.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.7.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.8.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.8.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Following weights matched with submodule backbone: Names in Model Names in Checkpoint Shapes
layers.0.blocks.0.attn.* layers.0.blocks.0.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} (192,) (192,192) (576,) (576,192) (529,6) (144,144)
layers.0.blocks.0.mlp.fc1.* layers.0.blocks.0.mlp.fc1.{bias,weight} (768,) (768,192)
layers.0.blocks.0.mlp.fc2.* layers.0.blocks.0.mlp.fc2.{bias,weight} (192,) (192,768)
layers.0.blocks.0.norm1.* layers.0.blocks.0.norm1.{bias,weight} (192,) (192,)
layers.0.blocks.0.norm2.* layers.0.blocks.0.norm2.{bias,weight} (192,) (192,)
layers.0.blocks.1.attn.* layers.0.blocks.1.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} (192,) (192,192) (576,) (576,192) (529,6) (144,144)
layers.0.blocks.1.mlp.fc1.* layers.0.blocks.1.mlp.fc1.{bias,weight}
(768,) (768,192)
layers.0.blocks.1.mlp.fc2.* layers.0.blocks.1.mlp.fc2.{bias,weight} (192,) (192,768)
layers.0.blocks.1.norm1.* layers.0.blocks.1.norm1.{bias,weight} (192,) (192,)
layers.0.blocks.1.norm2.* layers.0.blocks.1.norm2.{bias,weight} (192,) (192,)
layers.0.downsample.norm.* layers.0.downsample.norm.{bias,weight} (768,) (768,)
layers.0.downsample.reduction.weight layers.0.downsample.reduction.weight (384, 768)
layers.1.blocks.0.attn.* layers.1.blocks.0.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} (384,) (384,384) (1152,) (1152,384) (529,12) (144,144)
layers.1.blocks.0.mlp.fc1.* layers.1.blocks.0.mlp.fc1.{bias,weight} (1536,) (1536,384)
layers.1.blocks.0.mlp.fc2.* layers.1.blocks.0.mlp.fc2.{bias,weight} (384,) (384,1536)
layers.1.blocks.0.norm1.* layers.1.blocks.0.norm1.{bias,weight} (384,) (384,)
layers.1.blocks.0.norm2.* layers.1.blocks.0.norm2.{bias,weight} (384,) (384,)
layers.1.blocks.1.attn.* layers.1.blocks.1.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} (384,) (384,384) (1152,) (1152,384) (529,12) (144,144)
layers.1.blocks.1.mlp.fc1.* layers.1.blocks.1.mlp.fc1.{bias,weight} (1536,) (1536,384)
layers.1.blocks.1.mlp.fc2.* layers.1.blocks.1.mlp.fc2.{bias,weight} (384,) (384,1536)
layers.1.blocks.1.norm1.* layers.1.blocks.1.norm1.{bias,weight} (384,) (384,)
layers.1.blocks.1.norm2.* layers.1.blocks.1.norm2.{bias,weight} (384,) (384,)
layers.1.downsample.norm.* layers.1.downsample.norm.{bias,weight} (1536,) (1536,)
layers.1.downsample.reduction.weight layers.1.downsample.reduction.weight (768, 1536)
layers.2.blocks.0.attn.* layers.2.blocks.0.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} (768,) (768,768) (2304,) (2304,768) (529,24) (144,144)
layers.2.blocks.0.mlp.fc1.* layers.2.blocks.0.mlp.fc1.{bias,weight} (3072,) (3072,768)
layers.2.blocks.0.mlp.fc2.* layers.2.blocks.0.mlp.fc2.{bias,weight} (768,) (768,3072)
layers.2.blocks.0.norm1.* layers.2.blocks.0.norm1.{bias,weight} (768,) (768,)
layers.2.blocks.0.norm2.* layers.2.blocks.0.norm2.{bias,weight} (768,) (768,)
layers.2.blocks.1.attn.* layers.2.blocks.1.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} (768,) (768,768) (2304,) (2304,768) (529,24) (144,144)
layers.2.blocks.1.mlp.fc1.* layers.2.blocks.1.mlp.fc1.{bias,weight} (3072,) (3072,768)
layers.2.blocks.1.mlp.fc2.* layers.2.blocks.1.mlp.fc2.{bias,weight} (768,) (768,3072)
layers.2.blocks.1.norm1.* layers.2.blocks.1.norm1.{bias,weight} (768,) (768,)
layers.2.blocks.1.norm2.* layers.2.blocks.1.norm2.{bias,weight} (768,) (768,)
layers.2.blocks.10.attn.* layers.2.blocks.10.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} (768,) (768,768) (2304,) (2304,768) (529,24) (144,144)
layers.2.blocks.10.mlp.fc1.* layers.2.blocks.10.mlp.fc1.{bias,weight} (3072,) (3072,768)
layers.2.blocks.10.mlp.fc2.* layers.2.blocks.10.mlp.fc2.{bias,weight} (768,) (768,3072)
layers.2.blocks.10.norm1.* layers.2.blocks.10.norm1.{bias,weight} (768,) (768,)
layers.2.blocks.10.norm2.* layers.2.blocks.10.norm2.{bias,weight} (768,) (768,)
layers.2.blocks.11.attn.* layers.2.blocks.11.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} (768,) (768,768) (2304,) (2304,768) (529,24) (144,144)
layers.2.blocks.11.mlp.fc1.* layers.2.blocks.11.mlp.fc1.{bias,weight} (3072,) (3072,768)
layers.2.blocks.11.mlp.fc2.* layers.2.blocks.11.mlp.fc2.{bias,weight} (768,) (768,3072)
layers.2.blocks.11.norm1.* layers.2.blocks.11.norm1.{bias,weight} (768,) (768,)
layers.2.blocks.11.norm2.* layers.2.blocks.11.norm2.{bias,weight} (768,) (768,)
layers.2.blocks.12.attn.* layers.2.blocks.12.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} (768,) (768,768) (2304,) (2304,768) (529,24) (144,144)
layers.2.blocks.12.mlp.fc1.* layers.2.blocks.12.mlp.fc1.{bias,weight} (3072,) (3072,768)
layers.2.blocks.12.mlp.fc2.* layers.2.blocks.12.mlp.fc2.{bias,weight} (768,) (768,3072)
layers.2.blocks.12.norm1.* layers.2.blocks.12.norm1.{bias,weight} (768,) (768,)
layers.2.blocks.12.norm2.* layers.2.blocks.12.norm2.{bias,weight} (768,) (768,)
layers.2.blocks.13.attn.* layers.2.blocks.13.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} (768,) (768,768) (2304,) (2304,768) (529,24) (144,144)
layers.2.blocks.13.mlp.fc1.* layers.2.blocks.13.mlp.fc1.{bias,weight} (3072,) (3072,768)
layers.2.blocks.13.mlp.fc2.* layers.2.blocks.13.mlp.fc2.{bias,weight} (768,) (768,3072)
layers.2.blocks.13.norm1.* layers.2.blocks.13.norm1.{bias,weight} (768,) (768,)
layers.2.blocks.13.norm2.* layers.2.blocks.13.norm2.{bias,weight} (768,) (768,)
layers.2.blocks.14.attn.* layers.2.blocks.14.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} (768,) (768,768) (2304,) (2304,768) (529,24) (144,144)
layers.2.blocks.14.mlp.fc1.* layers.2.blocks.14.mlp.fc1.{bias,weight} (3072,) (3072,768)
layers.2.blocks.14.mlp.fc2.* layers.2.blocks.14.mlp.fc2.{bias,weight} (768,) (768,3072)
layers.2.blocks.14.norm1.* layers.2.blocks.14.norm1.{bias,weight} (768,) (768,)
layers.2.blocks.14.norm2.* layers.2.blocks.14.norm2.{bias,weight} (768,) (768,)
layers.2.blocks.15.attn.* layers.2.blocks.15.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} (768,) (768,768) (2304,) (2304,768) (529,24) (144,144)
layers.2.blocks.15.mlp.fc1.* layers.2.blocks.15.mlp.fc1.{bias,weight} (3072,) (3072,768)
layers.2.blocks.15.mlp.fc2.* layers.2.blocks.15.mlp.fc2.{bias,weight} (768,) (768,3072)
layers.2.blocks.15.norm1.* layers.2.blocks.15.norm1.{bias,weight} (768,) (768,)
layers.2.blocks.15.norm2.* layers.2.blocks.15.norm2.{bias,weight} (768,) (768,)
layers.2.blocks.16.attn.* layers.2.blocks.16.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} (768,) (768,768) (2304,) (2304,768) (529,24) (144,144)
layers.2.blocks.16.mlp.fc1.* layers.2.blocks.16.mlp.fc1.{bias,weight} (3072,) (3072,768)
layers.2.blocks.16.mlp.fc2.* layers.2.blocks.16.mlp.fc2.{bias,weight} (768,) (768,3072)
layers.2.blocks.16.norm1.* layers.2.blocks.16.norm1.{bias,weight} (768,) (768,)
layers.2.blocks.16.norm2.* layers.2.blocks.16.norm2.{bias,weight} (768,) (768,)
layers.2.blocks.17.attn.* layers.2.blocks.17.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} (768,) (768,768) (2304,) (2304,768) (529,24) (144,144)
layers.2.blocks.17.mlp.fc1.* layers.2.blocks.17.mlp.fc1.{bias,weight} (3072,) (3072,768)
layers.2.blocks.17.mlp.fc2.* layers.2.blocks.17.mlp.fc2.{bias,weight} (768,) (768,3072)
layers.2.blocks.17.norm1.* layers.2.blocks.17.norm1.{bias,weight} (768,) (768,)
layers.2.blocks.17.norm2.* layers.2.blocks.17.norm2.{bias,weight} (768,) (768,)
layers.2.blocks.2.attn.* layers.2.blocks.2.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} (768,) (768,768) (2304,) (2304,768) (529,24) (144,144)
layers.2.blocks.2.mlp.fc1.* layers.2.blocks.2.mlp.fc1.{bias,weight} (3072,) (3072,768)
layers.2.blocks.2.mlp.fc2.* layers.2.blocks.2.mlp.fc2.{bias,weight} (768,) (768,3072)
layers.2.blocks.2.norm1.* layers.2.blocks.2.norm1.{bias,weight} (768,) (768,)
layers.2.blocks.2.norm2.* layers.2.blocks.2.norm2.{bias,weight} (768,) (768,)
layers.2.blocks.3.attn.* layers.2.blocks.3.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} (768,) (768,768) (2304,) (2304,768) (529,24) (144,144)
layers.2.blocks.3.mlp.fc1.* layers.2.blocks.3.mlp.fc1.{bias,weight} (3072,) (3072,768)
layers.2.blocks.3.mlp.fc2.* layers.2.blocks.3.mlp.fc2.{bias,weight} (768,) (768,3072)
layers.2.blocks.3.norm1.* layers.2.blocks.3.norm1.{bias,weight} (768,) (768,)
layers.2.blocks.3.norm2.* layers.2.blocks.3.norm2.{bias,weight} (768,) (768,)
layers.2.blocks.4.attn.* layers.2.blocks.4.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} (768,) (768,768) (2304,) (2304,768) (529,24) (144,144)
layers.2.blocks.4.mlp.fc1.* layers.2.blocks.4.mlp.fc1.{bias,weight} (3072,) (3072,768)
layers.2.blocks.4.mlp.fc2.* layers.2.blocks.4.mlp.fc2.{bias,weight} (768,) (768,3072)
layers.2.blocks.4.norm1.* layers.2.blocks.4.norm1.{bias,weight} (768,) (768,)
layers.2.blocks.4.norm2.* layers.2.blocks.4.norm2.{bias,weight} (768,) (768,)
layers.2.blocks.5.attn.* layers.2.blocks.5.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} (768,) (768,768) (2304,) (2304,768) (529,24) (144,144)
layers.2.blocks.5.mlp.fc1.* layers.2.blocks.5.mlp.fc1.{bias,weight} (3072,) (3072,768)
layers.2.blocks.5.mlp.fc2.* layers.2.blocks.5.mlp.fc2.{bias,weight} (768,) (768,3072)
layers.2.blocks.5.norm1.* layers.2.blocks.5.norm1.{bias,weight} (768,) (768,)
layers.2.blocks.5.norm2.* layers.2.blocks.5.norm2.{bias,weight} (768,) (768,)
layers.2.blocks.6.attn.* layers.2.blocks.6.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} (768,) (768,768) (2304,) (2304,768) (529,24) (144,144)
layers.2.blocks.6.mlp.fc1.* layers.2.blocks.6.mlp.fc1.{bias,weight} (3072,) (3072,768)
layers.2.blocks.6.mlp.fc2.* layers.2.blocks.6.mlp.fc2.{bias,weight} (768,) (768,3072)
layers.2.blocks.6.norm1.* layers.2.blocks.6.norm1.{bias,weight} (768,) (768,)
layers.2.blocks.6.norm2.* layers.2.blocks.6.norm2.{bias,weight} (768,) (768,)
layers.2.blocks.7.attn.* layers.2.blocks.7.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} (768,) (768,768) (2304,) (2304,768) (529,24) (144,144)
layers.2.blocks.7.mlp.fc1.* layers.2.blocks.7.mlp.fc1.{bias,weight} (3072,) (3072,768)
layers.2.blocks.7.mlp.fc2.* layers.2.blocks.7.mlp.fc2.{bias,weight} (768,) (768,3072)
layers.2.blocks.7.norm1.* layers.2.blocks.7.norm1.{bias,weight} (768,) (768,)
layers.2.blocks.7.norm2.* layers.2.blocks.7.norm2.{bias,weight} (768,) (768,)
layers.2.blocks.8.attn.* layers.2.blocks.8.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} (768,) (768,768) (2304,) (2304,768) (529,24) (144,144)
layers.2.blocks.8.mlp.fc1.* layers.2.blocks.8.mlp.fc1.{bias,weight} (3072,) (3072,768)
layers.2.blocks.8.mlp.fc2.* layers.2.blocks.8.mlp.fc2.{bias,weight} (768,) (768,3072)
layers.2.blocks.8.norm1.* layers.2.blocks.8.norm1.{bias,weight} (768,) (768,)
layers.2.blocks.8.norm2.* layers.2.blocks.8.norm2.{bias,weight} (768,) (768,)
layers.2.blocks.9.attn.* layers.2.blocks.9.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} (768,) (768,768) (2304,) (2304,768) (529,24) (144,144)
layers.2.blocks.9.mlp.fc1.* layers.2.blocks.9.mlp.fc1.{bias,weight} (3072,) (3072,768)
layers.2.blocks.9.mlp.fc2.* layers.2.blocks.9.mlp.fc2.{bias,weight} (768,) (768,3072)
layers.2.blocks.9.norm1.* layers.2.blocks.9.norm1.{bias,weight} (768,) (768,)
layers.2.blocks.9.norm2.* layers.2.blocks.9.norm2.{bias,weight} (768,) (768,)
layers.2.downsample.norm.* layers.2.downsample.norm.{bias,weight} (3072,) (3072,)
layers.2.downsample.reduction.weight layers.2.downsample.reduction.weight (1536, 3072)
layers.3.blocks.0.attn.* layers.3.blocks.0.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} (1536,) (1536,1536) (4608,) (4608,1536) (529,48) (144,144)
layers.3.blocks.0.mlp.fc1.* layers.3.blocks.0.mlp.fc1.{bias,weight} (6144,) (6144,1536)
layers.3.blocks.0.mlp.fc2.* layers.3.blocks.0.mlp.fc2.{bias,weight} (1536,) (1536,6144)
layers.3.blocks.0.norm1.* layers.3.blocks.0.norm1.{bias,weight} (1536,) (1536,)
layers.3.blocks.0.norm2.* layers.3.blocks.0.norm2.{bias,weight} (1536,) (1536,)
layers.3.blocks.1.attn.* layers.3.blocks.1.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} (1536,) (1536,1536) (4608,) (4608,1536) (529,48) (144,144)
layers.3.blocks.1.mlp.fc1.* layers.3.blocks.1.mlp.fc1.{bias,weight} (6144,) (6144,1536)
layers.3.blocks.1.mlp.fc2.* layers.3.blocks.1.mlp.fc2.{bias,weight} (1536,) (1536,6144)
layers.3.blocks.1.norm1.* layers.3.blocks.1.norm1.{bias,weight} (1536,) (1536,)
layers.3.blocks.1.norm2.* layers.3.blocks.1.norm2.{bias,weight} (1536,) (1536,)
patch_embed.norm.* patch_embed.norm.{bias,weight}
(192,) (192,)
patch_embed.proj.* patch_embed.proj.{bias,weight}
(192,) (192,3,4,4)

WARNING [02/09 13:02:30 fvcore.common.checkpoint]: Some model parameters or buffers are not found in the checkpoint: backbone.layers.0.semantic_layer.class_injection.0.gamma backbone.layers.0.semantic_layer.class_injection.0.mlp_cls_k.{bias, weight} backbone.layers.0.semantic_layer.class_injection.0.mlp_cls_q.{bias, weight} backbone.layers.0.semantic_layer.class_injection.0.mlp_res.{bias, weight} backbone.layers.0.semantic_layer.class_injection.0.mlp_v.{bias, weight} backbone.layers.0.semantic_layer.norm1.{bias, weight} backbone.layers.1.semantic_layer.class_injection.0.gamma backbone.layers.1.semantic_layer.class_injection.0.mlp_cls_k.{bias, weight} backbone.layers.1.semantic_layer.class_injection.0.mlp_cls_q.{bias, weight} backbone.layers.1.semantic_layer.class_injection.0.mlp_res.{bias, weight} backbone.layers.1.semantic_layer.class_injection.0.mlp_v.{bias, weight} backbone.layers.1.semantic_layer.norm1.{bias, weight} backbone.layers.2.semantic_layer.class_injection.0.gamma backbone.layers.2.semantic_layer.class_injection.0.mlp_cls_k.{bias, weight} backbone.layers.2.semantic_layer.class_injection.0.mlp_cls_q.{bias, weight} backbone.layers.2.semantic_layer.class_injection.0.mlp_res.{bias, weight} backbone.layers.2.semantic_layer.class_injection.0.mlp_v.{bias, weight} backbone.layers.2.semantic_layer.norm1.{bias, weight} backbone.layers.3.semantic_layer.class_injection.0.gamma backbone.layers.3.semantic_layer.class_injection.0.mlp_cls_k.{bias, weight} backbone.layers.3.semantic_layer.class_injection.0.mlp_cls_q.{bias, weight} backbone.layers.3.semantic_layer.class_injection.0.mlp_res.{bias, weight} backbone.layers.3.semantic_layer.class_injection.0.mlp_v.{bias, weight} backbone.layers.3.semantic_layer.norm1.{bias, weight} backbone.norm0.{bias, weight} backbone.norm1.{bias, weight} backbone.norm2.{bias, weight} backbone.norm3.{bias, weight} criterion.empty_weight sem_seg_head.pixel_decoder.adapter_1.norm.{bias, weight} sem_seg_head.pixel_decoder.adapter_1.weight sem_seg_head.pixel_decoder.input_proj.0.0.{bias, weight} sem_seg_head.pixel_decoder.input_proj.0.1.{bias, weight} sem_seg_head.pixel_decoder.input_proj.1.0.{bias, weight} sem_seg_head.pixel_decoder.input_proj.1.1.{bias, weight} sem_seg_head.pixel_decoder.input_proj.2.0.{bias, weight} sem_seg_head.pixel_decoder.input_proj.2.1.{bias, weight} sem_seg_head.pixel_decoder.layer_1.norm.{bias, weight} sem_seg_head.pixel_decoder.layer_1.weight sem_seg_head.pixel_decoder.mask_features.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.0.linear1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.0.linear2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.0.norm1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.0.norm2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.0.self_attn.attention_weights.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.0.self_attn.output_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.0.self_attn.sampling_offsets.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.0.self_attn.value_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.1.linear1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.1.linear2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.1.norm1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.1.norm2.{bias, weight} 
sem_seg_head.pixel_decoder.transformer.encoder.layers.1.self_attn.attention_weights.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.1.self_attn.output_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.1.self_attn.sampling_offsets.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.1.self_attn.value_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.2.linear1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.2.linear2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.2.norm1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.2.norm2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.2.self_attn.attention_weights.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.2.self_attn.output_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.2.self_attn.sampling_offsets.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.2.self_attn.value_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.3.linear1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.3.linear2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.3.norm1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.3.norm2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.3.self_attn.attention_weights.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.3.self_attn.output_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.3.self_attn.sampling_offsets.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.3.self_attn.value_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.4.linear1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.4.linear2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.4.norm1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.4.norm2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.4.self_attn.attention_weights.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.4.self_attn.output_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.4.self_attn.sampling_offsets.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.4.self_attn.value_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.5.linear1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.5.linear2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.5.norm1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.5.norm2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.5.self_attn.attention_weights.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.5.self_attn.output_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.5.self_attn.sampling_offsets.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.5.self_attn.value_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.level_embed sem_seg_head.predictor.class_embed.{bias, weight} sem_seg_head.predictor.decoder_norm.{bias, weight} sem_seg_head.predictor.level_embed.weight sem_seg_head.predictor.mask_embed.layers.0.{bias, weight} sem_seg_head.predictor.mask_embed.layers.1.{bias, weight} sem_seg_head.predictor.mask_embed.layers.2.{bias, weight} 
sem_seg_head.predictor.query_embed.weight sem_seg_head.predictor.query_feat.weight sem_seg_head.predictor.transformer_cross_attention_layers.0.multihead_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.0.multihead_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_cross_attention_layers.0.norm.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.1.multihead_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.1.multihead_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_cross_attention_layers.1.norm.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.2.multihead_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.2.multihead_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_cross_attention_layers.2.norm.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.3.multihead_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.3.multihead_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_cross_attention_layers.3.norm.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.4.multihead_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.4.multihead_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_cross_attention_layers.4.norm.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.5.multihead_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.5.multihead_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_cross_attention_layers.5.norm.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.6.multihead_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.6.multihead_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_cross_attention_layers.6.norm.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.7.multihead_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.7.multihead_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_cross_attention_layers.7.norm.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.8.multihead_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.8.multihead_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_cross_attention_layers.8.norm.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.0.linear1.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.0.linear2.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.0.norm.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.1.linear1.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.1.linear2.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.1.norm.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.2.linear1.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.2.linear2.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.2.norm.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.3.linear1.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.3.linear2.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.3.norm.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.4.linear1.{bias, 
weight} sem_seg_head.predictor.transformer_ffn_layers.4.linear2.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.4.norm.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.5.linear1.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.5.linear2.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.5.norm.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.6.linear1.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.6.linear2.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.6.norm.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.7.linear1.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.7.linear2.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.7.norm.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.8.linear1.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.8.linear2.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.8.norm.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.0.norm.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.0.self_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.0.self_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_self_attention_layers.1.norm.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.1.self_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.1.self_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_self_attention_layers.2.norm.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.2.self_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.2.self_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_self_attention_layers.3.norm.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.3.self_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.3.self_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_self_attention_layers.4.norm.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.4.self_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.4.self_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_self_attention_layers.5.norm.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.5.self_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.5.self_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_self_attention_layers.6.norm.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.6.self_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.6.self_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_self_attention_layers.7.norm.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.7.self_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.7.self_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_self_attention_layers.8.norm.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.8.self_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.8.self_attn.{in_proj_bias, in_proj_weight} WARNING [02/09 13:02:30 fvcore.common.checkpoint]: The checkpoint state_dict contains keys that are not used by the 
model: head.{bias, weight} layers.0.blocks.1.attn_mask layers.1.blocks.1.attn_mask layers.2.blocks.1.attn_mask layers.2.blocks.11.attn_mask layers.2.blocks.13.attn_mask layers.2.blocks.15.attn_mask layers.2.blocks.17.attn_mask layers.2.blocks.3.attn_mask layers.2.blocks.5.attn_mask layers.2.blocks.7.attn_mask layers.2.blocks.9.attn_mask norm.{bias, weight}
/opt/halodi/git/halodi-segmentation/halodi_segmentation/models/SeMask-Segmentation/SeMask-Mask2Former/demo/../mask2former/modeling/transformer_decoder/position_encoding.py:41: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). dim_t = self.temperature ** (2 * (dim_t // 2) / self.num_pos_feats)
[02/09 13:02:34 d2.utils.memory]: Attempting to copy inputs of <function sem_seg_postprocess at 0x7f3b2c89d280> to CPU due to CUDA OOM
[02/09 13:02:36 detectron2]: ../images/person_bike.jpg: finished in 5.86s

Result image

Please let me know if I have done something incorrect.

praeclarumjj3 commented 2 years ago

When you run the following commands:

wget https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth

python tools/convert-pretrained-swin-model-to-d2.py swin_large_patch4_window12_384_22k.pth swin_large_patch4_window12_384_22k.pkl

python demo.py --config-file ../configs/ade20k/semantic-segmentation/semask_swin/maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml --input ../images/person_bike.jpg --opts MODEL.WEIGHTS ../swin_large_patch4_window12_384_22k.pkl

You are testing a SeMask-L Mask2Former model by loading only the Swin-L backbone weights. Thus, you get warnings for missing keys in the logs.

Instead, please download the correct model from the table in the README. Also, remember that each model is trained on a specific dataset, so the checkpoint has to match the dataset of the config you use.
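For context, convert-pretrained-swin-model-to-d2.py only repackages the ImageNet-pretrained Swin state dict into a detectron2-style .pkl (which is why the log says "Reading a file from 'third_party'"); it does not add any SeMask or Mask2Former head weights. Roughly, it does something like the sketch below (based on the Mask2Former version of the script; treat the details as an assumption):

```python
# Rough sketch of the Swin-to-detectron2 conversion: repackage the ImageNet checkpoint
# so detectron2's checkpointer can load it as backbone weights (details are an assumption).
import pickle as pkl
import sys

import torch

if __name__ == "__main__":
    obj = torch.load(sys.argv[1], map_location="cpu")["model"]   # raw Swin state dict
    res = {"model": obj, "__author__": "third_party", "matching_heuristics": True}
    with open(sys.argv[2], "wb") as f:
        pkl.dump(res, f)
```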

an99990 commented 2 years ago

Thank you @praeclarumjj3, I have loaded the correct model and it works now. Full command for others:

python demo.py --config-file ../configs/ade20k/semantic-segmentation/semask_swin/maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml --input ../images/person_bike.jpg --opts MODEL.WEIGHTS ../semask_large_mask2former_ade20k\ \(1\).pth

image