an99990 (issue closed 2 years ago)
Hi @an99990, with your last command you visualize the results for the instance segmentation task, so the foreground objects are predicted. The demo script is based on the detectron2 library, which overlays the predictions on the original image; the output is therefore the input image with the predictions drawn on top of it. You can take a deeper look at this via this link: Visualizer Class in Detectron2.
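For reference, here is a minimal sketch of how detectron2's `Visualizer` is typically used to draw predictions over an image. This is not the exact code in `demo.py`; `predictions` is assumed to be the output dict of a detectron2 predictor, and the metadata name is only illustrative:

```python
import cv2
from detectron2.data import MetadataCatalog
from detectron2.utils.visualizer import Visualizer


def overlay_predictions(image_path: str, predictions: dict, dataset: str = "ade20k_sem_seg_val") -> None:
    """Draw model outputs over the input image, the way the demo does via detectron2's Visualizer."""
    img = cv2.imread(image_path)                 # BGR image from disk
    metadata = MetadataCatalog.get(dataset)      # class names / colors for the dataset
    vis = Visualizer(img[:, :, ::-1], metadata)  # Visualizer expects an RGB image

    if "sem_seg" in predictions:                 # semantic-segmentation output: (C, H, W) logits
        out = vis.draw_sem_seg(predictions["sem_seg"].argmax(dim=0).to("cpu"))
    else:                                        # instance-segmentation output
        out = vis.draw_instance_predictions(predictions["instances"].to("cpu"))

    cv2.imwrite("result.jpg", out.get_image()[:, :, ::-1])  # overlay = input image + predictions
```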
Also, SeMask is designed mainly for semantic segmentation, so I request that you test only with the semantic-segmentation config files for SeMask.
I would also like to point out that you are trying to run the demo script with an R101.pkl model, which I suppose contains parameters only for the backbone, so the predictions will not be accurate.
So, please try testing a semantic-segmentation model by replacing the config path in the command given on this page with one corresponding to the semantic-segmentation task.
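To check what a given checkpoint actually contains, you can simply look at its keys; a backbone-only file will have no `sem_seg_head` entries, so that part of the model stays randomly initialised. A quick sketch (the `"model"` wrapper is an assumption about how the checkpoint was packed):

```python
import pickle

import torch


def checkpoint_prefixes(path: str) -> set:
    """Return the top-level key prefixes stored in a checkpoint, e.g. {'backbone', 'sem_seg_head'}."""
    if path.endswith(".pkl"):
        with open(path, "rb") as f:
            data = pickle.load(f, encoding="latin1")
    else:
        data = torch.load(path, map_location="cpu")
    state_dict = data.get("model", data)  # some checkpoints wrap the weights in a "model" entry
    return {key.split(".")[0] for key in state_dict}


# Example: a backbone-only .pkl lists only backbone-style prefixes,
# while a full SeMask Mask2Former checkpoint also shows 'sem_seg_head'.
print(checkpoint_prefixes("../R-101.pkl"))
```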
Thank you @praeclarumjj3 for your quick answer! Here are the commands I ran:
wget https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth
python tools/convert-pretrained-swin-model-to-d2.py swin_large_patch4_window12_384_22k.pth swin_large_patch4_window12_384_22k.pkl
python demo.py --config-file ../configs/ade20k/semantic-segmentation/semask_swin/maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml --input ../images/person_bike.jpg --opts MODEL.WEIGHTS ../swin_large_patch4_window12_384_22k.pkl
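(For context, the conversion step above essentially repacks the `.pth` checkpoint into the pickle format detectron2's checkpointer expects; roughly along the lines of the sketch below, although the actual `tools/convert-pretrained-swin-model-to-d2.py` may differ in its details.)

```python
import pickle as pkl
import sys

import torch

if __name__ == "__main__":
    input_path, output_path = sys.argv[1], sys.argv[2]  # .pth in, .pkl out

    obj = torch.load(input_path, map_location="cpu")
    obj = obj.get("model", obj)  # Swin checkpoints keep the weights under "model"

    # Marking the weights as third-party lets detectron2 apply its name-matching
    # heuristics when loading them into the backbone (hence "Reading a file from
    # 'third_party'" in the log below).
    res = {"model": obj, "__author__": "third_party", "matching_heuristics": True}
    with open(output_path, "wb") as f:
        pkl.dump(res, f)
```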
log : ` python demo.py --config-file ../configs/ade2 0k/semantic-segmentation/semask_swin/maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml --input ../images/person_bike.jpg --opts MODEL.WEIGHTS ../swin_large_pa tch4_window12_384_22k.pkl [02/09 13:02:26 detectron2]: Arguments: Namespace(confidence_threshold=0.5, config_file='../configs/ade20k/semantic-segmentation/semask_swin/maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml', input=['../images/person_bike.jpg'], opts=['MODEL.WEIGHTS', '../swin_large_patch4_window12_384_22k.pkl'], output=None, video_input=None, webcam=False) WARNING [02/09 13:02:26 fvcore.common.config]: Loading config ../configs/ade20k/semantic-segmentation/semask_swin/maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml with yaml.unsafe_load. Your machine may be at risk if the file contains malicious content. WARNING [02/09 13:02:26 fvcore.common.config]: Loading config ../configs/ade20k/semantic-segmentation/semask_swin/../Base-ADE20K-SemanticSegmentation.yaml with yaml.unsafe_load. Your machine may be at risk if the file contains malicious content. /opt/conda/lib/python3.8/site-packages/torch/functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:2156.) return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined] [02/09 13:02:30 fvcore.common.checkpoint]: [Checkpointer] Loading from ../swin_large_patch4_window12_384_22k.pkl ... [02/09 13:02:30 fvcore.common.checkpoint]: Reading a file from 'third_party' WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.pixel_decoder.adapter_1.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.pixel_decoder.adapter_1.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.pixel_decoder.layer_1.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.pixel_decoder.layer_1.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.0.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. 
WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.0.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.1.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.1.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.2.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.2.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.3.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.3.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.4.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.4.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.5.norm.bias in model is torch.Size([256]). 
WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.5.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.6.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.6.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.7.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.7.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.8.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_cross_attention_layers.8.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.0.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.0.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. 
WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.1.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.1.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.2.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.2.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.3.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.3.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.4.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.4.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.5.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.5.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. 
WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.6.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.6.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.7.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.7.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.8.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_ffn_layers.8.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.0.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.0.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.1.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.1.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. 
Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.2.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.2.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.3.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.3.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.4.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.4.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.5.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.5.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.6.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. 
WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.6.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.7.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.7.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.8.norm.bias in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired. WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1536]), while shape of sem_seg_head.predictor.transformer_self_attention_layers.8.norm.weight in model is torch.Size([256]). WARNING [02/09 13:02:30 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired. [02/09 13:02:30 d2.checkpoint.c2_model_loading]: Following weights matched with submodule backbone: | Names in Model | Names in Checkpoint | Shapes |
---|---|---|---|
layers.0.blocks.0.attn.* | layers.0.blocks.0.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} | (192,) (192,192) (576,) (576,192) (529,6) (144,144) | |
layers.0.blocks.0.mlp.fc1.* | layers.0.blocks.0.mlp.fc1.{bias,weight} | (768,) (768,192) | |
layers.0.blocks.0.mlp.fc2.* | layers.0.blocks.0.mlp.fc2.{bias,weight} | (192,) (192,768) | |
layers.0.blocks.0.norm1.* | layers.0.blocks.0.norm1.{bias,weight} | (192,) (192,) | |
layers.0.blocks.0.norm2.* | layers.0.blocks.0.norm2.{bias,weight} | (192,) (192,) | |
layers.0.blocks.1.attn.* | layers.0.blocks.1.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} | (192,) (192,192) (576,) (576,192) (529,6) (144,144) | |
layers.0.blocks.1.mlp.fc1.* | layers.0.blocks.1.mlp.fc1.{bias,weight} | (768,) (768,192) | |
layers.0.blocks.1.mlp.fc2.* | layers.0.blocks.1.mlp.fc2.{bias,weight} | (192,) (192,768) | |
layers.0.blocks.1.norm1.* | layers.0.blocks.1.norm1.{bias,weight} | (192,) (192,) | |
layers.0.blocks.1.norm2.* | layers.0.blocks.1.norm2.{bias,weight} | (192,) (192,) | |
layers.0.downsample.norm.* | layers.0.downsample.norm.{bias,weight} | (768,) (768,) | |
layers.0.downsample.reduction.weight | layers.0.downsample.reduction.weight | (384, 768) | |
layers.1.blocks.0.attn.* | layers.1.blocks.0.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} | (384,) (384,384) (1152,) (1152,384) (529,12) (144,144) | |
layers.1.blocks.0.mlp.fc1.* | layers.1.blocks.0.mlp.fc1.{bias,weight} | (1536,) (1536,384) | |
layers.1.blocks.0.mlp.fc2.* | layers.1.blocks.0.mlp.fc2.{bias,weight} | (384,) (384,1536) | |
layers.1.blocks.0.norm1.* | layers.1.blocks.0.norm1.{bias,weight} | (384,) (384,) | |
layers.1.blocks.0.norm2.* | layers.1.blocks.0.norm2.{bias,weight} | (384,) (384,) | |
layers.1.blocks.1.attn.* | layers.1.blocks.1.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} | (384,) (384,384) (1152,) (1152,384) (529,12) (144,144) | |
layers.1.blocks.1.mlp.fc1.* | layers.1.blocks.1.mlp.fc1.{bias,weight} | (1536,) (1536,384) | |
layers.1.blocks.1.mlp.fc2.* | layers.1.blocks.1.mlp.fc2.{bias,weight} | (384,) (384,1536) | |
layers.1.blocks.1.norm1.* | layers.1.blocks.1.norm1.{bias,weight} | (384,) (384,) | |
layers.1.blocks.1.norm2.* | layers.1.blocks.1.norm2.{bias,weight} | (384,) (384,) | |
layers.1.downsample.norm.* | layers.1.downsample.norm.{bias,weight} | (1536,) (1536,) | |
layers.1.downsample.reduction.weight | layers.1.downsample.reduction.weight | (768, 1536) | |
layers.2.blocks.0.attn.* | layers.2.blocks.0.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} | (768,) (768,768) (2304,) (2304,768) (529,24) (144,144) | |
layers.2.blocks.0.mlp.fc1.* | layers.2.blocks.0.mlp.fc1.{bias,weight} | (3072,) (3072,768) | |
layers.2.blocks.0.mlp.fc2.* | layers.2.blocks.0.mlp.fc2.{bias,weight} | (768,) (768,3072) | |
layers.2.blocks.0.norm1.* | layers.2.blocks.0.norm1.{bias,weight} | (768,) (768,) | |
layers.2.blocks.0.norm2.* | layers.2.blocks.0.norm2.{bias,weight} | (768,) (768,) | |
layers.2.blocks.1.attn.* | layers.2.blocks.1.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} | (768,) (768,768) (2304,) (2304,768) (529,24) (144,144) | |
layers.2.blocks.1.mlp.fc1.* | layers.2.blocks.1.mlp.fc1.{bias,weight} | (3072,) (3072,768) | |
layers.2.blocks.1.mlp.fc2.* | layers.2.blocks.1.mlp.fc2.{bias,weight} | (768,) (768,3072) | |
layers.2.blocks.1.norm1.* | layers.2.blocks.1.norm1.{bias,weight} | (768,) (768,) | |
layers.2.blocks.1.norm2.* | layers.2.blocks.1.norm2.{bias,weight} | (768,) (768,) | |
layers.2.blocks.10.attn.* | layers.2.blocks.10.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} | (768,) (768,768) (2304,) (2304,768) (529,24) (144,144) | |
layers.2.blocks.10.mlp.fc1.* | layers.2.blocks.10.mlp.fc1.{bias,weight} | (3072,) (3072,768) | |
layers.2.blocks.10.mlp.fc2.* | layers.2.blocks.10.mlp.fc2.{bias,weight} | (768,) (768,3072) | |
layers.2.blocks.10.norm1.* | layers.2.blocks.10.norm1.{bias,weight} | (768,) (768,) | |
layers.2.blocks.10.norm2.* | layers.2.blocks.10.norm2.{bias,weight} | (768,) (768,) | |
layers.2.blocks.11.attn.* | layers.2.blocks.11.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} | (768,) (768,768) (2304,) (2304,768) (529,24) (144,144) | |
layers.2.blocks.11.mlp.fc1.* | layers.2.blocks.11.mlp.fc1.{bias,weight} | (3072,) (3072,768) | |
layers.2.blocks.11.mlp.fc2.* | layers.2.blocks.11.mlp.fc2.{bias,weight} | (768,) (768,3072) | |
layers.2.blocks.11.norm1.* | layers.2.blocks.11.norm1.{bias,weight} | (768,) (768,) | |
layers.2.blocks.11.norm2.* | layers.2.blocks.11.norm2.{bias,weight} | (768,) (768,) | |
layers.2.blocks.12.attn.* | layers.2.blocks.12.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} | (768,) (768,768) (2304,) (2304,768) (529,24) (144,144) | |
layers.2.blocks.12.mlp.fc1.* | layers.2.blocks.12.mlp.fc1.{bias,weight} | (3072,) (3072,768) | |
layers.2.blocks.12.mlp.fc2.* | layers.2.blocks.12.mlp.fc2.{bias,weight} | (768,) (768,3072) | |
layers.2.blocks.12.norm1.* | layers.2.blocks.12.norm1.{bias,weight} | (768,) (768,) | |
layers.2.blocks.12.norm2.* | layers.2.blocks.12.norm2.{bias,weight} | (768,) (768,) | |
layers.2.blocks.13.attn.* | layers.2.blocks.13.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} | (768,) (768,768) (2304,) (2304,768) (529,24) (144,144) | |
layers.2.blocks.13.mlp.fc1.* | layers.2.blocks.13.mlp.fc1.{bias,weight} | (3072,) (3072,768) | |
layers.2.blocks.13.mlp.fc2.* | layers.2.blocks.13.mlp.fc2.{bias,weight} | (768,) (768,3072) | |
layers.2.blocks.13.norm1.* | layers.2.blocks.13.norm1.{bias,weight} | (768,) (768,) | |
layers.2.blocks.13.norm2.* | layers.2.blocks.13.norm2.{bias,weight} | (768,) (768,) | |
layers.2.blocks.14.attn.* | layers.2.blocks.14.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} | (768,) (768,768) (2304,) (2304,768) (529,24) (144,144) | |
layers.2.blocks.14.mlp.fc1.* | layers.2.blocks.14.mlp.fc1.{bias,weight} | (3072,) (3072,768) | |
layers.2.blocks.14.mlp.fc2.* | layers.2.blocks.14.mlp.fc2.{bias,weight} | (768,) (768,3072) | |
layers.2.blocks.14.norm1.* | layers.2.blocks.14.norm1.{bias,weight} | (768,) (768,) | |
layers.2.blocks.14.norm2.* | layers.2.blocks.14.norm2.{bias,weight} | (768,) (768,) | |
layers.2.blocks.15.attn.* | layers.2.blocks.15.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} | (768,) (768,768) (2304,) (2304,768) (529,24) (144,144) | |
layers.2.blocks.15.mlp.fc1.* | layers.2.blocks.15.mlp.fc1.{bias,weight} | (3072,) (3072,768) | |
layers.2.blocks.15.mlp.fc2.* | layers.2.blocks.15.mlp.fc2.{bias,weight} | (768,) (768,3072) | |
layers.2.blocks.15.norm1.* | layers.2.blocks.15.norm1.{bias,weight} | (768,) (768,) | |
layers.2.blocks.15.norm2.* | layers.2.blocks.15.norm2.{bias,weight} | (768,) (768,) | |
layers.2.blocks.16.attn.* | layers.2.blocks.16.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} | (768,) (768,768) (2304,) (2304,768) (529,24) (144,144) | |
layers.2.blocks.16.mlp.fc1.* | layers.2.blocks.16.mlp.fc1.{bias,weight} | (3072,) (3072,768) | |
layers.2.blocks.16.mlp.fc2.* | layers.2.blocks.16.mlp.fc2.{bias,weight} | (768,) (768,3072) | |
layers.2.blocks.16.norm1.* | layers.2.blocks.16.norm1.{bias,weight} | (768,) (768,) | |
layers.2.blocks.16.norm2.* | layers.2.blocks.16.norm2.{bias,weight} | (768,) (768,) | |
layers.2.blocks.17.attn.* | layers.2.blocks.17.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} | (768,) (768,768) (2304,) (2304,768) (529,24) (144,144) | |
layers.2.blocks.17.mlp.fc1.* | layers.2.blocks.17.mlp.fc1.{bias,weight} | (3072,) (3072,768) | |
layers.2.blocks.17.mlp.fc2.* | layers.2.blocks.17.mlp.fc2.{bias,weight} | (768,) (768,3072) | |
layers.2.blocks.17.norm1.* | layers.2.blocks.17.norm1.{bias,weight} | (768,) (768,) | |
layers.2.blocks.17.norm2.* | layers.2.blocks.17.norm2.{bias,weight} | (768,) (768,) | |
layers.2.blocks.2.attn.* | layers.2.blocks.2.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} | (768,) (768,768) (2304,) (2304,768) (529,24) (144,144) | |
layers.2.blocks.2.mlp.fc1.* | layers.2.blocks.2.mlp.fc1.{bias,weight} | (3072,) (3072,768) | |
layers.2.blocks.2.mlp.fc2.* | layers.2.blocks.2.mlp.fc2.{bias,weight} | (768,) (768,3072) | |
layers.2.blocks.2.norm1.* | layers.2.blocks.2.norm1.{bias,weight} | (768,) (768,) | |
layers.2.blocks.2.norm2.* | layers.2.blocks.2.norm2.{bias,weight} | (768,) (768,) | |
layers.2.blocks.3.attn.* | layers.2.blocks.3.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} | (768,) (768,768) (2304,) (2304,768) (529,24) (144,144) | |
layers.2.blocks.3.mlp.fc1.* | layers.2.blocks.3.mlp.fc1.{bias,weight} | (3072,) (3072,768) | |
layers.2.blocks.3.mlp.fc2.* | layers.2.blocks.3.mlp.fc2.{bias,weight} | (768,) (768,3072) | |
layers.2.blocks.3.norm1.* | layers.2.blocks.3.norm1.{bias,weight} | (768,) (768,) | |
layers.2.blocks.3.norm2.* | layers.2.blocks.3.norm2.{bias,weight} | (768,) (768,) | |
layers.2.blocks.4.attn.* | layers.2.blocks.4.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} | (768,) (768,768) (2304,) (2304,768) (529,24) (144,144) | |
layers.2.blocks.4.mlp.fc1.* | layers.2.blocks.4.mlp.fc1.{bias,weight} | (3072,) (3072,768) | |
layers.2.blocks.4.mlp.fc2.* | layers.2.blocks.4.mlp.fc2.{bias,weight} | (768,) (768,3072) | |
layers.2.blocks.4.norm1.* | layers.2.blocks.4.norm1.{bias,weight} | (768,) (768,) | |
layers.2.blocks.4.norm2.* | layers.2.blocks.4.norm2.{bias,weight} | (768,) (768,) | |
layers.2.blocks.5.attn.* | layers.2.blocks.5.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} | (768,) (768,768) (2304,) (2304,768) (529,24) (144,144) | |
layers.2.blocks.5.mlp.fc1.* | layers.2.blocks.5.mlp.fc1.{bias,weight} | (3072,) (3072,768) | |
layers.2.blocks.5.mlp.fc2.* | layers.2.blocks.5.mlp.fc2.{bias,weight} | (768,) (768,3072) | |
layers.2.blocks.5.norm1.* | layers.2.blocks.5.norm1.{bias,weight} | (768,) (768,) | |
layers.2.blocks.5.norm2.* | layers.2.blocks.5.norm2.{bias,weight} | (768,) (768,) | |
layers.2.blocks.6.attn.* | layers.2.blocks.6.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} | (768,) (768,768) (2304,) (2304,768) (529,24) (144,144) | |
layers.2.blocks.6.mlp.fc1.* | layers.2.blocks.6.mlp.fc1.{bias,weight} | (3072,) (3072,768) | |
layers.2.blocks.6.mlp.fc2.* | layers.2.blocks.6.mlp.fc2.{bias,weight} | (768,) (768,3072) | |
layers.2.blocks.6.norm1.* | layers.2.blocks.6.norm1.{bias,weight} | (768,) (768,) | |
layers.2.blocks.6.norm2.* | layers.2.blocks.6.norm2.{bias,weight} | (768,) (768,) | |
layers.2.blocks.7.attn.* | layers.2.blocks.7.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} | (768,) (768,768) (2304,) (2304,768) (529,24) (144,144) | |
layers.2.blocks.7.mlp.fc1.* | layers.2.blocks.7.mlp.fc1.{bias,weight} | (3072,) (3072,768) | |
layers.2.blocks.7.mlp.fc2.* | layers.2.blocks.7.mlp.fc2.{bias,weight} | (768,) (768,3072) | |
layers.2.blocks.7.norm1.* | layers.2.blocks.7.norm1.{bias,weight} | (768,) (768,) | |
layers.2.blocks.7.norm2.* | layers.2.blocks.7.norm2.{bias,weight} | (768,) (768,) | |
layers.2.blocks.8.attn.* | layers.2.blocks.8.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} | (768,) (768,768) (2304,) (2304,768) (529,24) (144,144) | |
layers.2.blocks.8.mlp.fc1.* | layers.2.blocks.8.mlp.fc1.{bias,weight} | (3072,) (3072,768) | |
layers.2.blocks.8.mlp.fc2.* | layers.2.blocks.8.mlp.fc2.{bias,weight} | (768,) (768,3072) | |
layers.2.blocks.8.norm1.* | layers.2.blocks.8.norm1.{bias,weight} | (768,) (768,) | |
layers.2.blocks.8.norm2.* | layers.2.blocks.8.norm2.{bias,weight} | (768,) (768,) | |
layers.2.blocks.9.attn.* | layers.2.blocks.9.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} | (768,) (768,768) (2304,) (2304,768) (529,24) (144,144) | |
layers.2.blocks.9.mlp.fc1.* | layers.2.blocks.9.mlp.fc1.{bias,weight} | (3072,) (3072,768) | |
layers.2.blocks.9.mlp.fc2.* | layers.2.blocks.9.mlp.fc2.{bias,weight} | (768,) (768,3072) | |
layers.2.blocks.9.norm1.* | layers.2.blocks.9.norm1.{bias,weight} | (768,) (768,) | |
layers.2.blocks.9.norm2.* | layers.2.blocks.9.norm2.{bias,weight} | (768,) (768,) | |
layers.2.downsample.norm.* | layers.2.downsample.norm.{bias,weight} | (3072,) (3072,) | |
layers.2.downsample.reduction.weight | layers.2.downsample.reduction.weight | (1536, 3072) | |
layers.3.blocks.0.attn.* | layers.3.blocks.0.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} | (1536,) (1536,1536) (4608,) (4608,1536) (529,48) (144,144) | |
layers.3.blocks.0.mlp.fc1.* | layers.3.blocks.0.mlp.fc1.{bias,weight} | (6144,) (6144,1536) | |
layers.3.blocks.0.mlp.fc2.* | layers.3.blocks.0.mlp.fc2.{bias,weight} | (1536,) (1536,6144) | |
layers.3.blocks.0.norm1.* | layers.3.blocks.0.norm1.{bias,weight} | (1536,) (1536,) | |
layers.3.blocks.0.norm2.* | layers.3.blocks.0.norm2.{bias,weight} | (1536,) (1536,) | |
layers.3.blocks.1.attn.* | layers.3.blocks.1.attn.{proj.bias,proj.weight,qkv.bias,qkv.weight,relative_position_bias_table,relative_position_index} | (1536,) (1536,1536) (4608,) (4608,1536) (529,48) (144,144) | |
layers.3.blocks.1.mlp.fc1.* | layers.3.blocks.1.mlp.fc1.{bias,weight} | (6144,) (6144,1536) | |
layers.3.blocks.1.mlp.fc2.* | layers.3.blocks.1.mlp.fc2.{bias,weight} | (1536,) (1536,6144) | |
layers.3.blocks.1.norm1.* | layers.3.blocks.1.norm1.{bias,weight} | (1536,) (1536,) | |
layers.3.blocks.1.norm2.* | layers.3.blocks.1.norm2.{bias,weight} | (1536,) (1536,) | |
patch_embed.norm.* | patch_embed.norm.{bias,weight} | (192,) (192,) | |
patch_embed.proj.* | patch_embed.proj.{bias,weight} | (192,) (192,3,4,4) | |
WARNING [02/09 13:02:30 fvcore.common.checkpoint]: Some model parameters or buffers are not found in the checkpoint: backbone.layers.0.semantic_layer.class_injection.0.gamma backbone.layers.0.semantic_layer.class_injection.0.mlp_cls_k.{bias, weight} backbone.layers.0.semantic_layer.class_injection.0.mlp_cls_q.{bias, weight} backbone.layers.0.semantic_layer.class_injection.0.mlp_res.{bias, weight} backbone.layers.0.semantic_layer.class_injection.0.mlp_v.{bias, weight} backbone.layers.0.semantic_layer.norm1.{bias, weight} backbone.layers.1.semantic_layer.class_injection.0.gamma backbone.layers.1.semantic_layer.class_injection.0.mlp_cls_k.{bias, weight} backbone.layers.1.semantic_layer.class_injection.0.mlp_cls_q.{bias, weight} backbone.layers.1.semantic_layer.class_injection.0.mlp_res.{bias, weight} backbone.layers.1.semantic_layer.class_injection.0.mlp_v.{bias, weight} backbone.layers.1.semantic_layer.norm1.{bias, weight} backbone.layers.2.semantic_layer.class_injection.0.gamma backbone.layers.2.semantic_layer.class_injection.0.mlp_cls_k.{bias, weight} backbone.layers.2.semantic_layer.class_injection.0.mlp_cls_q.{bias, weight} backbone.layers.2.semantic_layer.class_injection.0.mlp_res.{bias, weight} backbone.layers.2.semantic_layer.class_injection.0.mlp_v.{bias, weight} backbone.layers.2.semantic_layer.norm1.{bias, weight} backbone.layers.3.semantic_layer.class_injection.0.gamma backbone.layers.3.semantic_layer.class_injection.0.mlp_cls_k.{bias, weight} backbone.layers.3.semantic_layer.class_injection.0.mlp_cls_q.{bias, weight} backbone.layers.3.semantic_layer.class_injection.0.mlp_res.{bias, weight} backbone.layers.3.semantic_layer.class_injection.0.mlp_v.{bias, weight} backbone.layers.3.semantic_layer.norm1.{bias, weight} backbone.norm0.{bias, weight} backbone.norm1.{bias, weight} backbone.norm2.{bias, weight} backbone.norm3.{bias, weight} criterion.empty_weight sem_seg_head.pixel_decoder.adapter_1.norm.{bias, weight} sem_seg_head.pixel_decoder.adapter_1.weight sem_seg_head.pixel_decoder.input_proj.0.0.{bias, weight} sem_seg_head.pixel_decoder.input_proj.0.1.{bias, weight} sem_seg_head.pixel_decoder.input_proj.1.0.{bias, weight} sem_seg_head.pixel_decoder.input_proj.1.1.{bias, weight} sem_seg_head.pixel_decoder.input_proj.2.0.{bias, weight} sem_seg_head.pixel_decoder.input_proj.2.1.{bias, weight} sem_seg_head.pixel_decoder.layer_1.norm.{bias, weight} sem_seg_head.pixel_decoder.layer_1.weight sem_seg_head.pixel_decoder.mask_features.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.0.linear1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.0.linear2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.0.norm1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.0.norm2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.0.self_attn.attention_weights.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.0.self_attn.output_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.0.self_attn.sampling_offsets.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.0.self_attn.value_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.1.linear1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.1.linear2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.1.norm1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.1.norm2.{bias, weight} 
sem_seg_head.pixel_decoder.transformer.encoder.layers.1.self_attn.attention_weights.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.1.self_attn.output_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.1.self_attn.sampling_offsets.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.1.self_attn.value_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.2.linear1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.2.linear2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.2.norm1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.2.norm2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.2.self_attn.attention_weights.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.2.self_attn.output_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.2.self_attn.sampling_offsets.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.2.self_attn.value_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.3.linear1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.3.linear2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.3.norm1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.3.norm2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.3.self_attn.attention_weights.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.3.self_attn.output_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.3.self_attn.sampling_offsets.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.3.self_attn.value_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.4.linear1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.4.linear2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.4.norm1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.4.norm2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.4.self_attn.attention_weights.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.4.self_attn.output_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.4.self_attn.sampling_offsets.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.4.self_attn.value_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.5.linear1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.5.linear2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.5.norm1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.5.norm2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.5.self_attn.attention_weights.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.5.self_attn.output_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.5.self_attn.sampling_offsets.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.5.self_attn.value_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.level_embed sem_seg_head.predictor.class_embed.{bias, weight} sem_seg_head.predictor.decoder_norm.{bias, weight} sem_seg_head.predictor.level_embed.weight sem_seg_head.predictor.mask_embed.layers.0.{bias, weight} sem_seg_head.predictor.mask_embed.layers.1.{bias, weight} sem_seg_head.predictor.mask_embed.layers.2.{bias, weight} 
sem_seg_head.predictor.query_embed.weight sem_seg_head.predictor.query_feat.weight sem_seg_head.predictor.transformer_cross_attention_layers.0.multihead_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.0.multihead_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_cross_attention_layers.0.norm.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.1.multihead_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.1.multihead_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_cross_attention_layers.1.norm.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.2.multihead_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.2.multihead_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_cross_attention_layers.2.norm.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.3.multihead_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.3.multihead_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_cross_attention_layers.3.norm.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.4.multihead_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.4.multihead_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_cross_attention_layers.4.norm.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.5.multihead_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.5.multihead_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_cross_attention_layers.5.norm.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.6.multihead_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.6.multihead_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_cross_attention_layers.6.norm.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.7.multihead_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.7.multihead_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_cross_attention_layers.7.norm.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.8.multihead_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_cross_attention_layers.8.multihead_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_cross_attention_layers.8.norm.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.0.linear1.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.0.linear2.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.0.norm.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.1.linear1.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.1.linear2.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.1.norm.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.2.linear1.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.2.linear2.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.2.norm.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.3.linear1.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.3.linear2.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.3.norm.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.4.linear1.{bias, 
weight} sem_seg_head.predictor.transformer_ffn_layers.4.linear2.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.4.norm.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.5.linear1.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.5.linear2.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.5.norm.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.6.linear1.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.6.linear2.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.6.norm.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.7.linear1.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.7.linear2.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.7.norm.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.8.linear1.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.8.linear2.{bias, weight} sem_seg_head.predictor.transformer_ffn_layers.8.norm.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.0.norm.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.0.self_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.0.self_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_self_attention_layers.1.norm.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.1.self_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.1.self_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_self_attention_layers.2.norm.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.2.self_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.2.self_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_self_attention_layers.3.norm.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.3.self_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.3.self_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_self_attention_layers.4.norm.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.4.self_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.4.self_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_self_attention_layers.5.norm.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.5.self_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.5.self_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_self_attention_layers.6.norm.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.6.self_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.6.self_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_self_attention_layers.7.norm.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.7.self_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.7.self_attn.{in_proj_bias, in_proj_weight} sem_seg_head.predictor.transformer_self_attention_layers.8.norm.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.8.self_attn.out_proj.{bias, weight} sem_seg_head.predictor.transformer_self_attention_layers.8.self_attn.{in_proj_bias, in_proj_weight} WARNING [02/09 13:02:30 fvcore.common.checkpoint]: The checkpoint state_dict contains keys that are not used by the 
model: head.{bias, weight} layers.0.blocks.1.attn_mask layers.1.blocks.1.attn_mask layers.2.blocks.1.attn_mask layers.2.blocks.11.attn_mask layers.2.blocks.13.attn_mask layers.2.blocks.15.attn_mask layers.2.blocks.17.attn_mask layers.2.blocks.3.attn_mask layers.2.blocks.5.attn_mask layers.2.blocks.7.attn_mask layers.2.blocks.9.attn_mask norm.{bias, weight} /opt/halodi/git/halodi-segmentation/halodi_segmentation/models/SeMask-Segmentation/SeMask-Mask2Former/demo/../mask2former/modeling/transformer_decoder/position_encoding.py:41: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). dim_t = self.temperature * (2 (dim_t // 2) / self.num_pos_feats) [02/09 13:02:34 d2.utils.memory]: Attempting to copy inputs of <function sem_seg_postprocess at 0x7f3b2c89d280> to CPU due to CUDA OOM [02/09 13:02:36 detectron2]: ../images/person_bike.jpg: finished in 5.86s`
Result
Please let me know if I have done something incorrect.
When you run the following commands:
wget https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth
python tools/convert-pretrained-swin-model-to-d2.py swin_large_patch4_window12_384_22k.pth swin_large_patch4_window12_384_22k.pkl
python demo.py --config-file ../configs/ade20k/semantic-segmentation/semask_swin/maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml --input ../images/person_bike.jpg --opts MODEL.WEIGHTS ../swin_large_patch4_window12_384_22k.pkl
You are testing a SeMask-L Mask2Former model by loading only the Swin-L backbone weights. Thus, you get warnings for missing keys in the logs.
Instead, you can try downloading the correct model from the table in the README. Also, remember that the models are trained on specific datasets.
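On the dataset point: the demo labels and colours its output using the metadata of the dataset the config was written for, so an ADE20K config should be paired with an ADE20K-trained checkpoint. A small sketch to inspect which label set will be used, assuming detectron2's built-in ADE20K registration:

```python
from detectron2.data import MetadataCatalog

# ADE20K semantic segmentation is registered in detectron2's built-in dataset catalog.
meta = MetadataCatalog.get("ade20k_sem_seg_val")
print(len(meta.stuff_classes))   # 150 semantic classes for ADE20K
print(meta.stuff_classes[:10])   # the first few class names used when drawing the overlay
```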
Thank you @praeclarumjj3, I have loaded the correct model and it seems to work. Full script for others:
python demo.py --config-file ../configs/ade20k/semantic-segmentation/semask_swin/maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml --input ../images/person_bike.jpg --opts MODEL.WEIGHTS ../semask_large_mask2former_ade20k\ \(1\).pth
My output image is always the same as the input image. Would it be possible to have one working example for the demo?
I tried these:
python demo.py --config-file ../configs/coco/panoptic-segmentation/maskformer2_R101_bs16_50ep.yaml --input ../images/person_bike.jpg --opts MODEL.WEIGHTS ../R-101.pkl
python demo.py --config-file ../configs/ade20k/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_160k.yaml --input ../images/person_bike.jpg --opts MODEL.WEIGHTS ../semask_large_mask2former_ade20k\ \(1\).pth
python demo.py --config-file ../configs/coco/instance-segmentation/maskformer2_R101_bs16_50ep.yaml --input ../images/person_bike.jpg --opts MODEL.WEIGHTS ../R-101.pkl
With the last command, I got this:
Thank you.