SHI-Labs / OneFormer

OneFormer: One Transformer to Rule Universal Image Segmentation, arxiv 2022 / CVPR 2023
https://praeclarumjj3.github.io/oneformer
MIT License

Outdated .pth and .yaml files for Di #10

Closed calebhemara closed 1 year ago

calebhemara commented 1 year ago

Thanks for your incredible work, team. I'm getting this error on inference:

Weight format of OneFormerHead have changed! Please upgrade your models. Applying automatic conversion now ... WARNING [11/26 13:32:01 fvcore.common.checkpoint]: Some model parameters or buffers are not found in the checkpoint:

Using this model for config & checkpoint: OneFormer | DiNAT-L† | 896×896

I am running inference on cpu with: cfg.MODEL.DEVICE = 'cpu'

Any idea where I'm going wrong would be greatly appreciated. Thanks !

praeclarumjj3 commented 1 year ago

Hi @calebhemara, thanks for your interest in our work.

I do not encounter the error on my end. Are you sure your detectron2 version matches the one we suggest in the installation instructions?

Also, it seems to be only a warning. Can you share the complete traceback log?

calebhemara commented 1 year ago

Strange... yes all versions match the documentation.

It is only a warning, and the model still outputs a result, but the result is entirely inaccurate (pseudo-random). I suspect it's due to incorrect checkpoint key structuring in the .pth/.yaml files...

I am running on a MacBook M1 Pro, hence on CPU; perhaps this is an important factor.

Log below, thanks again!

Weight format of OneFormerHead have changed! Please upgrade your models. Applying automatic conversion now ... WARNING [11/26 23:57:35 fvcore.common.checkpoint]: Some model parameters or buffers are not found in the checkpoint: sem_seg_head.pixel_decoder.adapter_1.norm.{bias, weight} sem_seg_head.pixel_decoder.adapter_1.weight sem_seg_head.pixel_decoder.input_proj.0.0.{bias, weight} sem_seg_head.pixel_decoder.input_proj.0.1.{bias, weight} sem_seg_head.pixel_decoder.input_proj.1.0.{bias, weight} sem_seg_head.pixel_decoder.input_proj.1.1.{bias, weight} sem_seg_head.pixel_decoder.input_proj.2.0.{bias, weight} sem_seg_head.pixel_decoder.input_proj.2.1.{bias, weight} sem_seg_head.pixel_decoder.layer_1.norm.{bias, weight} sem_seg_head.pixel_decoder.layer_1.weight sem_seg_head.pixel_decoder.mask_features.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.0.linear1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.0.linear2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.0.norm1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.0.norm2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.0.self_attn.attention_weights.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.0.self_attn.output_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.0.self_attn.sampling_offsets.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.0.self_attn.value_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.1.linear1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.1.linear2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.1.norm1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.1.norm2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.1.self_attn.attention_weights.{bias, weight} 
sem_seg_head.pixel_decoder.transformer.encoder.layers.1.self_attn.output_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.1.self_attn.sampling_offsets.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.1.self_attn.value_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.2.linear1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.2.linear2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.2.norm1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.2.norm2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.2.self_attn.attention_weights.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.2.self_attn.output_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.2.self_attn.sampling_offsets.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.2.self_attn.value_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.3.linear1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.3.linear2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.3.norm1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.3.norm2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.3.self_attn.attention_weights.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.3.self_attn.output_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.3.self_attn.sampling_offsets.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.3.self_attn.value_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.4.linear1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.4.linear2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.4.norm1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.4.norm2.{bias, weight} 
sem_seg_head.pixel_decoder.transformer.encoder.layers.4.self_attn.attention_weights.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.4.self_attn.output_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.4.self_attn.sampling_offsets.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.4.self_attn.value_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.5.linear1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.5.linear2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.5.norm1.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.5.norm2.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.5.self_attn.attention_weights.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.5.self_attn.output_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.5.self_attn.sampling_offsets.{bias, weight} sem_seg_head.pixel_decoder.transformer.encoder.layers.5.self_attn.value_proj.{bias, weight} sem_seg_head.pixel_decoder.transformer.level_embed WARNING [11/26 23:57:35 fvcore.common.checkpoint]: The checkpoint state_dict contains keys that are not used by the model: prompt_ctx.weight text_encoder.ln_final.{bias, weight} text_encoder.positional_embedding text_encoder.token_embedding.weight text_encoder.transformer.resblocks.0.attn.{in_proj_bias, in_proj_weight} text_encoder.transformer.resblocks.0.attn.out_proj.{bias, weight} text_encoder.transformer.resblocks.0.ln_1.{bias, weight} text_encoder.transformer.resblocks.0.ln_2.{bias, weight} text_encoder.transformer.resblocks.0.mlp.c_fc.{bias, weight} text_encoder.transformer.resblocks.0.mlp.c_proj.{bias, weight} text_encoder.transformer.resblocks.1.attn.{in_proj_bias, in_proj_weight} text_encoder.transformer.resblocks.1.attn.out_proj.{bias, weight} text_encoder.transformer.resblocks.1.ln_1.{bias, weight} text_encoder.transformer.resblocks.1.ln_2.{bias, 
weight} text_encoder.transformer.resblocks.1.mlp.c_fc.{bias, weight} text_encoder.transformer.resblocks.1.mlp.c_proj.{bias, weight} text_encoder.transformer.resblocks.2.attn.{in_proj_bias, in_proj_weight} text_encoder.transformer.resblocks.2.attn.out_proj.{bias, weight} text_encoder.transformer.resblocks.2.ln_1.{bias, weight} text_encoder.transformer.resblocks.2.ln_2.{bias, weight} text_encoder.transformer.resblocks.2.mlp.c_fc.{bias, weight} text_encoder.transformer.resblocks.2.mlp.c_proj.{bias, weight} text_encoder.transformer.resblocks.3.attn.{in_proj_bias, in_proj_weight} text_encoder.transformer.resblocks.3.attn.out_proj.{bias, weight} text_encoder.transformer.resblocks.3.ln_1.{bias, weight} text_encoder.transformer.resblocks.3.ln_2.{bias, weight} text_encoder.transformer.resblocks.3.mlp.c_fc.{bias, weight} text_encoder.transformer.resblocks.3.mlp.c_proj.{bias, weight} text_encoder.transformer.resblocks.4.attn.{in_proj_bias, in_proj_weight} text_encoder.transformer.resblocks.4.attn.out_proj.{bias, weight} text_encoder.transformer.resblocks.4.ln_1.{bias, weight} text_encoder.transformer.resblocks.4.ln_2.{bias, weight} text_encoder.transformer.resblocks.4.mlp.c_fc.{bias, weight} text_encoder.transformer.resblocks.4.mlp.c_proj.{bias, weight} text_encoder.transformer.resblocks.5.attn.{in_proj_bias, in_proj_weight} text_encoder.transformer.resblocks.5.attn.out_proj.{bias, weight} text_encoder.transformer.resblocks.5.ln_1.{bias, weight} text_encoder.transformer.resblocks.5.ln_2.{bias, weight} text_encoder.transformer.resblocks.5.mlp.c_fc.{bias, weight} text_encoder.transformer.resblocks.5.mlp.c_proj.{bias, weight} text_projector.layers.0.{bias, weight} text_projector.layers.1.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.adapter_1.norm.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.adapter_1.weight sem_seg_head.pixel_decoder.pixel_decoder.input_proj.0.0.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.input_proj.0.1.{bias, weight} 
sem_seg_head.pixel_decoder.pixel_decoder.input_proj.1.0.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.input_proj.1.1.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.input_proj.2.0.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.input_proj.2.1.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.layer_1.norm.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.layer_1.weight sem_seg_head.pixel_decoder.pixel_decoder.mask_features.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.0.linear1.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.0.linear2.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.0.norm1.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.0.norm2.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.0.self_attn.attention_weights.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.0.self_attn.output_proj.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.0.self_attn.sampling_offsets.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.0.self_attn.value_proj.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.1.linear1.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.1.linear2.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.1.norm1.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.1.norm2.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.1.self_attn.attention_weights.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.1.self_attn.output_proj.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.1.self_attn.sampling_offsets.{bias, weight} 
sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.1.self_attn.value_proj.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.2.linear1.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.2.linear2.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.2.norm1.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.2.norm2.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.2.self_attn.attention_weights.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.2.self_attn.output_proj.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.2.self_attn.sampling_offsets.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.2.self_attn.value_proj.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.3.linear1.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.3.linear2.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.3.norm1.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.3.norm2.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.3.self_attn.attention_weights.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.3.self_attn.output_proj.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.3.self_attn.sampling_offsets.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.3.self_attn.value_proj.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.4.linear1.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.4.linear2.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.4.norm1.{bias, weight} 
sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.4.norm2.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.4.self_attn.attention_weights.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.4.self_attn.output_proj.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.4.self_attn.sampling_offsets.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.4.self_attn.value_proj.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.5.linear1.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.5.linear2.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.5.norm1.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.5.norm2.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.5.self_attn.attention_weights.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.5.self_attn.output_proj.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.5.self_attn.sampling_offsets.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.5.self_attn.value_proj.{bias, weight} sem_seg_head.pixel_decoder.pixel_decoder.transformer.level_embed

praeclarumjj3 commented 1 year ago

Hi @calebhemara, did you change the config file? From your shared log, it seems like you are using the wrong Pixel Decoder. Please confirm if the values in the config file are correct.

https://github.com/SHI-Labs/OneFormer/blob/33ebb56ed34f970a30ae103e786c0cb64c653d9a/configs/ade20k/oneformer_R50_bs16_160k.yaml#L13-L20

calebhemara commented 1 year ago

Thanks. No changes to config file/s. I've done a fresh clone (steps below), and still have the same error!

  1. Cloned the repo
  2. Downloaded the .yaml and corresponding checkpoint
  3. Moved the checkpoint to the repo directory, moved the .yaml to ./configs/ade20k/dinat
  4. Added a line to demo.py at the end of def setup_cfg(args): cfg.MODEL.DEVICE = 'cpu'
  5. Ran the script: python demo.py --config-file ./configs/ade20k/dinat/oneformer_dinat_large_bs16_160k_896x896.yaml --input ./IMG_8617_C.jpg --output ./IMG_8617_C_S.jpg --task 'semantic' --opts MODEL.IS_TRAIN False MODEL.IS_DEMO True MODEL.WEIGHTS ./896x896_250_16_dinat_l_oneformer_ade20k_160k.pth

Thanks again

praeclarumjj3 commented 1 year ago

Hi @calebhemara, I have two questions for you:

  1. Why are you downloading the config file? It's already there with the repo clone.
  2. Could you print the values of the cfg.MODEL.SEM_SEG_HEAD.PIXEL_DECODER_NAME and cfg.MODEL.ONE_FORMER.TRANSFORMER_IN_FEATURE and share those with me here? They should match the ones here.

calebhemara commented 1 year ago

Thanks @praeclarumjj3

  1. I only downloaded the config file as a fail-safe to ensure I had the latest version in case the checkpoint .pth was saved from an updated config file.
  2. print(cfg.MODEL.SEM_SEG_HEAD.PIXEL_DECODER_NAME, cfg.MODEL.ONE_FORMER.TRANSFORMER_IN_FEATURE) outputs: MSDeformAttnPixelDecoder multi_scale_pixel_decoder

I'll keep doing some homework and hopefully figure it out.

praeclarumjj3 commented 1 year ago

The configuration seems correct. Did you try a different config file?

calebhemara commented 1 year ago

Thanks @praeclarumjj3 , I've tried ConvNeXt, Swin, and DiNAT configs and .pth checkpoints. Perhaps it has something to do with the lack of a CUDA GPU on the M1 Pro system. I've been trying to find where the checkpoint-loading path sets the device to torch.device('cpu'), because my suspicion is that the .pth defaults to loading on torch.device('cuda:0').
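A quick way to test that suspicion (a hypothetical diagnostic sketch, not part of the OneFormer repo): load the .pth with torch.load(..., map_location="cpu"), which handles the no-GPU case directly, and histogram the state-dict key prefixes to see how the checkpoint is actually structured. The prefix_histogram helper is illustrative; the checkpoint filename is the one from the command above.

```python
# Hypothetical diagnostic, not part of the OneFormer repo: inspect how the
# checkpoint keys are structured instead of guessing at device issues.
from collections import Counter

def prefix_histogram(keys, depth=3):
    """Count state-dict keys grouped by their first `depth` dot-separated segments."""
    return Counter(".".join(k.split(".")[:depth]) for k in keys)

# map_location="cpu" remaps a GPU-saved .pth onto a CUDA-less machine (e.g. M1 Pro):
# import torch
# ckpt = torch.load("./896x896_250_16_dinat_l_oneformer_ade20k_160k.pth",
#                   map_location="cpu")
# print(prefix_histogram(ckpt["model"].keys()))
```

If the histogram shows keys under sem_seg_head.pixel_decoder.pixel_decoder rather than sem_seg_head.pixel_decoder, the problem is key naming, not the device.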

Still no luck 🤷‍♂️

praeclarumjj3 commented 1 year ago

I don't think it's related to the availability of a CUDA GPU. In our Colab demo, we load the models on the CPU.

I think I know where the issue is. When you are loading the checkpoint, it's reading the keys for pixel_decoder as: sem_seg_head.pixel_decoder.pixel_decoder.transformer.encoder.layers.0.linear2.{bias, weight}.

Instead, it should be sem_seg_head.pixel_decoder.transformer.encoder.layers.0.linear2.{bias, weight}. There's an extra pixel_decoder. in the keys. Did you change the checkpoint after downloading? This is indeed strange.
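If the duplicated segment really is the problem, one possible workaround (a sketch inferred from this thread, not an official OneFormer fix) is to rename the keys before loading. The two prefix strings are copied from the warning log above; fix_keys and the file names are illustrative assumptions:

```python
# Hypothetical workaround: collapse the duplicated "pixel_decoder." segment in
# the checkpoint's state-dict keys. Not an official OneFormer fix.
OLD = "sem_seg_head.pixel_decoder.pixel_decoder."
NEW = "sem_seg_head.pixel_decoder."

def fix_keys(state_dict):
    """Rewrite keys that start with the duplicated prefix; leave the rest alone."""
    return {(NEW + k[len(OLD):]) if k.startswith(OLD) else k: v
            for k, v in state_dict.items()}

# import torch
# ckpt = torch.load("original.pth", map_location="cpu")
# ckpt["model"] = fix_keys(ckpt["model"])   # assumes weights live under "model"
# torch.save(ckpt, "fixed.pth")
```

This would only paper over the symptom; the root cause (why the prefix was duplicated at load time) still matters.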

pratos commented 1 year ago

I am running inference on GPU. The issue crops up because of the different installation methods used.

Initially, I used torch 1.9.0 (cu113) with detectron2 + NATTEN compiled against cu113. It threw the same errors as mentioned. I used cog containers to build the environment and run inference. I then moved to the installation described in the OneFormer Colab notebook and was able to run inference just fine.

pratos commented 1 year ago

Using the Swin backbone and the ade20k dataset config.

praeclarumjj3 commented 1 year ago

Thanks for your comment, @pratos. @calebhemara, were you able to solve this issue?

praeclarumjj3 commented 1 year ago

Closing this issue for now. Feel free to re-open if you face any more issues.