fredzzhang / pvic

[ICCV'23] Official PyTorch implementation for paper "Exploring Predicate Visual Context in Detecting Human-Object Interactions"
BSD 3-Clause "New" or "Revised" License

[Bug] Cannot load the parameters of the advanced model (SwinL-based) on HICO-DET #41

Closed QihanZhao closed 9 months ago

QihanZhao commented 11 months ago

Running main.py with the following arguments:

            "args": [
                "--backbone", "swin_large",
                "--drop-path-rate", "0.5", 
                "--num-queries-one2one", "900",
                "--num-queries-one2many", "1500",
                "--world-size", "1",
                "--batch-size", "1",
                "--eval", 
                "--resume", "./checkpoints/h-defm-detr-swinL-dp0-mqs-lft-iter-2stg-hicodet.pth"
            ],

When execution reaches

model.load_state_dict(checkpoint['model_state_dict'])

the error below is raised. It has two parts: missing keys and unexpected keys.

It seems that the model has been updated since the parameters were released. Could you please provide the latest version that aligns with the current code? Thank you very much!
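Before concluding that the code and the weights have diverged, the key names in the error can be compared directly. The sketch below is only an illustration, not part of the repository: `diff_keys` and `guess_prefix` are hypothetical helper names, and the two sample keys are copied from the error message. It checks whether the missing keys are simply the unexpected keys with a common prefix in front, which would suggest the file holds weights for a submodule (here `detector`) rather than for the whole PViC model. That reading is an inference from the key names alone.

```python
from collections import Counter


def diff_keys(model_keys, ckpt_keys):
    """Split keys the same way load_state_dict reports them.

    missing:    expected by the model but absent from the checkpoint
    unexpected: present in the checkpoint but unknown to the model
    """
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    return sorted(model_keys - ckpt_keys), sorted(ckpt_keys - model_keys)


def guess_prefix(missing, unexpected):
    """Return the most common prefix P such that P + unexpected_key == missing_key.

    Returns None when no missing key matches an unexpected key in this way.
    """
    unexpected = set(unexpected)
    votes = Counter()
    for key in missing:
        parts = key.split(".")
        # Try stripping 1, 2, ... leading components until the rest matches.
        for i in range(1, len(parts)):
            if ".".join(parts[i:]) in unexpected:
                votes[".".join(parts[:i]) + "."] += 1
                break
    return votes.most_common(1)[0][0] if votes else None


# Sample keys taken from the error message. With the real objects one would
# pass model.state_dict().keys() and checkpoint['model_state_dict'].keys().
model_keys = [
    "detector.transformer.level_embed",
    "detector.transformer.encoder.layers.0.norm1.weight",
]
ckpt_keys = [
    "transformer.level_embed",
    "transformer.encoder.layers.0.norm1.weight",
]

missing, unexpected = diff_keys(model_keys, ckpt_keys)
prefix = guess_prefix(missing, unexpected)
print(prefix)
```

If a single prefix accounts for most of the mismatched keys, it is worth checking whether the checkpoint was meant to be loaded into that submodule instead of being passed to `--resume`.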

RuntimeError('Error(s) in loading state_dict for PViC:

Missing key(s) in state_dict: "detector.transformer.level_embed", "detector.transformer.encoder.layers.0.self_attn.sampling_offsets.weight", "detector.transformer.encoder.layers.0.self_attn.sampling_offsets.bias", "detector.transformer.encoder.layers.0.self_attn.attention_weights.weight", "detector.transformer.encoder.layers.0.self_attn.attention_weights.bias", "detector.transformer.encoder.layers.0.self_attn.value_proj.weight", "detector.transformer.encoder.layers.0.self_attn.value_proj.bias", "detector.transformer.encoder.layers.0.self_attn.output_proj.weight", "detector.transformer.encoder.layers.0.self_attn.output_proj.bias", "detector.transformer.encoder.layers.0.norm1.weight", "detector.transformer.encoder.layers.0.norm1.bias", "detector.transformer.encoder.layers.0.linear1.weight", "detector.transformer.encoder.layers.0.linear1.bias", "detector.transformer.encoder.layers.0.linear2.weight", "detector.transformer.encoder.layers.0.linear2.bias", "detector.transformer.encoder.layers.0.norm2.weight", "detector.transformer.encoder.layers.0.norm2.bias", "detector.transformer.encoder.layers.1.self_attn.sampling_offsets.weight", "detector.transformer.encoder.layers.1.self_attn.sampling_offsets.bias", "detector.transformer.encoder.layers.1.self_attn.attention_weights.weight", "detector.transformer.encoder.layers.1.self_attn.attention_weights.bias", "detector.transformer.encoder.layers.1.self_attn.value_proj.weight", "detector.transformer.encoder.layers.1.self_attn.value_proj.bias", "detector.transformer.encoder.layers.1.self_attn.output_proj.weight", "detector.transformer.encoder.layers.1.self_attn.output_proj.bias", "detector.transformer.encoder.layers.1.norm1.weight", "detector.transformer.encoder.layers.1.norm1.bias", "detector.transformer.encoder.layers.1.linear1.weight", "detector.transformer.encoder.layers.1.linear1.bias", "detector.transformer.encoder.layers.1.linear2.weight", "detector.transformer.encoder.layers.1.linear2.bias", 
"detector.transformer.encoder.layers.1.norm2.weight", "detector.transformer.encoder.layers.1.norm2.bias", "detector.transformer.encoder.layers.2.self_attn.sampling_offsets.weight", "detector.transformer.encoder.layers.2.self_attn.sampling_offsets.bias", "detector.transformer.encoder.layers.2.self_attn.attention_weights.weight", "detector.transformer.encoder.layers.2.self_attn.attention_weights.bias", "detector.transformer.encoder.layers.2.self_attn.value_proj.weight", "detector.transformer.encoder.layers.2.self_attn.value_proj.bias", "detector.transformer.encoder.layers.2.self_attn.output_proj.weight", "detector.transformer.encoder.layers.2.self_attn.output_proj.bias", "detector.transformer.encoder.layers.2.norm1.weight", "detector.transformer.encoder.layers.2.norm1.bias", "detector.transformer.encoder.layers.2.linear1.weight", "detector.transformer.encoder.layers.2.linear1.bias", "detector.transformer.encoder.layers.2.linear2.weight", "detector.transformer.encoder.layers.2.linear2.bias", "detector.transformer.encoder.layers.2.norm2.weight", "detector.transformer.encoder.layers.2.norm2.bias", "detector.transformer.encoder.layers.3.self_attn.sampling_offsets.weight", "detector.transformer.encoder.layers.3.self_attn.sampling_offsets.bias", "detector.transformer.encoder.layers.3.self_attn.attention_weights.weight", "detector.transformer.encoder.layers.3.self_attn.attention_weights.bias", "detector.transformer.encoder.layers.3.self_attn.value_proj.weight", "detector.transformer.encoder.layers.3.self_attn.value_proj.bias", "detector.transformer.encoder.layers.3.self_attn.output_proj.weight", "detector.transformer.encoder.layers.3.self_attn.output_proj.bias", "detector.transformer.encoder.layers.3.norm1.weight", "detector.transformer.encoder.layers.3.norm1.bias", "detector.transformer.encoder.layers.3.linear1.weight", "detector.transformer.encoder.layers.3.linear1.bias", "detector.transformer.encoder.layers.3.linear2.weight", 
"detector.transformer.encoder.layers.3.linear2.bias", "detector.transformer.encoder.layers.3.norm2.weight", "detector.transformer.encoder.layers.3.norm2.bias", "detector.transformer.encoder.layers.4.self_attn.sampling_offsets.weight", "detector.transformer.encoder.layers.4.self_attn.sampling_offsets.bias", "detector.transformer.encoder.layers.4.self_attn.attention_weights.weight", "detector.transformer.encoder.layers.4.self_attn.attention_weights.bias", "detector.transformer.encoder.layers.4.self_attn.value_proj.weight", "detector.transformer.encoder.layers.4.self_attn.value_proj.bias", "detector.transformer.encoder.layers.4.self_attn.output_proj.weight", "detector.transformer.encoder.layers.4.self_attn.output_proj.bias", "detector.transformer.encoder.layers.4.norm1.weight", "detector.transformer.encoder.layers.4.norm1.bias", "detector.transformer.encoder.layers.4.linear1.weight", "detector.transformer.encoder.layers.4.linear1.bias", "detector.transformer.encoder.layers.4.linear2.weight", "detector.transformer.encoder.layers.4.linear2.bias", "detector.transformer.encoder.layers.4.norm2.weight", "detector.transformer.encoder.layers.4.norm2.bias", "detector.transformer.encoder.layers.5.self_attn.sampling_offsets.weight", "detector.transformer.encoder.layers.5.self_attn.sampling_offsets.bias", "detector.transformer.encoder.layers.5.self_attn.attention_weights.weight", "detector.transformer.encoder.layers.5.self_attn.attention_weights.bias", "detector.transformer.encoder.layers.5.self_attn.value_proj.weight", "detector.transformer.encoder.layers.5.self_attn.value_proj.bias", "detector.transformer.encoder.layers.5.self_attn.output_proj.weight", "detector.transformer.encoder.layers.5.self_attn.output_proj.bias", "detector.transformer.encoder.layers.5.norm1.weight", "detector.transformer.encoder.layers.5.norm1.bias", "detector.transformer.encoder.layers.5.linear1.weight", "detector.transformer.encoder.layers.5.linear1.bias", 
"detector.transformer.encoder.layers.5.linear2.weight", "detector.transformer.encoder.layers.5.linear2.bias", "detector.transformer.encoder.layers.5.norm2.weight", "detector.transformer.encoder.layers.5.norm2.bias", "detector.transformer.decoder.layers.0.cross_attn.sampling_offsets.weight", "detector.transformer.decoder.layers.0.cross_attn.sampling_offsets.bias", "detector.transformer.decoder.layers.0.cross_attn.attention_weights.weight", "detector.transformer.decoder.layers.0.cross_attn.attention_weights.bias", "detector.transformer.decoder.layers.0.cross_attn.value_proj.weight", "detector.transformer.decoder.layers.0.cross_attn.value_proj.bias", "detector.transformer.decoder.layers.0.cross_attn.output_proj.weight", "detector.transformer.decoder.layers.0.cross_attn.output_proj.bias", "detector.transformer.decoder.layers.0.norm1.weight", "detector.transformer.decoder.layers.0.norm1.bias", "detector.transformer.decoder.layers.0.self_attn.in_proj_weight", "detector.transformer.decoder.layers.0.self_attn.in_proj_bias", "detector.transformer.decoder.layers.0.self_attn.out_proj.weight", "detector.transformer.decoder.layers.0.self_attn.out_proj.bias", "detector.transformer.decoder.layers.0.norm2.weight", "detector.transformer.decoder.layers.0.norm2.bias", "detector.transformer.decoder.layers.0.linear1.weight", "detector.transformer.decoder.layers.0.linear1.bias", "detector.transformer.decoder.layers.0.linear2.weight", "detector.transformer.decoder.layers.0.linear2.bias", "detector.transformer.decoder.layers.0.norm3.weight", "detector.transformer.decoder.layers.0.norm3.bias", "detector.transformer.decoder.layers.1.cross_attn.sampling_offsets.weight", "detector.transformer.decoder.layers.1.cross_attn.sampling_offsets.bias", "detector.transformer.decoder.layers.1.cross_attn.attention_weights.weight", "detector.transformer.decoder.layers.1.cross_attn.attention_weights.bias", "detector.transformer.decoder.layers.1.cross_attn.value_proj.weight", 
"detector.transformer.decoder.layers.1.cross_attn.value_proj.bias", "detector.transformer.decoder.layers.1.cross_attn.output_proj.weight", "detector.transformer.decoder.layers.1.cross_attn.output_proj.bias", "detector.transformer.decoder.layers.1.norm1.weight", "detector.transformer.decoder.layers.1.norm1.bias", "detector.transformer.decoder.layers.1.self_attn.in_proj_weight", "detector.transformer.decoder.layers.1.self_attn.in_proj_bias", "detector.transformer.decoder.layers.1.self_attn.out_proj.weight", "detector.transformer.decoder.layers.1.self_attn.out_proj.bias", "detector.transformer.decoder.layers.1.norm2.weight", "detector.transformer.decoder.layers.1.norm2.bias", "detector.transformer.decoder.layers.1.linear1.weight", "detector.transformer.decoder.layers.1.linear1.bias", "detector.transformer.decoder.layers.1.linear2.weight", "detector.transformer.decoder.layers.1.linear2.bias", "detector.transformer.decoder.layers.1.norm3.weight", "detector.transformer.decoder.layers.1.norm3.bias", "detector.transformer.decoder.layers.2.cross_attn.sampling_offsets.weight", "detector.transformer.decoder.layers.2.cross_attn.sampling_offsets.bias", "detector.transformer.decoder.layers.2.cross_attn.attention_weights.weight", "detector.transformer.decoder.layers.2.cross_attn.attention_weights.bias", "detector.transformer.decoder.layers.2.cross_attn.value_proj.weight", "detector.transformer.decoder.layers.2.cross_attn.value_proj.bias", "detector.transformer.decoder.layers.2.cross_attn.output_proj.weight", "detector.transformer.decoder.layers.2.cross_attn.output_proj.bias", "detector.transformer.decoder.layers.2.norm1.weight", "detector.transformer.decoder.layers.2.norm1.bias", "detector.transformer.decoder.layers.2.self_attn.in_proj_weight", "detector.transformer.decoder.layers.2.self_attn.in_proj_bias", "detector.transformer.decoder.layers.2.self_attn.out_proj.weight", "detector.transformer.decoder.layers.2.self_attn.out_proj.bias", 
"detector.transformer.decoder.layers.2.norm2.weight", "detector.transformer.decoder.layers.2.norm2.bias", "detector.transformer.decoder.layers.2.linear1.weight", "detector.transformer.decoder.layers.2.linear1.bias", "detector.transformer.decoder.layers.2.linear2.weight", "detector.transformer.decoder.layers.2.linear2.bias", "detector.transformer.decoder.layers.2.norm3.weight", "detector.transformer.decoder.layers.2.norm3.bias", "detector.transformer.decoder.layers.3.cross_attn.sampling_offsets.weight", "detector.transformer.decoder.layers.3.cross_attn.sampling_offsets.bias", "detector.transformer.decoder.layers.3.cross_attn.attention_weights.weight", "detector.transformer.decoder.layers.3.cross_attn.attention_weights.bias", "detector.transformer.decoder.layers.3.cross_attn.value_proj.weight", "detector.transformer.decoder.layers.3.cross_attn.value_proj.bias", "detector.transformer.decoder.layers.3.cross_attn.output_proj.weight", "detector.transformer.decoder.layers.3.cross_attn.output_proj.bias", "detector.transformer.decoder.layers.3.norm1.weight", "detector.transformer.decoder.layers.3.norm1.bias", "detector.transformer.decoder.layers.3.self_attn.in_proj_weight", "detector.transformer.decoder.layers.3.self_attn.in_proj_bias", "detector.transformer.decoder.layers.3.self_attn.out_proj.weight", "detector.transformer.decoder.layers.3.self_attn.out_proj.bias", "detector.transformer.decoder.layers.3.norm2.weight", "detector.transformer.decoder.layers.3.norm2.bias", "detector.transformer.decoder.layers.3.linear1.weight", "detector.transformer.decoder.layers.3.linear1.bias", "detector.transformer.decoder.layers.3.linear2.weight", "detector.transformer.decoder.layers.3.linear2.bias", "detector.transformer.decoder.layers.3.norm3.weight", "detector.transformer.decoder.layers.3.norm3.bias", "detector.transformer.decoder.layers.4.cross_attn.sampling_offsets.weight", "detector.transformer.decoder.layers.4.cross_attn.sampling_offsets.bias", 
"detector.transformer.decoder.layers.4.cross_attn.attention_weights.weight", "detector.transformer.decoder.layers.4.cross_attn.attention_weights.bias", "detector.transformer.decoder.layers.4.cross_attn.value_proj.weight", "detector.transformer.decoder.layers.4.cross_attn.value_proj.bias", "detector.transformer.decoder.layers.4.cross_attn.output_proj.weight", "detector.transformer.decoder.layers.4.cross_attn.output_proj.bias", "detector.transformer.decoder.layers.4.norm1.weight", "detector.transformer.decoder.layers.4.norm1.bias", "detector.transformer.decoder.layers.4.self_attn.in_proj_weight", "detector.transformer.decoder.layers.4.self_attn.in_proj_bias", "detector.transformer.decoder.layers.4.self_attn.out_proj.weight", "detector.transformer.decoder.layers.4.self_attn.out_proj.bias", "detector.transformer.decoder.layers.4.norm2.weight", "detector.transformer.decoder.layers.4.norm2.bias", "detector.transformer.decoder.layers.4.linear1.weight", "detector.transformer.decoder.layers.4.linear1.bias", "detector.transformer.decoder.layers.4.linear2.weight", "detector.transformer.decoder.layers.4.linear2.bias", "detector.transformer.decoder.layers.4.norm3.weight", "detector.transformer.decoder.layers.4.norm3.bias", "detector.transformer.decoder.layers.5.cross_attn.sampling_offsets.weight", "detector.transformer.decoder.layers.5.cross_attn.sampling_offsets.bias", "detector.transformer.decoder.layers.5.cross_attn.attention_weights.weight", "detector.transformer.decoder.layers.5.cross_attn.attention_weights.bias", "detector.transformer.decoder.layers.5.cross_attn.value_proj.weight", "detector.transformer.decoder.layers.5.cross_attn.value_proj.bias", "detector.transformer.decoder.layers.5.cross_attn.output_proj.weight", "detector.transformer.decoder.layers.5.cross_attn.output_proj.bias", "detector.transformer.decoder.layers.5.norm1.weight", "detector.transformer.decoder.layers.5.norm1.bias", "detector.transformer.decoder.layers.5.self_attn.in_proj_weight", 
"detector.transformer.decoder.layers.5.self_attn.in_proj_bias", "detector.transformer.decoder.layers.5.self_attn.out_proj.weight", "detector.transformer.decoder.layers.5.self_attn.out_proj.bias", "detector.transformer.decoder.layers.5.norm2.weight", "detector.transformer.decoder.layers.5.norm2.bias", "detector.transformer.decoder.layers.5.linear1.weight", "detector.transformer.decoder.layers.5.linear1.bias", "detector.transformer.decoder.layers.5.linear2.weight", "detector.transformer.decoder.layers.5.linear2.bias", "detector.transformer.decoder.layers.5.norm3.weight", "detector.transformer.decoder.layers.5.norm3.bias", "detector.transformer.decoder.bbox_embed.0.layers.0.weight", "detector.transformer.decoder.bbox_embed.0.layers.0.bias", "detector.transformer.decoder.bbox_embed.0.layers.1.weight", "detector.transformer.decoder.bbox_embed.0.layers.1.bias", "detector.transformer.decoder.bbox_embed.0.layers.2.weight", "detector.transformer.decoder.bbox_embed.0.layers.2.bias", "detector.transformer.decoder.bbox_embed.1.layers.0.weight", "detector.transformer.decoder.bbox_embed.1.layers.0.bias", "detector.transformer.decoder.bbox_embed.1.layers.1.weight", "detector.transformer.decoder.bbox_embed.1.layers.1.bias", "detector.transformer.decoder.bbox_embed.1.layers.2.weight", "detector.transformer.decoder.bbox_embed.1.layers.2.bias", "detector.transformer.decoder.bbox_embed.2.layers.0.weight", "detector.transformer.decoder.bbox_embed.2.layers.0.bias", "detector.transformer.decoder.bbox_embed.2.layers.1.weight", "detector.transformer.decoder.bbox_embed.2.layers.1.bias", "detector.transformer.decoder.bbox_embed.2.layers.2.weight", "detector.transformer.decoder.bbox_embed.2.layers.2.bias", "detector.transformer.decoder.bbox_embed.3.layers.0.weight", "detector.transformer.decoder.bbox_embed.3.layers.0.bias", "detector.transformer.decoder.bbox_embed.3.layers.1.weight", "detector.transformer.decoder.bbox_embed.3.layers.1.bias", 
"detector.transformer.decoder.bbox_embed.3.layers.2.weight", "detector.transformer.decoder.bbox_embed.3.layers.2.bias", "detector.transformer.decoder.bbox_embed.4.layers.0.weight", "detector.transformer.decoder.bbox_embed.4.layers.0.bias", "detector.transformer.decoder.bbox_embed.4.layers.1.weight", "detector.transformer.decoder.bbox_embed.4.layers.1.bias", "detector.transformer.decoder.bbox_embed.4.layers.2.weight", "detector.transformer.decoder.bbox_embed.4.layers.2.bias", "detector.transformer.decoder.bbox_embed.5.layers.0.weight", "detector.transformer.decoder.bbox_embed.5.layers.0.bias", "detector.transformer.decoder.bbox_embed.5.layers.1.weight", "detector.transformer.decoder.bbox_embed.5.layers.1.bias", "detector.transformer.decoder.bbox_embed.5.layers.2.weight", "detector.transformer.decoder.bbox_embed.5.layers.2.bias", "detector.transformer.decoder.bbox_embed.6.layers.0.weight", "detector.transformer.decoder.bbox_embed.6.layers.0.bias", "detector.transformer.decoder.bbox_embed.6.layers.1.weight", "detector.transformer.decoder.bbox_embed.6.layers.1.bias", "detector.transformer.decoder.bbox_embed.6.layers.2.weight", "detector.transformer.decoder.bbox_embed.6.layers.2.bias", "detector.transformer.decoder.class_embed.0.weight", "detector.transformer.decoder.class_embed.0.bias", "detector.transformer.decoder.class_embed.1.weight", "detector.transformer.decoder.class_embed.1.bias", "detector.transformer.decoder.class_embed.2.weight", "detector.transformer.decoder.class_embed.2.bias", "detector.transformer.decoder.class_embed.3.weight", "detector.transformer.decoder.class_embed.3.bias", "detector.transformer.decoder.class_embed.4.weight", "detector.transformer.decoder.class_embed.4.bias", "detector.transformer.decoder.class_embed.5.weight", "detector.transformer.decoder.class_embed.5.bias", "detector.transformer.decoder.class_embed.6.weight", "detector.transformer.decoder.class_embed.6.bias", "detector.transformer.enc_output.weight", 
"detector.transformer.enc_output.bias", "detector.transformer.enc_output_norm.weight", "detector.transformer.enc_output_norm.bias", "detector.transformer.pos_trans.weight", "detector.transformer.pos_trans.bias", "detector.transformer.pos_trans_norm.weight", "detector.transformer.pos_trans_norm.bias", "detector.class_embed.0.weight", "detector.class_embed.0.bias", "detector.class_embed.1.weight", "detector.class_embed.1.bias", "detector.class_embed.2.weight", "detector.class_embed.2.bias", "detector.class_embed.3.weight", "detector.class_embed.3.bias", "detector.class_embed.4.weight", "detector.class_embed.4.bias", "detector.class_embed.5.weight", "detector.class_embed.5.bias", "detector.class_embed.6.weight", "detector.class_embed.6.bias", "detector.bbox_embed.0.layers.0.weight", "detector.bbox_embed.0.layers.0.bias", "detector.bbox_embed.0.layers.1.weight", "detector.bbox_embed.0.layers.1.bias", "detector.bbox_embed.0.layers.2.weight", "detector.bbox_embed.0.layers.2.bias", "detector.bbox_embed.1.layers.0.weight", "detector.bbox_embed.1.layers.0.bias", "detector.bbox_embed.1.layers.1.weight", "detector.bbox_embed.1.layers.1.bias", "detector.bbox_embed.1.layers.2.weight", "detector.bbox_embed.1.layers.2.bias", "detector.bbox_embed.2.layers.0.weight", "detector.bbox_embed.2.layers.0.bias", "detector.bbox_embed.2.layers.1.weight", "detector.bbox_embed.2.layers.1.bias", "detector.bbox_embed.2.layers.2.weight", "detector.bbox_embed.2.layers.2.bias", "detector.bbox_embed.3.layers.0.weight", "detector.bbox_embed.3.layers.0.bias", "detector.bbox_embed.3.layers.1.weight", "detector.bbox_embed.3.layers.1.bias", "detector.bbox_embed.3.layers.2.weight", "detector.bbox_embed.3.layers.2.bias", "detector.bbox_embed.4.layers.0.weight", "detector.bbox_embed.4.layers.0.bias", "detector.bbox_embed.4.layers.1.weight", "detector.bbox_embed.4.layers.1.bias", "detector.bbox_embed.4.layers.2.weight", "detector.bbox_embed.4.layers.2.bias", "detector.bbox_embed.5.layers.0.weight", 
"detector.bbox_embed.5.layers.0.bias", "detector.bbox_embed.5.layers.1.weight", "detector.bbox_embed.5.layers.1.bias", "detector.bbox_embed.5.layers.2.weight", "detector.bbox_embed.5.layers.2.bias", "detector.bbox_embed.6.layers.0.weight", "detector.bbox_embed.6.layers.0.bias", "detector.bbox_embed.6.layers.1.weight", "detector.bbox_embed.6.layers.1.bias", "detector.bbox_embed.6.layers.2.weight", "detector.bbox_embed.6.layers.2.bias", "detector.query_embed.weight", "detector.input_proj.0.0.weight", "detector.input_proj.0.0.bias", "detector.input_proj.0.1.weight", "detector.input_proj.0.1.bias", "detector.input_proj.1.0.weight", "detector.input_proj.1.0.bias", "detector.input_proj.1.1.weight", "detector.input_proj.1.1.bias", "detector.input_proj.2.0.weight", "detector.input_proj.2.0.bias", "detector.input_proj.2.1.weight", "detector.input_proj.2.1.bias", "detector.input_proj.3.0.weight", "detector.input_proj.3.0.bias", "detector.input_proj.3.1.weight", "detector.input_proj.3.1.bias", "detector.backbone.0.body.patch_embed.proj.weight", "detector.backbone.0.body.patch_embed.proj.bias", "detector.backbone.0.body.patch_embed.norm.weight", "detector.backbone.0.body.patch_embed.norm.bias", "detector.backbone.0.body.layers.0.blocks.0.norm1.weight", "detector.backbone.0.body.layers.0.blocks.0.norm1.bias", "detector.backbone.0.body.layers.0.blocks.0.attn.relative_position_bias_table", "detector.backbone.0.body.layers.0.blocks.0.attn.relative_position_index", "detector.backbone.0.body.layers.0.blocks.0.attn.qkv.weight", "detector.backbone.0.body.layers.0.blocks.0.attn.qkv.bias", "detector.backbone.0.body.layers.0.blocks.0.attn.proj.weight", "detector.backbone.0.body.layers.0.blocks.0.attn.proj.bias", "detector.backbone.0.body.layers.0.blocks.0.norm2.weight", "detector.backbone.0.body.layers.0.blocks.0.norm2.bias", "detector.backbone.0.body.layers.0.blocks.0.mlp.fc1.weight", "detector.backbone.0.body.layers.0.blocks.0.mlp.fc1.bias", 
"detector.backbone.0.body.layers.0.blocks.0.mlp.fc2.weight", "detector.backbone.0.body.layers.0.blocks.0.mlp.fc2.bias", "detector.backbone.0.body.layers.0.blocks.1.norm1.weight", "detector.backbone.0.body.layers.0.blocks.1.norm1.bias", "detector.backbone.0.body.layers.0.blocks.1.attn.relative_position_bias_table", "detector.backbone.0.body.layers.0.blocks.1.attn.relative_position_index", "detector.backbone.0.body.layers.0.blocks.1.attn.qkv.weight", "detector.backbone.0.body.layers.0.blocks.1.attn.qkv.bias", "detector.backbone.0.body.layers.0.blocks.1.attn.proj.weight", "detector.backbone.0.body.layers.0.blocks.1.attn.proj.bias", "detector.backbone.0.body.layers.0.blocks.1.norm2.weight", "detector.backbone.0.body.layers.0.blocks.1.norm2.bias", "detector.backbone.0.body.layers.0.blocks.1.mlp.fc1.weight", "detector.backbone.0.body.layers.0.blocks.1.mlp.fc1.bias", "detector.backbone.0.body.layers.0.blocks.1.mlp.fc2.weight", "detector.backbone.0.body.layers.0.blocks.1.mlp.fc2.bias", "detector.backbone.0.body.layers.0.downsample.reduction.weight", "detector.backbone.0.body.layers.0.downsample.norm.weight", "detector.backbone.0.body.layers.0.downsample.norm.bias", "detector.backbone.0.body.layers.1.blocks.0.norm1.weight", "detector.backbone.0.body.layers.1.blocks.0.norm1.bias", "detector.backbone.0.body.layers.1.blocks.0.attn.relative_position_bias_table", "detector.backbone.0.body.layers.1.blocks.0.attn.relative_position_index", "detector.backbone.0.body.layers.1.blocks.0.attn.qkv.weight", "detector.backbone.0.body.layers.1.blocks.0.attn.qkv.bias", "detector.backbone.0.body.layers.1.blocks.0.attn.proj.weight", "detector.backbone.0.body.layers.1.blocks.0.attn.proj.bias", "detector.backbone.0.body.layers.1.blocks.0.norm2.weight", "detector.backbone.0.body.layers.1.blocks.0.norm2.bias", "detector.backbone.0.body.layers.1.blocks.0.mlp.fc1.weight", "detector.backbone.0.body.layers.1.blocks.0.mlp.fc1.bias", "detector.backbone.0.body.layers.1.blocks.0.mlp.fc2.weight", 
"detector.backbone.0.body.layers.1.blocks.0.mlp.fc2.bias", "detector.backbone.0.body.layers.1.blocks.1.norm1.weight", "detector.backbone.0.body.layers.1.blocks.1.norm1.bias", "detector.backbone.0.body.layers.1.blocks.1.attn.relative_position_bias_table", "detector.backbone.0.body.layers.1.blocks.1.attn.relative_position_index", "detector.backbone.0.body.layers.1.blocks.1.attn.qkv.weight", "detector.backbone.0.body.layers.1.blocks.1.attn.qkv.bias", "detector.backbone.0.body.layers.1.blocks.1.attn.proj.weight", "detector.backbone.0.body.layers.1.blocks.1.attn.proj.bias", "detector.backbone.0.body.layers.1.blocks.1.norm2.weight", "detector.backbone.0.body.layers.1.blocks.1.norm2.bias", "detector.backbone.0.body.layers.1.blocks.1.mlp.fc1.weight", "detector.backbone.0.body.layers.1.blocks.1.mlp.fc1.bias", "detector.backbone.0.body.layers.1.blocks.1.mlp.fc2.weight", "detector.backbone.0.body.layers.1.blocks.1.mlp.fc2.bias", "detector.backbone.0.body.layers.1.downsample.reduction.weight", "detector.backbone.0.body.layers.1.downsample.norm.weight", "detector.backbone.0.body.layers.1.downsample.norm.bias", "detector.backbone.0.body.layers.2.blocks.0.norm1.weight", "detector.backbone.0.body.layers.2.blocks.0.norm1.bias", "detector.backbone.0.body.layers.2.blocks.0.attn.relative_position_bias_table", "detector.backbone.0.body.layers.2.blocks.0.attn.relative_position_index", "detector.backbone.0.body.layers.2.blocks.0.attn.qkv.weight", "detector.backbone.0.body.layers.2.blocks.0.attn.qkv.bias", "detector.backbone.0.body.layers.2.blocks.0.attn.proj.weight", "detector.backbone.0.body.layers.2.blocks.0.attn.proj.bias", "detector.backbone.0.body.layers.2.blocks.0.norm2.weight", "detector.backbone.0.body.layers.2.blocks.0.norm2.bias", "detector.backbone.0.body.layers.2.blocks.0.mlp.fc1.weight", "detector.backbone.0.body.layers.2.blocks.0.mlp.fc1.bias", "detector.backbone.0.body.layers.2.blocks.0.mlp.fc2.weight", "detector.backbone.0.body.layers.2.blocks.0.mlp.fc2.bias", 
"detector.backbone.0.body.layers.2.blocks.1.norm1.weight", "detector.backbone.0.body.layers.2.blocks.1.norm1.bias", "detector.backbone.0.body.layers.2.blocks.1.attn.relative_position_bias_table", "detector.backbone.0.body.layers.2.blocks.1.attn.relative_position_index", "detector.backbone.0.body.layers.2.blocks.1.attn.qkv.weight", "detector.backbone.0.body.layers.2.blocks.1.attn.qkv.bias", "detector.backbone.0.body.layers.2.blocks.1.attn.proj.weight", "detector.backbone.0.body.layers.2.blocks.1.attn.proj.bias", "detector.backbone.0.body.layers.2.blocks.1.norm2.weight", "detector.backbone.0.body.layers.2.blocks.1.norm2.bias", "detector.backbone.0.body.layers.2.blocks.1.mlp.fc1.weight", "detector.backbone.0.body.layers.2.blocks.1.mlp.fc1.bias", "detector.backbone.0.body.layers.2.blocks.1.mlp.fc2.weight", "detector.backbone.0.body.layers.2.blocks.1.mlp.fc2.bias", "detector.backbone.0.body.layers.2.blocks.2.norm1.weight", "detector.backbone.0.body.layers.2.blocks.2.norm1.bias", "detector.backbone.0.body.layers.2.blocks.2.attn.relative_position_bias_table", "detector.backbone.0.body.layers.2.blocks.2.attn.relative_position_index", "detector.backbone.0.body.layers.2.blocks.2.attn.qkv.weight", "detector.backbone.0.body.layers.2.blocks.2.attn.qkv.bias", "detector.backbone.0.body.layers.2.blocks.2.attn.proj.weight", "detector.backbone.0.body.layers.2.blocks.2.attn.proj.bias", "detector.backbone.0.body.layers.2.blocks.2.norm2.weight", "detector.backbone.0.body.layers.2.blocks.2.norm2.bias", "detector.backbone.0.body.layers.2.blocks.2.mlp.fc1.weight", "detector.backbone.0.body.layers.2.blocks.2.mlp.fc1.bias", "detector.backbone.0.body.layers.2.blocks.2.mlp.fc2.weight", "detector.backbone.0.body.layers.2.blocks.2.mlp.fc2.bias", "detector.backbone.0.body.layers.2.blocks.3.norm1.weight", "detector.backbone.0.body.layers.2.blocks.3.norm1.bias", "detector.backbone.0.body.layers.2.blocks.3.attn.relative_position_bias_table", 
"detector.backbone.0.body.layers.2.blocks.3.attn.relative_position_index", "detector.backbone.0.body.layers.2.blocks.3.attn.qkv.weight", "detector.backbone.0.body.layers.2.blocks.3.attn.qkv.bias", "detector.backbone.0.body.layers.2.blocks.3.attn.proj.weight", "detector.backbone.0.body.layers.2.blocks.3.attn.proj.bias", "detector.backbone.0.body.layers.2.blocks.3.norm2.weight", "detector.backbone.0.body.layers.2.blocks.3.norm2.bias", "detector.backbone.0.body.layers.2.blocks.3.mlp.fc1.weight", "detector.backbone.0.body.layers.2.blocks.3.mlp.fc1.bias", "detector.backbone.0.body.layers.2.blocks.3.mlp.fc2.weight", "detector.backbone.0.body.layers.2.blocks.3.mlp.fc2.bias", "detector.backbone.0.body.layers.2.blocks.4.norm1.weight", "detector.backbone.0.body.layers.2.blocks.4.norm1.bias", "detector.backbone.0.body.layers.2.blocks.4.attn.relative_position_bias_table", "detector.backbone.0.body.layers.2.blocks.4.attn.relative_position_index", "detector.backbone.0.body.layers.2.blocks.4.attn.qkv.weight",

(...)

Unexpected key(s) in state_dict: "transformer.level_embed", "transformer.encoder.layers.0.self_attn.sampling_offsets.weight", "transformer.encoder.layers.0.self_attn.sampling_offsets.bias", "transformer.encoder.layers.0.self_attn.attention_weights.weight", "transformer.encoder.layers.0.self_attn.attention_weights.bias", "transformer.encoder.layers.0.self_attn.value_proj.weight", "transformer.encoder.layers.0.self_attn.value_proj.bias", "transformer.encoder.layers.0.self_attn.output_proj.weight", "transformer.encoder.layers.0.self_attn.output_proj.bias", "transformer.encoder.layers.0.norm1.weight", "transformer.encoder.layers.0.norm1.bias", "transformer.encoder.layers.0.linear1.weight", "transformer.encoder.layers.0.linear1.bias", "transformer.encoder.layers.0.linear2.weight", "transformer.encoder.layers.0.linear2.bias", "transformer.encoder.layers.0.norm2.weight", "transformer.encoder.layers.0.norm2.bias", "transformer.encoder.layers.1.self_attn.sampling_offsets.weight", "transformer.encoder.layers.1.self_attn.sampling_offsets.bias", "transformer.encoder.layers.1.self_attn.attention_weights.weight", "transformer.encoder.layers.1.self_attn.attention_weights.bias", "transformer.encoder.layers.1.self_attn.value_proj.weight", "transformer.encoder.layers.1.self_attn.value_proj.bias", "transformer.encoder.layers.1.self_attn.output_proj.weight", "transformer.encoder.layers.1.self_attn.output_proj.bias", "transformer.encoder.layers.1.norm1.weight", "transformer.encoder.layers.1.norm1.bias", "transformer.encoder.layers.1.linear1.weight", "transformer.encoder.layers.1.linear1.bias", "transformer.encoder.layers.1.linear2.weight", "transformer.encoder.layers.1.linear2.bias", "transformer.encoder.layers.1.norm2.weight", "transformer.encoder.layers.1.norm2.bias", "transformer.encoder.layers.2.self_attn.sampling_offsets.weight", "transformer.encoder.layers.2.self_attn.sampling_offsets.bias", "transformer.encoder.layers.2.self_attn.attention_weights.weight", 
"transformer.encoder.layers.2.self_attn.attention_weights.bias", "transformer.encoder.layers.2.self_attn.value_proj.weight", "transformer.encoder.layers.2.self_attn.value_proj.bias", "transformer.encoder.layers.2.self_attn.output_proj.weight", "transformer.encoder.layers.2.self_attn.output_proj.bias", "transformer.encoder.layers.2.norm1.weight", "transformer.encoder.layers.2.norm1.bias", "transformer.encoder.layers.2.linear1.weight", "transformer.encoder.layers.2.linear1.bias", "transformer.encoder.layers.2.linear2.weight", "transformer.encoder.layers.2.linear2.bias", "transformer.encoder.layers.2.norm2.weight", "transformer.encoder.layers.2.norm2.bias", "transformer.encoder.layers.3.self_attn.sampling_offsets.weight", "transformer.encoder.layers.3.self_attn.sampling_offsets.bias", "transformer.encoder.layers.3.self_attn.attention_weights.weight", "transformer.encoder.layers.3.self_attn.attention_weights.bias", "transformer.encoder.layers.3.self_attn.value_proj.weight", "transformer.encoder.layers.3.self_attn.value_proj.bias", "transformer.encoder.layers.3.self_attn.output_proj.weight", "transformer.encoder.layers.3.self_attn.output_proj.bias", "transformer.encoder.layers.3.norm1.weight", "transformer.encoder.layers.3.norm1.bias", "transformer.encoder.layers.3.linear1.weight", "transformer.encoder.layers.3.linear1.bias", "transformer.encoder.layers.3.linear2.weight", "transformer.encoder.layers.3.linear2.bias", "transformer.encoder.layers.3.norm2.weight", "transformer.encoder.layers.3.norm2.bias", "transformer.encoder.layers.4.self_attn.sampling_offsets.weight", "transformer.encoder.layers.4.self_attn.sampling_offsets.bias", "transformer.encoder.layers.4.self_attn.attention_weights.weight", "transformer.encoder.layers.4.self_attn.attention_weights.bias", "transformer.encoder.layers.4.self_attn.value_proj.weight", "transformer.encoder.layers.4.self_attn.value_proj.bias", "transformer.encoder.layers.4.self_attn.output_proj.weight", 
"transformer.encoder.layers.4.self_attn.output_proj.bias", "transformer.encoder.layers.4.norm1.weight", "transformer.encoder.layers.4.norm1.bias", "transformer.encoder.layers.4.linear1.weight", "transformer.encoder.layers.4.linear1.bias", "transformer.encoder.layers.4.linear2.weight", "transformer.encoder.layers.4.linear2.bias", "transformer.encoder.layers.4.norm2.weight", "transformer.encoder.layers.4.norm2.bias", "transformer.encoder.layers.5.self_attn.sampling_offsets.weight", "transformer.encoder.layers.5.self_attn.sampling_offsets.bias", "transformer.encoder.layers.5.self_attn.attention_weights.weight", "transformer.encoder.layers.5.self_attn.attention_weights.bias", "transformer.encoder.layers.5.self_attn.value_proj.weight", "transformer.encoder.layers.5.self_attn.value_proj.bias", "transformer.encoder.layers.5.self_attn.output_proj.weight", "transformer.encoder.layers.5.self_attn.output_proj.bias", "transformer.encoder.layers.5.norm1.weight", "transformer.encoder.layers.5.norm1.bias", "transformer.encoder.layers.5.linear1.weight", "transformer.encoder.layers.5.linear1.bias", "transformer.encoder.layers.5.linear2.weight", "transformer.encoder.layers.5.linear2.bias", "transformer.encoder.layers.5.norm2.weight", "transformer.encoder.layers.5.norm2.bias", "transformer.decoder.layers.0.cross_attn.sampling_offsets.weight", "transformer.decoder.layers.0.cross_attn.sampling_offsets.bias", "transformer.decoder.layers.0.cross_attn.attention_weights.weight", "transformer.decoder.layers.0.cross_attn.attention_weights.bias", "transformer.decoder.layers.0.cross_attn.value_proj.weight", "transformer.decoder.layers.0.cross_attn.value_proj.bias", "transformer.decoder.layers.0.cross_attn.output_proj.weight", "transformer.decoder.layers.0.cross_attn.output_proj.bias", "transformer.decoder.layers.0.norm1.weight", "transformer.decoder.layers.0.norm1.bias", "transformer.decoder.layers.0.self_attn.in_proj_weight", "transformer.decoder.layers.0.self_attn.in_proj_bias", 
"transformer.decoder.layers.0.self_attn.out_proj.weight", "transformer.decoder.layers.0.self_attn.out_proj.bias", "transformer.decoder.layers.0.norm2.weight", "transformer.decoder.layers.0.norm2.bias", "transformer.decoder.layers.0.linear1.weight", "transformer.decoder.layers.0.linear1.bias", "transformer.decoder.layers.0.linear2.weight", "transformer.decoder.layers.0.linear2.bias", "transformer.decoder.layers.0.norm3.weight", "transformer.decoder.layers.0.norm3.bias", "transformer.decoder.layers.1.cross_attn.sampling_offsets.weight", "transformer.decoder.layers.1.cross_attn.sampling_offsets.bias", "transformer.decoder.layers.1.cross_attn.attention_weights.weight", "transformer.decoder.layers.1.cross_attn.attention_weights.bias", "transformer.decoder.layers.1.cross_attn.value_proj.weight", "transformer.decoder.layers.1.cross_attn.value_proj.bias", "transformer.decoder.layers.1.cross_attn.output_proj.weight", "transformer.decoder.layers.1.cross_attn.output_proj.bias", "transformer.decoder.layers.1.norm1.weight", "transformer.decoder.layers.1.norm1.bias", "transformer.decoder.layers.1.self_attn.in_proj_weight", "transformer.decoder.layers.1.self_attn.in_proj_bias", "transformer.decoder.layers.1.self_attn.out_proj.weight", "transformer.decoder.layers.1.self_attn.out_proj.bias", "transformer.decoder.layers.1.norm2.weight", "transformer.decoder.layers.1.norm2.bias", "transformer.decoder.layers.1.linear1.weight", "transformer.decoder.layers.1.linear1.bias", "transformer.decoder.layers.1.linear2.weight", "transformer.decoder.layers.1.linear2.bias", "transformer.decoder.layers.1.norm3.weight", "transformer.decoder.layers.1.norm3.bias", "transformer.decoder.layers.2.cross_attn.sampling_offsets.weight", "transformer.decoder.layers.2.cross_attn.sampling_offsets.bias", "transformer.decoder.layers.2.cross_attn.attention_weights.weight", "transformer.decoder.layers.2.cross_attn.attention_weights.bias", "transformer.decoder.layers.2.cross_attn.value_proj.weight", 
"transformer.decoder.layers.2.cross_attn.value_proj.bias", "transformer.decoder.layers.2.cross_attn.output_proj.weight", "transformer.decoder.layers.2.cross_attn.output_proj.bias", "transformer.decoder.layers.2.norm1.weight", "transformer.decoder.layers.2.norm1.bias", "transformer.decoder.layers.2.self_attn.in_proj_weight", "transformer.decoder.layers.2.self_attn.in_proj_bias", "transformer.decoder.layers.2.self_attn.out_proj.weight", "transformer.decoder.layers.2.self_attn.out_proj.bias", "transformer.decoder.layers.2.norm2.weight", "transformer.decoder.layers.2.norm2.bias", "transformer.decoder.layers.2.linear1.weight", "transformer.decoder.layers.2.linear1.bias", "transformer.decoder.layers.2.linear2.weight", "transformer.decoder.layers.2.linear2.bias", "transformer.decoder.layers.2.norm3.weight", "transformer.decoder.layers.2.norm3.bias", "transformer.decoder.layers.3.cross_attn.sampling_offsets.weight", "transformer.decoder.layers.3.cross_attn.sampling_offsets.bias", "transformer.decoder.layers.3.cross_attn.attention_weights.weight", "transformer.decoder.layers.3.cross_attn.attention_weights.bias", "transformer.decoder.layers.3.cross_attn.value_proj.weight", "transformer.decoder.layers.3.cross_attn.value_proj.bias", "transformer.decoder.layers.3.cross_attn.output_proj.weight", "transformer.decoder.layers.3.cross_attn.output_proj.bias", "transformer.decoder.layers.3.norm1.weight", "transformer.decoder.layers.3.norm1.bias", "transformer.decoder.layers.3.self_attn.in_proj_weight", "transformer.decoder.layers.3.self_attn.in_proj_bias", "transformer.decoder.layers.3.self_attn.out_proj.weight", "transformer.decoder.layers.3.self_attn.out_proj.bias", "transformer.decoder.layers.3.norm2.weight", "transformer.decoder.layers.3.norm2.bias", "transformer.decoder.layers.3.linear1.weight", "transformer.decoder.layers.3.linear1.bias", "transformer.decoder.layers.3.linear2.weight", "transformer.decoder.layers.3.linear2.bias", "transformer.decoder.layers.3.norm3.weight", 
"transformer.decoder.layers.3.norm3.bias", "transformer.decoder.layers.4.cross_attn.sampling_offsets.weight", "transformer.decoder.layers.4.cross_attn.sampling_offsets.bias", "transformer.decoder.layers.4.cross_attn.attention_weights.weight", "transformer.decoder.layers.4.cross_attn.attention_weights.bias", "transformer.decoder.layers.4.cross_attn.value_proj.weight", "transformer.decoder.layers.4.cross_attn.value_proj.bias", "transformer.decoder.layers.4.cross_attn.output_proj.weight", "transformer.decoder.layers.4.cross_attn.output_proj.bias", "transformer.decoder.layers.4.norm1.weight", "transformer.decoder.layers.4.norm1.bias", "transformer.decoder.layers.4.self_attn.in_proj_weight", "transformer.decoder.layers.4.self_attn.in_proj_bias", "transformer.decoder.layers.4.self_attn.out_proj.weight", "transformer.decoder.layers.4.self_attn.out_proj.bias", "transformer.decoder.layers.4.norm2.weight", "transformer.decoder.layers.4.norm2.bias", "transformer.decoder.layers.4.linear1.weight", "transformer.decoder.layers.4.linear1.bias", "transformer.decoder.layers.4.linear2.weight", "transformer.decoder.layers.4.linear2.bias", "transformer.decoder.layers.4.norm3.weight", "transformer.decoder.layers.4.norm3.bias", "transformer.decoder.layers.5.cross_attn.sampling_offsets.weight", "transformer.decoder.layers.5.cross_attn.sampling_offsets.bias", "transformer.decoder.layers.5.cross_attn.attention_weights.weight", "transformer.decoder.layers.5.cross_attn.attention_weights.bias", "transformer.decoder.layers.5.cross_attn.value_proj.weight", "transformer.decoder.layers.5.cross_attn.value_proj.bias", "transformer.decoder.layers.5.cross_attn.output_proj.weight", "transformer.decoder.layers.5.cross_attn.output_proj.bias", "transformer.decoder.layers.5.norm1.weight", "transformer.decoder.layers.5.norm1.bias", "transformer.decoder.layers.5.self_attn.in_proj_weight", "transformer.decoder.layers.5.self_attn.in_proj_bias", "transformer.decoder.layers.5.self_attn.out_proj.weight", 
"transformer.decoder.layers.5.self_attn.out_proj.bias", "transformer.decoder.layers.5.norm2.weight", "transformer.decoder.layers.5.norm2.bias", "transformer.decoder.layers.5.linear1.weight", "transformer.decoder.layers.5.linear1.bias", "transformer.decoder.layers.5.linear2.weight", "transformer.decoder.layers.5.linear2.bias", "transformer.decoder.layers.5.norm3.weight", "transformer.decoder.layers.5.norm3.bias", "transformer.decoder.bbox_embed.0.layers.0.weight", "transformer.decoder.bbox_embed.0.layers.0.bias", "transformer.decoder.bbox_embed.0.layers.1.weight", "transformer.decoder.bbox_embed.0.layers.1.bias", "transformer.decoder.bbox_embed.0.layers.2.weight", "transformer.decoder.bbox_embed.0.layers.2.bias", "transformer.decoder.bbox_embed.1.layers.0.weight", "transformer.decoder.bbox_embed.1.layers.0.bias", "transformer.decoder.bbox_embed.1.layers.1.weight", "transformer.decoder.bbox_embed.1.layers.1.bias", "transformer.decoder.bbox_embed.1.layers.2.weight", "transformer.decoder.bbox_embed.1.layers.2.bias", "transformer.decoder.bbox_embed.2.layers.0.weight", "transformer.decoder.bbox_embed.2.layers.0.bias", "transformer.decoder.bbox_embed.2.layers.1.weight", "transformer.decoder.bbox_embed.2.layers.1.bias", "transformer.decoder.bbox_embed.2.layers.2.weight", "transformer.decoder.bbox_embed.2.layers.2.bias", "transformer.decoder.bbox_embed.3.layers.0.weight", "transformer.decoder.bbox_embed.3.layers.0.bias", "transformer.decoder.bbox_embed.3.layers.1.weight", "transformer.decoder.bbox_embed.3.layers.1.bias", "transformer.decoder.bbox_embed.3.layers.2.weight", "transformer.decoder.bbox_embed.3.layers.2.bias", "transformer.decoder.bbox_embed.4.layers.0.weight", "transformer.decoder.bbox_embed.4.layers.0.bias", "transformer.decoder.bbox_embed.4.layers.1.weight", "transformer.decoder.bbox_embed.4.layers.1.bias", "transformer.decoder.bbox_embed.4.layers.2.weight", "transformer.decoder.bbox_embed.4.layers.2.bias", "transformer.decoder.bbox_embed.5.layers.0.weight", 
"transformer.decoder.bbox_embed.5.layers.0.bias", "transformer.decoder.bbox_embed.5.layers.1.weight", "transformer.decoder.bbox_embed.5.layers.1.bias", "transformer.decoder.bbox_embed.5.layers.2.weight", "transformer.decoder.bbox_embed.5.layers.2.bias", "transformer.decoder.bbox_embed.6.layers.0.weight", "transformer.decoder.bbox_embed.6.layers.0.bias", "transformer.decoder.bbox_embed.6.layers.1.weight", "transformer.decoder.bbox_embed.6.layers.1.bias", "transformer.decoder.bbox_embed.6.layers.2.weight", "transformer.decoder.bbox_embed.6.layers.2.bias", "transformer.decoder.class_embed.0.weight", "transformer.decoder.class_embed.0.bias", "transformer.decoder.class_embed.1.weight", "transformer.decoder.class_embed.1.bias", "transformer.decoder.class_embed.2.weight", "transformer.decoder.class_embed.2.bias", "transformer.decoder.class_embed.3.weight", "transformer.decoder.class_embed.3.bias", "transformer.decoder.class_embed.4.weight", "transformer.decoder.class_embed.4.bias", "transformer.decoder.class_embed.5.weight", "transformer.decoder.class_embed.5.bias", "transformer.decoder.class_embed.6.weight", "transformer.decoder.class_embed.6.bias", "transformer.enc_output.weight", "transformer.enc_output.bias", "transformer.enc_output_norm.weight", "transformer.enc_output_norm.bias", "transformer.pos_trans.weight", "transformer.pos_trans.bias", "transformer.pos_trans_norm.weight", "transformer.pos_trans_norm.bias", "class_embed.0.weight", "class_embed.0.bias", "class_embed.1.weight", "class_embed.1.bias", "class_embed.2.weight", "class_embed.2.bias", "class_embed.3.weight", "class_embed.3.bias", "class_embed.4.weight", "class_embed.4.bias", "class_embed.5.weight", "class_embed.5.bias", "class_embed.6.weight", "class_embed.6.bias", "bbox_embed.0.layers.0.weight", "bbox_embed.0.layers.0.bias", "bbox_embed.0.layers.1.weight", "bbox_embed.0.layers.1.bias", "bbox_embed.0.layers.2.weight", "bbox_embed.0.layers.2.bias", "bbox_embed.1.layers.0.weight", 
"bbox_embed.1.layers.0.bias", "bbox_embed.1.layers.1.weight", "bbox_embed.1.layers.1.bias", "bbox_embed.1.layers.2.weight", "bbox_embed.1.layers.2.bias", "bbox_embed.2.layers.0.weight", "bbox_embed.2.layers.0.bias", "bbox_embed.2.layers.1.weight", "bbox_embed.2.layers.1.bias", "bbox_embed.2.layers.2.weight", "bbox_embed.2.layers.2.bias", "bbox_embed.3.layers.0.weight", "bbox_embed.3.layers.0.bias", "bbox_embed.3.layers.1.weight", "bbox_embed.3.layers.1.bias", "bbox_embed.3.layers.2.weight", "bbox_embed.3.layers.2.bias", "bbox_embed.4.layers.0.weight", "bbox_embed.4.layers.0.bias", "bbox_embed.4.layers.1.weight", "bbox_embed.4.layers.1.bias", "bbox_embed.4.layers.2.weight", "bbox_embed.4.layers.2.bias", "bbox_embed.5.layers.0.weight", "bbox_embed.5.layers.0.bias", "bbox_embed.5.layers.1.weight", "bbox_embed.5.layers.1.bias", "bbox_embed.5.layers.2.weight", "bbox_embed.5.layers.2.bias", "bbox_embed.6.layers.0.weight", "bbox_embed.6.layers.0.bias", "bbox_embed.6.layers.1.weight", "bbox_embed.6.layers.1.bias", "bbox_embed.6.layers.2.weight", "bbox_embed.6.layers.2.bias", "query_embed.weight", "input_proj.0.0.weight", "input_proj.0.0.bias", "input_proj.0.1.weight", "input_proj.0.1.bias", "input_proj.1.0.weight", "input_proj.1.0.bias", "input_proj.1.1.weight", "input_proj.1.1.bias", "input_proj.2.0.weight", "input_proj.2.0.bias", "input_proj.2.1.weight", "input_proj.2.1.bias", "input_proj.3.0.weight", "input_proj.3.0.bias", "input_proj.3.1.weight", "input_proj.3.1.bias", "backbone.0.body.patch_embed.proj.weight", "backbone.0.body.patch_embed.proj.bias", "backbone.0.body.patch_embed.norm.weight", "backbone.0.body.patch_embed.norm.bias", "backbone.0.body.layers.0.blocks.0.norm1.weight", "backbone.0.body.layers.0.blocks.0.norm1.bias", "backbone.0.body.layers.0.blocks.0.attn.relative_position_bias_table", "backbone.0.body.layers.0.blocks.0.attn.relative_position_index", "backbone.0.body.layers.0.blocks.0.attn.qkv.weight", "backbone.0.body.layers.0.blocks.0.attn.qkv.bias", 
"backbone.0.body.layers.0.blocks.0.attn.proj.weight", "backbone.0.body.layers.0.blocks.0.attn.proj.bias", "backbone.0.body.layers.0.blocks.0.norm2.weight", "backbone.0.body.layers.0.blocks.0.norm2.bias", "backbone.0.body.layers.0.blocks.0.mlp.fc1.weight", "backbone.0.body.layers.0.blocks.0.mlp.fc1.bias", "backbone.0.body.layers.0.blocks.0.mlp.fc2.weight", "backbone.0.body.layers.0.blocks.0.mlp.fc2.bias", "backbone.0.body.layers.0.blocks.1.norm1.weight", "backbone.0.body.layers.0.blocks.1.norm1.bias", "backbone.0.body.layers.0.blocks.1.attn.relative_position_bias_table", "backbone.0.body.layers.0.blocks.1.attn.relative_position_index", "backbone.0.body.layers.0.blocks.1.attn.qkv.weight", "backbone.0.body.layers.0.blocks.1.attn.qkv.bias", "backbone.0.body.layers.0.blocks.1.attn.proj.weight", "backbone.0.body.layers.0.blocks.1.attn.proj.bias", "backbone.0.body.layers.0.blocks.1.norm2.weight", "backbone.0.body.layers.0.blocks.1.norm2.bias", "backbone.0.body.layers.0.blocks.1.mlp.fc1.weight", "backbone.0.body.layers.0.blocks.1.mlp.fc1.bias", "backbone.0.body.layers.0.blocks.1.mlp.fc2.weight", "backbone.0.body.layers.0.blocks.1.mlp.fc2.bias", "backbone.0.body.layers.0.downsample.reduction.weight", "backbone.0.body.layers.0.downsample.norm.weight", "backbone.0.body.layers.0.downsample.norm.bias", "backbone.0.body.layers.1.blocks.0.norm1.weight", "backbone.0.body.layers.1.blocks.0.norm1.bias", "backbone.0.body.layers.1.blocks.0.attn.relative_position_bias_table", "backbone.0.body.layers.1.blocks.0.attn.relative_position_index", "backbone.0.body.layers.1.blocks.0.attn.qkv.weight", "backbone.0.body.layers.1.blocks.0.attn.qkv.bias", "backbone.0.body.layers.1.blocks.0.attn.proj.weight", "backbone.0.body.layers.1.blocks.0.attn.proj.bias", "backbone.0.body.layers.1.blocks.0.norm2.weight", "backbone.0.body.layers.1.blocks.0.norm2.bias", "backbone.0.body.layers.1.blocks.0.mlp.fc1.weight", "backbone.0.body.layers.1.blocks.0.mlp.fc1.bias", 
"backbone.0.body.layers.1.blocks.0.mlp.fc2.weight", "backbone.0.body.layers.1.blocks.0.mlp.fc2.bias", "backbone.0.body.layers.1.blocks.1.norm1.weight", "backbone.0.body.layers.1.blocks.1.norm1.bias", "backbone.0.body.layers.1.blocks.1.attn.relative_position_bias_table", "backbone.0.body.layers.1.blocks.1.attn.relative_position_index", "backbone.0.body.layers.1.blocks.1.attn.qkv.weight", "backbone.0.body.layers.1.blocks.1.attn.qkv.bias", "backbone.0.body.layers.1.blocks.1.attn.proj.weight", "backbone.0.body.layers.1.blocks.1.attn.proj.bias", "backbone.0.body.layers.1.blocks.1.norm2.weight", "backbone.0.body.layers.1.blocks.1.norm2.bias", "backbone.0.body.layers.1.blocks.1.mlp.fc1.weight", "backbone.0.body.layers.1.blocks.1.mlp.fc1.bias", "backbone.0.body.layers.1.blocks.1.mlp.fc2.weight", "backbone.0.body.layers.1.blocks.1.mlp.fc2.bias", "backbone.0.body.layers.1.downsample.reduction.weight", "backbone.0.body.layers.1.downsample.norm.weight", "backbone.0.body.layers.1.downsample.norm.bias", "backbone.0.body.layers.2.blocks.0.norm1.weight", "backbone.0.body.layers.2.blocks.0.norm1.bias", "backbone.0.body.layers.2.blocks.0.attn.relative_position_bias_table", "backbone.0.body.layers.2.blocks.0.attn.relative_position_index", "backbone.0.body.layers.2.blocks.0.attn.qkv.weight", "backbone.0.body.layers.2.blocks.0.attn.qkv.bias", "backbone.0.body.layers.2.blocks.0.attn.proj.weight", "backbone.0.body.layers.2.blocks.0.attn.proj.bias", "backbone.0.body.layers.2.blocks.0.norm2.weight", "backbone.0.body.layers.2.blocks.0.norm2.bias", "backbone.0.body.layers.2.blocks.0.mlp.fc1.weight", "backbone.0.body.layers.2.blocks.0.mlp.fc1.bias", "backbone.0.body.layers.2.blocks.0.mlp.fc2.weight", "backbone.0.body.layers.2.blocks.0.mlp.fc2.bias", "backbone.0.body.layers.2.blocks.1.norm1.weight", "backbone.0.body.layers.2.blocks.1.norm1.bias", "backbone.0.body.layers.2.blocks.1.attn.relative_position_bias_table", "backbone.0.body.layers.2.blocks.1.attn.relative_position_index", 
"backbone.0.body.layers.2.blocks.1.attn.qkv.weight", "backbone.0.body.layers.2.blocks.1.attn.qkv.bias", "backbone.0.body.layers.2.blocks.1.attn.proj.weight", "backbone.0.body.layers.2.blocks.1.attn.proj.bias", "backbone.0.body.layers.2.blocks.1.norm2.weight", "backbone.0.body.layers.2.blocks.1.norm2.bias", "backbone.0.body.layers.2.blocks.1.mlp.fc1.weight", "backbone.0.body.layers.2.blocks.1.mlp.fc1.bias", "backbone.0.body.layers.2.blocks.1.mlp.fc2.weight", "backbone.0.body.layers.2.blocks.1.mlp.fc2.bias", "backbone.0.body.layers.2.blocks.2.norm1.weight", "backbone.0.body.layers.2.blocks.2.norm1.bias", "backbone.0.body.layers.2.blocks.2.attn.relative_position_bias_table", "backbone.0.body.layers.2.blocks.2.attn.relative_position_index", "backbone.0.body.layers.2.blocks.2.attn.qkv.weight", "backbone.0.body.layers.2.blocks.2.attn.qkv.bias", "backbone.0.body.layers.2.blocks.2.attn.proj.weight", "backbone.0.body.layers.2.blocks.2.attn.proj.bias", "backbone.0.body.layers.2.blocks.2.norm2.weight", "backbone.0.body.layers.2.blocks.2.norm2.bias", "backbone.0.body.layers.2.blocks.2.mlp.fc1.weight", "backbone.0.body.layers.2.blocks.2.mlp.fc1.bias", "backbone.0.body.layers.2.blocks.2.mlp.fc2.weight", "backbone.0.body.layers.2.blocks.2.mlp.fc2.bias", "backbone.0.body.layers.2.blocks.3.norm1.weight", "backbone.0.body.layers.2.blocks.3.norm1.bias", "backbone.0.body.layers.2.blocks.3.attn.relative_position_bias_table", "backbone.0.body.layers.2.blocks.3.attn.relative_position_index", "backbone.0.body.layers.2.blocks.3.attn.qkv.weight", "backbone.0.body.layers.2.blocks.3.attn.qkv.bias", "backbone.0.body.layers.2.blocks.3.attn.proj.weight", "backbone.0.body.layers.2.blocks.3.attn.proj.bias", "backbone.0.body.layers.2.blocks.3.norm2.weight", "backbone.0.body.layers.2.blocks.3.norm2.bias", "backbone.0.body.layers.2.blocks.3.mlp.fc1.weight", "backbone.0.body.layers.2.blocks.3.mlp.fc1.bias", "backbone.0.body.layers.2.blocks.3.mlp.fc2.weight", 
"backbone.0.body.layers.2.blocks.3.mlp.fc2.bias", "backbone.0.body.layers.2.blocks.4.norm1.weight", "backbone.0.body.layers.2.blocks.4.norm1.bias", "backbone.0.body.layers.2.blocks.4.attn.relative_position_bias_table", "backbone.0.body.layers.2.blocks.4.attn.relative_position_index", "backbone.0.body.layers.2.blocks.4.attn.qkv.weight", "backbone.0.body.layers.2.blocks.4.attn.qkv.bias", (...)

fredzzhang commented 11 months ago

Hi @QihanZhao,

The weights you loaded are only for the object detector, not the HOI detection model. The --resume flag takes a trained PViC model, which includes both the detector weights and the downstream interaction head weights. The object detector weights should be loaded with the --pretrained flag.
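One quick way to tell the two kinds of checkpoint apart is to look at the key prefixes, as the error message above suggests: the model expects detector parameters under a `detector.` prefix, while the detector-only file stores them without it. The following is a minimal sketch under that assumption. The helper name `classify_checkpoint` is hypothetical and not part of this repository, and a full PViC checkpoint would also contain interaction head keys whose names are not shown in this thread.

```python
from typing import Iterable


def classify_checkpoint(keys: Iterable[str], detector_prefix: str = "detector.") -> str:
    """Guess the checkpoint type from its state-dict key names.

    Assumption (inferred from the error message, not from the code base):
    a full model nests the detector parameters under `detector_prefix`,
    whereas a detector-only checkpoint stores them at the top level.
    """
    keys = list(keys)
    if not keys:
        return "empty"
    if any(k.startswith(detector_prefix) for k in keys):
        return "full model"
    return "detector only"


# Key names copied from the "Unexpected key(s)" part of the error message.
detector_keys = [
    "transformer.level_embed",
    "class_embed.0.weight",
    "query_embed.weight",
]
# Key names copied from the "Missing key(s)" part of the error message.
model_keys = [
    "detector.transformer.level_embed",
    "detector.transformer.encoder.layers.0.norm1.weight",
]

print(classify_checkpoint(detector_keys))  # detector only
print(classify_checkpoint(model_keys))     # full model
```

With a real file, the key names could be obtained from `torch.load(path, map_location="cpu")["model_state_dict"].keys()`, assuming the file stores its weights under `model_state_dict` as in the traceback above.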

You can train the model with the following command:

DETR=advanced python main.py --backbone swin_large \
                             --drop-path-rate 0.5 \
                             --num-queries-one2one 900 \
                             --num-queries-one2many 1500 \
                             --pretrained checkpoints/h-defm-detr-swinL-dp0-mqs-lft-iter-2stg-hicodet.pth \
                             --use-checkpoint \
                             --output-dir outputs/pvic-h-defm-detr-swinL-hicodet

Fred.

fredzzhang commented 11 months ago

Hi @QihanZhao,

Were you able to reproduce the results? I forgot to add the --use-checkpoint flag to the original documentation, which I have now fixed. The flag enables memory checkpointing, which significantly reduces memory consumption, so you should be able to fit the model in memory with ease.

Let me know if you have issues with the reproduction.

Fred.

QihanZhao commented 11 months ago

Dear Dr. Zhang, @fredzzhang

Thank you for your prompt and informative response. I appreciate the detailed instructions on how to train the PViC model using the object detector weights with the --pretrained flag.

However, my current objective is to evaluate the performance of the model through testing, not training. I would like to confirm my understanding based on your response: the fully trained PViC model parameters, which include both the detector weights and the downstream interaction head weights, have not been made available. Is it correct to assume that only the backbone's pretrained weights are provided, and not the complete set of trained parameters for the entire PViC model?

If my understanding is correct, would it be possible for you to provide the full weights for the PViC model so that we can proceed with the testing phase? Having access to the pretrained model parameters would enable us to evaluate the model's performance accurately and contribute further to the community's understanding of its capabilities.

I apologize for any confusion and thank you once again for your assistance.

Looking forward to your clarification.

Best regards, Qihan

fredzzhang commented 11 months ago

As we tested the model with four different object detectors, storage of checkpoints was a slight issue. For that reason, only the PViC-DETR-R50 checkpoint was kept. You can download the weights from a link in the inference section of the docs.

Fred.

QihanZhao commented 11 months ago

OK, I am training it now. Thanks for the help. I will report back later.

fredzzhang commented 9 months ago

Since there haven't been any reported issues, I'm assuming it is working as intended.