VinAIResearch / ISBNet

ISBNet: a 3D Point Cloud Instance Segmentation Network with Instance-aware Sampling and Box-aware Dynamic Convolution (CVPR 2023)
Apache License 2.0

[pretrained scannet_v2_lightweight] size mismatch: inst_mask_head.2.weight, inst_mask_head.2.bias #35

Closed YuQiao0303 closed 11 months ago

YuQiao0303 commented 11 months ago

Hi authors! Thank you for releasing this amazing work!

What I did

I'm trying to use the pretrained weights for a quick demo on the ScanNet val set (lightweight model). I downloaded the pretrained checkpoint and ran the following command:

python3 tools/test.py configs/scannetv2/isbnet_lightweight_scannetv2.yaml dataset/pretrained/head_small_scannetv2_val.pth --out out/scannetv2_lightweight

(For a quick demo, I loaded only 3 scans instead of all 312 scans in the validation set.)

Result

output

However, here's the output:

$ python3 tools/test.py configs/scannetv2/isbnet_lightweight_scannetv2.yaml dataset/pretrained/head_small_scannetv2_val.pth --out out/scannetv2_lightweight
self.mask_dim_out 32 # this is something I added for debugging
weight_nums [1216, 512, 16] # this is something I added for debugging
bias_nums [32, 16, 1] # this is something I added for debugging
 self.num_gen_params 1793 # this is something I added for debugging
2023-08-23 16:07:43,226 - INFO - Load state dict from dataset/pretrained/head_small_scannetv2_val.pth
skip key inst_mask_head.2.weight torch.Size([1281, 64, 1]) torch.Size([1793, 64, 1]) # this is something I added for debugging
skip key inst_mask_head.2.bias torch.Size([1281]) torch.Size([1793]) # this is something I added for debugging
2023-08-23 16:07:43,363 - INFO - removed keys in source state_dict due to size mismatch: inst_mask_head.2.weight, inst_mask_head.2.bias
2023-08-23 16:07:43,364 - INFO - missing keys in source state_dict: inst_mask_head.2.weight, inst_mask_head.2.bias
2023-08-23 16:07:43,364 - INFO - unexpected key in source state_dict: criterion.empty_weight
2023-08-23 16:07:43,366 - INFO - Load test dataset: 3 scans
2023-08-23 16:07:47,343 - INFO - Infer scene 1/3
2023-08-23 16:07:48,432 - INFO - Evaluate instance segmentation

################################################################
what           :      AP  AP_50%  AP_25%      AR  RC_50%  RC_25%
################################################################
cabinet        :   0.000   0.000   0.000   0.000   0.000   0.000
bed            :     nan     nan     nan     nan     nan     nan
chair          :   0.000   0.000   0.000   0.000   0.000   0.000
sofa           :     nan     nan     nan     nan     nan     nan
table          :   0.000   0.000   0.125   0.000   0.000   0.125
door           :   0.000   0.000   0.000   0.000   0.000   0.000
window         :   0.000   0.000   0.000   0.000   0.000   0.000
bookshelf      :     nan     nan     nan     nan     nan     nan
picture        :     nan     nan     nan     nan     nan     nan
counter        :     nan     nan     nan     nan     nan     nan
desk           :     nan     nan     nan     nan     nan     nan
curtain        :     nan     nan     nan     nan     nan     nan
refrigerator   :     nan     nan     nan     nan     nan     nan
shower curtain :     nan     nan     nan     nan     nan     nan
toilet         :   0.000   0.000   0.000   0.000   0.000   0.000
sink           :   0.000   0.000   0.000   0.000   0.000   0.000
bathtub        :   0.000   0.000   0.000   0.000   0.000   0.000
otherfurniture :   0.000   0.000   0.000   0.000   0.000   0.000
----------------------------------------------------------------
average        :   0.000   0.000   0.014   0.000   0.000   0.014
################################################################

2023-08-23 16:07:50,481 - INFO - Average run time: 1.3831
2023-08-23 16:07:50,481 - INFO - Save results
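A note on reading the table: the `average` row is consistent with a mean taken only over classes whose score is not `nan`. The `nan` rows are presumably classes with no ground-truth instances in the 3 loaded scans; that is an assumption, not something the log states. A minimal sketch of that averaging for the AP_25% column, using the values printed above (`nan_mean` is a hypothetical helper, not a function from the repository):

```python
import math

# AP_25% column copied from the table above; nan marks classes without a score.
nan = float("nan")
ap_25 = {
    "cabinet": 0.0, "bed": nan, "chair": 0.0, "sofa": nan, "table": 0.125,
    "door": 0.0, "window": 0.0, "bookshelf": nan, "picture": nan,
    "counter": nan, "desk": nan, "curtain": nan, "refrigerator": nan,
    "shower curtain": nan, "toilet": 0.0, "sink": 0.0, "bathtub": 0.0,
    "otherfurniture": 0.0,
}


def nan_mean(values):
    """Average the values while ignoring nan entries (hypothetical helper)."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid)


valid_classes = [name for name, v in ap_25.items() if not math.isnan(v)]
average_ap_25 = nan_mean(ap_25.values())
print(len(valid_classes), round(average_ap_25, 3))
```

Nine classes have a score, and 0.125 / 9 rounds to the 0.014 shown in the `average` row.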

visualization

And here's a visualization of the result: (image)

analysis

I suspect the wrong result is mainly due to the failure to load inst_mask_head.2.weight and inst_mask_head.2.bias, so I printed some values to debug. In init_dyco() in isbnet/model/isbnet.py, I added the print statements shown here:

weight_nums = [(self.mask_dim_out + 3 + 3) * self.mask_dim_out, self.mask_dim_out * (self.mask_dim_out//2), (self.mask_dim_out//2) * 1]
bias_nums = [self.mask_dim_out, (self.mask_dim_out//2), 1]

self.weight_nums = weight_nums
self.bias_nums = bias_nums
self.num_gen_params = sum(weight_nums) + sum(bias_nums) # self.num_gen_params = 1281 #

# NOTE convolution before the condinst take place (convolution num before the generated parameters take place)
inst_mask_head = [
    conv_block(self.instance_head_cfg.dec_dim, self.instance_head_cfg.dec_dim),
    conv_block(self.instance_head_cfg.dec_dim, self.instance_head_cfg.dec_dim),
]

controller_head = nn.Conv1d(self.instance_head_cfg.dec_dim, self.num_gen_params, kernel_size=1)
print('self.mask_dim_out',self.mask_dim_out)
print('weight_nums',weight_nums)
print('bias_nums',bias_nums)
print(' self.num_gen_params', self.num_gen_params)
torch.nn.init.normal_(controller_head.weight, std=0.01)
torch.nn.init.constant_(controller_head.bias, 0)
inst_mask_head.append(controller_head)
self.add_module("inst_mask_head", nn.Sequential(*inst_mask_head))

The issue is that self.num_gen_params is 1793 instead of 1281 (see the output above), so the checkpoint's tensors for inst_mask_head.2 do not fit the model and are skipped.
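To make the three log messages concrete, here is a minimal sketch of how a loader could sort checkpoint keys by comparing shapes. It uses plain tuples rather than real tensors, `classify_keys` is a hypothetical helper rather than the repository's loading code, and the shape given for `criterion.empty_weight` is a placeholder because the log does not print it.

```python
def classify_keys(checkpoint_shapes, model_shapes):
    """Sort checkpoint keys into loadable, mismatched, and unexpected,
    and list the model keys left without a loaded value (hypothetical helper)."""
    loadable, mismatched, unexpected = [], [], []
    for key, shape in checkpoint_shapes.items():
        if key not in model_shapes:
            unexpected.append(key)   # checkpoint has it, model does not
        elif model_shapes[key] != shape:
            mismatched.append(key)   # same name, different size: skipped
        else:
            loadable.append(key)
    missing = [key for key in model_shapes if key not in loadable]
    return loadable, mismatched, unexpected, missing


# Shapes taken from the "skip key" lines in the log above.
checkpoint_shapes = {
    "inst_mask_head.2.weight": (1281, 64, 1),
    "inst_mask_head.2.bias": (1281,),
    "criterion.empty_weight": (0,),  # placeholder shape, not from the log
}
model_shapes = {
    "inst_mask_head.2.weight": (1793, 64, 1),
    "inst_mask_head.2.bias": (1793,),
}

loadable, mismatched, unexpected, missing = classify_keys(checkpoint_shapes, model_shapes)
print("size mismatch:", mismatched)
print("missing:", missing)
print("unexpected:", unexpected)
```

Because both controller tensors are skipped, that layer keeps the values set by torch.nn.init.normal_ and torch.nn.init.constant_ in init_dyco(), which would plausibly explain the near-zero AP.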

I notice that 1793 - 1281 = 512, that weight_nums is [1216, 512, 16], and that bias_nums is [32, 16, 1], but I'm not sure how to fix this issue. Could you please help? Thank you!!

ngoductuanlhp commented 11 months ago

We recently changed the number of dynamic convolution layers to 3 (the lightweight version uses 2). We have just updated the code so that it can load the pre-trained weights of the lightweight model. Feel free to try again.

Thank you.
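For readers checking the numbers, the two parameter counts can be reproduced with a short sketch. The 3-layer channel layout follows the weight_nums and bias_nums expressions quoted in the question. The 2-layer layout (mask_dim_out + 6, then mask_dim_out, then 1) is an assumption that happens to give exactly 1281, not something confirmed in this thread, and `count_dynamic_conv_params` is a hypothetical helper.

```python
def count_dynamic_conv_params(channels):
    """Count generated weights and biases for a stack of 1x1 dynamic convs.

    channels lists the layer widths from input to output, e.g. [38, 32, 16, 1].
    """
    pairs = list(zip(channels[:-1], channels[1:]))
    weight_nums = [c_in * c_out for c_in, c_out in pairs]
    bias_nums = [c_out for _, c_out in pairs]
    return weight_nums, bias_nums, sum(weight_nums) + sum(bias_nums)


mask_dim_out = 32
in_dim = mask_dim_out + 3 + 3  # input width from the weight_nums expression

# 3 dynamic conv layers, as in the code quoted in the question.
w3, b3, total3 = count_dynamic_conv_params(
    [in_dim, mask_dim_out, mask_dim_out // 2, 1]
)

# 2 dynamic conv layers (assumed layout for the lightweight checkpoint).
w2, b2, total2 = count_dynamic_conv_params([in_dim, mask_dim_out, 1])

print(w3, b3, total3)
print(w2, b2, total2)
print(total3 - total2)
```

Under that assumption the difference of 512 comes from replacing the 32-weight final layer with a 512-weight middle layer plus a 16-weight final layer and 16 extra biases: 512 + 16 + 16 - 32 = 512.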