RozDavid / LanguageGroundedSemseg

Implementation for ECCV 2022 paper Language-Grounded Indoor 3D Semantic Segmentation in the Wild

Can't Test the Provided Pretrained Model Checkpoints #18

Closed mmaaz60 closed 1 year ago

mmaaz60 commented 1 year ago

Hi @RozDavid,

Thank you for sharing the great work. I am trying to evaluate the provided pretrained model Res16UNet34D-pretrain, and it looks like the keys of the provided checkpoint and the model definition in the repo don't match.

Missing key(s) in state_dict: "embedding_criterion.projection_model.attr_linears.0.weight", 
"embedding_criterion.projection_model.attr_linears.0.bias", "embedding_criterion.projection_model.attr_linears.1.weight",
"embedding_criterion.projection_model.attr_linears.1.bias", "embedding_criterion.projection_model.attr_linears.2.weight", 
"embedding_criterion.projection_model.attr_linears.2.bias", "embedding_criterion.projection_model.attr_linears.3.weight",
"embedding_criterion.projection_model.attr_linears.3.bias", "embedding_criterion.projection_model.attr_linears.4.weight", 
"embedding_criterion.projection_model.attr_linears.4.bias", "embedding_criterion.projection_model.attr_linears.5.weight", 
"embedding_criterion.projection_model.attr_linears.5.bias", "embedding_criterion.projection_model.attr_linears.6.weight",
"embedding_criterion.projection_model.attr_linears.6.bias", "embedding_criterion.projection_model.attr_linears.7.weight", 
"embedding_criterion.projection_model.attr_linears.7.bias".

Following is the parameter table I got, for your reference. It looks like the pretrained model does not contain any weights corresponding to embedding_criterion. [image: parameter table]

I am using the text_representation_train.sh script with is_train set to false and resume pointing to the downloaded checkpoint directory.

However, I could successfully reproduce the results of the fine-tuning stage with Res16UNet34D-finetune.

I would appreciate any help. Thanks

RozDavid commented 1 year ago

Hey @mmaaz60,

Thanks for reaching out! Yes, it is indeed an issue on my side, sorry about that; I hadn't checked carefully enough whether everything runs out of the box in the public repo. The problem you are facing comes from an ablation study we ran but didn't include in the paper or the public code. Basically, the idea was to add natural-language-based augmentations on the instances, but it didn't make a difference. The error you see is from trying to load these augmentation parameters into the language criterion (even though they were not used during pretraining). To be honest, I never resumed a pretraining checkpoint; I only ever loaded the weights into the finetuning stage.

What you could do is remove every weight from the checkpoint whose key starts with embedding_criterion.projection_model, then resume training. Alternatively, you could set strict=False in the PL checkpoint-loading call. Sadly I don't have the time to test these myself right now, but I will also try to update the checkpoint sometime next week.
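The first option (stripping the unused keys) can be sketched as below. This is a minimal, unofficial sketch assuming a standard PyTorch Lightning checkpoint whose weights live under the "state_dict" key; plain floats stand in for the tensors so the filtering logic is self-contained, and in a real run you would load the file with torch.load and re-save the cleaned dict with torch.save.

```python
# Sketch: drop the ablation-only augmentation weights from a checkpoint before
# resuming. In practice, wrap this with torch.load(<checkpoint path>) and
# torch.save(ckpt, <output path>); here a plain dict stands in for the file.

UNUSED_PREFIX = "embedding_criterion.projection_model"

def strip_unused_keys(state_dict):
    """Return a copy of state_dict without the unused projection-head weights."""
    return {k: v for k, v in state_dict.items() if not k.startswith(UNUSED_PREFIX)}

# Stand-in checkpoint: one backbone weight plus two of the offending keys
# reported in the error message above.
ckpt = {"state_dict": {
    "model.conv1.weight": 0.1,
    "embedding_criterion.projection_model.attr_linears.0.weight": 0.2,
    "embedding_criterion.projection_model.attr_linears.0.bias": 0.3,
}}
ckpt["state_dict"] = strip_unused_keys(ckpt["state_dict"])
print(sorted(ckpt["state_dict"]))  # → ['model.conv1.weight']
```

The second option sidesteps the edit entirely: load_state_dict (and Lightning's checkpoint loading) accept strict=False, which ignores missing keys instead of raising.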

Hope this helps, but let me know if you still have problems after trying these.

Cheers, David

mmaaz60 commented 1 year ago

Thank You @RozDavid,

I have a few more questions/requests.

  1. Can you share the complete log files of the pretraining, corresponding to the provided Res16UNet34D-pretrain weights?
  2. From the issue, I understand that the CLIP-only results in the paper are obtained without the class-balanced focal loss or the instance sampling during pretraining. Could you share the CLIP-only weights as well, or guide me on the parameters/settings I should use to reproduce those results?

Thank You

RozDavid commented 1 year ago

Hey @mmaaz60,

  1. I am not sure I have the log files anymore, so sadly I can't help with that.
  2. During pretraining we don't do any kind of instance augmentation, nor focal loss. That stage is supervised only with the ContrastiveLanguageLoss and standard geometric/color-space augmentations. CLIP-only refers to the finetuning stage, which you can start after loading your pretrained model or the one we shared. To do this, set --loss_type cross_entropy and --sample_tail_instances False, while keeping --use_embedding_loss None.