facebookresearch / DepthContrast

DepthContrast self-supervised learning for 3D

Loading backbone to OpenPCDet #13

Closed baraujo98 closed 3 years ago

baraujo98 commented 3 years ago

Hi! I tried to load a checkpoint from the pretrained backbones to PointRCNN. In this case, I tried to pick up from epoch 50 of the pretraining, just to test.

Here is the command:

python -m torch.distributed.launch --nproc_per_node=1 train.py --launcher pytorch --cfg_file cfgs/kitti_models/pointrcnn_iou_finetune.yaml --pretrained_model /home/baraujo/DepthContrast/third_party/OpenPCDet/checkpoints/checkpoint-ep50.pth.tar

Here is the error:

Traceback (most recent call last):
  File "train.py", line 205, in <module>
    main()
  File "train.py", line 132, in main
    init_model_from_weights(model, state, freeze_bb=False)
  File "/home/baraujo/DepthContrast/third_party/OpenPCDet/tools/checkpoint.py", line 59, in init_model_from_weights
    assert (
AssertionError: Unknown state dict key: classy_state_dict

I don't totally understand some of the operations at the beginning of init_model_from_weights(). The checkpoints I got from the pretraining only have these keys: dict_keys(['epoch', 'model', 'optimizer', 'train_criterion']); they don't have a "classy_state_dict" or a "base_model" key, which I think the function expects.
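
For reference, a quick way to inspect the checkpoint's top-level keys (the path is the one from the command above):

```python
import torch

# Load the pretraining checkpoint on CPU and list its top-level keys.
ckpt = torch.load(
    "/home/baraujo/DepthContrast/third_party/OpenPCDet/checkpoints/checkpoint-ep50.pth.tar",
    map_location="cpu",
)
print(ckpt.keys())  # dict_keys(['epoch', 'model', 'optimizer', 'train_criterion'])
```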

Thanks!

zaiweizhang commented 3 years ago

Yeah. The naming might need to be changed a bit. I think you need to use "model" instead of "classy_state_dict".

baraujo98 commented 3 years ago

Ok, understood @zaiweizhang . Should I change anything in the following if statements?

[screenshot: the if statements at the start of init_model_from_weights() in tools/checkpoint.py]

I guess I should just do something along these lines:

```python
# Unwrap the weights stored under the "model" key, mirroring what the
# original code does with "classy_state_dict".
classy_state_dict = state_dict["model"]
state_dict = {}
state_dict.update(classy_state_dict)
```

zaiweizhang commented 3 years ago

Yeah. You probably need to look up variable names in the checkpoint and change some string names in that function. It should be a trivial task.

baraujo98 commented 3 years ago

Got it. I solved the problem by:

  1. Passing state['model'] instead of state as the second argument of the function in train.py
  2. Commenting out lines 57-70
  3. Replacing "trunk.base_model.0" and "trunk.base_model.2" with "module.trunk.0" and "module.trunk.2" (see the sketch below)
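
A minimal sketch of what those three changes amount to. This is my reconstruction, not the repo's exact code: the call site, the direction of the prefix rename (checkpoint names to OpenPCDet names), and the use of strict=False are assumptions; ckpt_path is a placeholder.

```python
import torch

# In train.py: pass the unwrapped weights instead of the full checkpoint.
state = torch.load(ckpt_path, map_location="cpu")
init_model_from_weights(model, state["model"], freeze_bb=False)

# In tools/checkpoint.py: with lines 57-70 commented out, the remaining
# logic only needs the prefix rename before loading.
def init_model_from_weights(model, state_dict, freeze_bb=False):
    renamed = {}
    for key, value in state_dict.items():
        # Assumption: map pretraining prefixes to OpenPCDet module names.
        key = key.replace("trunk.base_model.0", "module.trunk.0")
        key = key.replace("trunk.base_model.2", "module.trunk.2")
        renamed[key] = value
    # strict=False skips detection-head keys absent from the checkpoint.
    model.load_state_dict(renamed, strict=False)
    return model
```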

I noticed you defined a freeze_bb argument. Did you do any freezing in your finetuning tests? It seems like a good idea, at least for the first epochs.

zaiweizhang commented 3 years ago

I usually do not use that flag. I only freeze the weights for ModelNet shape classification. You should try finetuning all the weights first. I find that freezing the weights sometimes causes a performance drop.

baraujo98 commented 3 years ago

Ok, that's an interesting finding; a bit counter-intuitive, I would say. In your tests, did you freeze the backbone only for the first epochs?

zaiweizhang commented 3 years ago

No. I did not freeze any weights. I load the pretrained weights and then finetune all of them, but I did increase the learning rate to two times higher.

baraujo98 commented 3 years ago

Ok, thanks, but when (if) you tried freezing the backbone, was it frozen for the whole training run, or only for the first epochs? That way I know whether it's worth trying to freeze only in the first epochs.

zaiweizhang commented 3 years ago

I was freezing it for the whole training run. So it's probably worth trying to freeze only in the first epochs.

baraujo98 commented 3 years ago

Ok, will try! Closing the issue for now. Thank you very much @zaiweizhang for the help :grin:

baraujo98 commented 3 years ago

Should this be enough to freeze the backbone: model.backbone_3d.requires_grad_(requires_grad=False)? Or should I do something more sophisticated, like filtering which layers are passed to the optimizer, or using torch.no_grad()?

And the opposite to unfreeze, once the first epochs are done: model.backbone_3d.requires_grad_(requires_grad=True)

I tried, and it looks like it might have worked.
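
For what it's worth, a minimal sketch of that schedule; freeze_epochs, num_epochs, and train_one_epoch are hypothetical placeholders, not OpenPCDet API:

```python
# Hypothetical freeze/unfreeze schedule; freeze_epochs, num_epochs, and
# train_one_epoch are placeholder names, not OpenPCDet API.
freeze_epochs = 5

for epoch in range(num_epochs):
    if epoch == 0:
        # Freeze the pretrained backbone for the warm-up epochs.
        model.backbone_3d.requires_grad_(False)
        # Drop any stale gradients so the optimizer cannot keep updating
        # frozen parameters from leftover .grad tensors.
        optimizer.zero_grad(set_to_none=True)
    elif epoch == freeze_epochs:
        # Unfreeze everything for the remaining epochs.
        model.backbone_3d.requires_grad_(True)

    train_one_epoch(model, optimizer, train_loader, epoch)
```

Filtering the parameters passed to the optimizer would also work, but then unfreezing means rebuilding the optimizer or adding a parameter group, so the requires_grad_ toggle is the simpler route.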

zaiweizhang commented 3 years ago

Yep. That should work.