alexklwong / mondi-python

PyTorch Implementation of Monitored Distillation for Positive Congruent Depth Completion (ECCV 2022)

Size inconsistency between pretrained_models and train_models. #2

Open GuoJeffrey opened 1 year ago

GuoJeffrey commented 1 year ago

Dear Prof. Wong, thanks for providing such a wonderful project! I encountered some problems when I tried to reproduce the pretrained_models from scratch by myself.

I found that the train_models I trained myself (depth_model.pth is about 62.2 MB and pose_model.pth is about 48.2 MB) are smaller than the pretrained_models (modi-kitti.pth is about 94.1 MB, posenet-kitti.pth is about 89.0 MB). After analysing these models, I realized that the network architectures are the same, and the only difference between the pretrained_models and my train_models is the "optimizer_state_dict" stored in the checkpoints.

Pretrained_models from Model Zoo: [screenshot: 2023-01-28 21-53-33]

Train_models trained from scratch by myself: [screenshot: 2023-01-28 21-53-52]
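
For reference, here is a minimal sketch of the kind of comparison described above, assuming the checkpoints are plain dictionaries saved with torch.save (the paths below are placeholders, not necessarily the exact ones in the repository):

```python
import torch

def summarize_checkpoint(path):
    # Load onto CPU and list the top-level entries of the checkpoint dictionary.
    checkpoint = torch.load(path, map_location='cpu')
    print(path)
    for key, value in checkpoint.items():
        if isinstance(value, dict):
            # e.g. a model state_dict or an optimizer_state_dict
            print('  {}: dict with {} entries'.format(key, len(value)))
        else:
            print('  {}: {}'.format(key, type(value).__name__))

# Placeholder paths; substitute the actual checkpoint locations.
summarize_checkpoint('pretrained_models/modi-kitti.pth')
summarize_checkpoint('trained_models/depth_model.pth')
```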

The training script is as follows: training_script.txt

Could I trouble you to help me understand the size inconsistency? I would appreciate any suggestions or tips on how to reproduce the pretrained_models. Kind regards.

alexklwong commented 1 year ago

Hi @GuoJeffrey, thanks for your interest in this work.

This is quite a strange issue, as the optimizer dimensions should match those of the pretrained models; to show the numbers on GitHub, we re-ran all of them based on the checkpoints released in the model zoo. My current guess is that the sizes of the optimizer dictionaries now differ because of our final code clean-up, which separated a (previously) single optimizer instance into two, one for depth and one for pose, to give users more control. In that case, the released pretrained checkpoints would be larger.
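
As a rough sketch of what I mean (the tiny networks and file names below are stand-ins for illustration, not our actual training code): a single optimizer over both networks stores Adam moments for every parameter of both models, whereas a per-network optimizer only stores its own, so checkpoints saved with the joint optimizer come out larger.

```python
import torch

# Tiny stand-ins for the depth and pose networks, just for illustration.
depth_model = torch.nn.Linear(64, 64)
pose_model = torch.nn.Linear(64, 64)

# Old setup: one optimizer over both networks.
joint_optimizer = torch.optim.Adam(
    list(depth_model.parameters()) + list(pose_model.parameters()))

# New setup: a separate optimizer per network.
depth_optimizer = torch.optim.Adam(depth_model.parameters())

# One dummy step so Adam actually allocates its running moments.
loss = (depth_model(torch.randn(4, 64)) + pose_model(torch.randn(4, 64))).mean()
loss.backward()
joint_optimizer.step()
depth_optimizer.step()

# The joint state holds moments for every parameter of both models, so a depth
# checkpoint saved with it is larger than one saved with the depth-only optimizer.
torch.save({'optimizer_state_dict': joint_optimizer.state_dict()}, 'joint.pth')
torch.save({'optimizer_state_dict': depth_optimizer.state_dict()}, 'split.pth')
```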

Some questions:

  1. Are you able to load the pretrained models into the run scripts to reproduce the numbers reported on the page?
  2. Are you able to load your own trained models into the run scripts to produce numbers similar to those reported on the page?
  3. Are you able to load the pretrained model checkpoints to finetune in your own training session?
GuoJeffrey commented 1 year ago

Hello Prof. Wong, thanks for your reply! Actually, there are two optimizer dictionaries in both the depth and pose models. The pose model part is as follows.

Pretrained_pose_models from Model Zoo: [screenshot: 2023-01-30 10-06-05]

Train_pose_models trained from scratch by myself: [screenshot: 2023-01-30 10-05-56]

Answers to the questions:

  1. I can reproduce the results reported on the page by loading the pretrained models. The numbers are as follows. [screenshot: 2023-01-30 10-15-51]

  2. Loading my own trained models, I can produce numbers similar to those reported on the page, though they are worse than the results on the page. [screenshot: 2023-01-30 10-17-10]

  3. Yes, I am able to load the pretrained model checkpoints to finetune in my own training. But the problem is that the optimizer state cannot be restored during training, while everything else can be restored; a minimal sketch of this failure mode follows this list. [screenshot: 2023-01-28 21-56-47]
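
For illustration only, here is a self-contained sketch of the kind of mismatch that can make the restore fail (the tiny networks are stand-ins, not the repository's code): an optimizer_state_dict saved from a joint depth+pose optimizer cannot be loaded into a depth-only optimizer because the parameter groups do not line up.

```python
import torch

# Stand-ins for the depth and pose networks, just for illustration.
depth_model = torch.nn.Linear(8, 8)
pose_model = torch.nn.Linear(8, 8)

# Simulate a checkpoint whose optimizer state came from a joint optimizer.
joint_optimizer = torch.optim.Adam(
    list(depth_model.parameters()) + list(pose_model.parameters()))
saved_optimizer_state = joint_optimizer.state_dict()

# Current training code restores into a depth-only optimizer.
depth_optimizer = torch.optim.Adam(depth_model.parameters())
try:
    depth_optimizer.load_state_dict(saved_optimizer_state)
except ValueError as error:
    # PyTorch rejects the state because the parameter group sizes differ.
    print('Could not restore optimizer state:', error)
```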

alexklwong commented 1 year ago

Thanks for pasting the results. Loading the pretrained weights on your end, it seems the code is able to reproduce similar numbers, and likewise for your own training, so that is a good sign.

For restoring the optimizer, I may need a few days to take a look at the optimizer parameters to see what is needed and what is not, after which I will re-upload the weights without the extra parameters.
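
If smaller checkpoints are needed in the meantime, one possible workaround is to drop the optimizer state before saving; this is only a sketch under the assumption that the checkpoint is a dictionary with an 'optimizer_state_dict' key (file names are placeholders):

```python
import torch

# Placeholder file names; point these at the actual checkpoints.
checkpoint = torch.load('mondi-kitti.pth', map_location='cpu')

# Remove the training-only optimizer state and keep just the network weights.
checkpoint.pop('optimizer_state_dict', None)

torch.save(checkpoint, 'mondi-kitti-weights-only.pth')
```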

GuoJeffrey commented 1 year ago

Thank you for the response, Prof. Wong @alexklwong. I noticed from the related paper that the pretrained models in the Model Zoo learned priors from other pretrained models. [screenshot: 2023-01-31 11-48-21]

I wonder if it would be convenient to also provide those other pretrained models? I would appreciate it if possible. Kind regards.