dwofk / fast-depth

ICRA 2019 "FastDepth: Fast Monocular Depth Estimation on Embedded Systems"
MIT License

Size mismatch when using pruned model #29

Closed: Karthikaddagadderamesh closed this issue 4 years ago

Karthikaddagadderamesh commented 4 years ago

Hello, I am new to PyTorch and am using FastDepth for my research. I planned to use the pruned network, but when loading the model I get a size-mismatch error between the current model and the checkpoint model. The mismatch comes from the difference between the conv layer dimensions in the pruned checkpoint and those defined in models.py. Please let me know how I can solve this issue.

Karthikaddagadderamesh commented 4 years ago

I changed the dimensions in the provided models.py and mobilenet.py, which made it work. Is this the right solution to the problem?

dwofk commented 4 years ago

Hi @Karthikaddagadderamesh

Are you just trying to run the loaded model? Could you please share a code snippet to clarify how you're loading the model?

In models.py and mobilenet.py, the dimensions are only fixed in the __init__(...) functions used to create new model definitions. When running a previously defined model, the forward(...) function should be called (to run a forward pass through the model). This function does not assume any particular dimensions of layers and shouldn't be causing mismatch errors.

If I understand your solution correctly, you've changed the __init__(...) functions so that the dimensions there match those in the pruned model. If you're simply trying to run the loaded model, there shouldn't be a need to do this change.
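To illustrate the distinction above, here is a minimal sketch using a hypothetical two-layer `TinyNet` as a stand-in for `MobileNetSkipAdd` (it is not the FastDepth code). Layer widths are fixed only in `__init__`, so calling an already-built pruned model works, while copying its weights into a freshly constructed, unpruned definition raises the size-mismatch error.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for MobileNetSkipAdd; not the FastDepth code."""

    def __init__(self, width=32):
        super().__init__()
        # Layer dimensions are fixed here, when the model is constructed.
        self.conv1 = nn.Conv2d(3, width, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(width, 1, kernel_size=3, padding=1)

    def forward(self, x):
        # forward() uses whatever layers exist and assumes no particular widths.
        return self.conv2(torch.relu(self.conv1(x)))


# Pretend pruning shrank conv1 from 32 to 20 output channels.
pruned = TinyNet(width=20)
x = torch.randn(1, 3, 8, 8)

# Case 1: running the already-built (pruned) model works; __init__ is not re-run.
depth = pruned(x)
print(depth.shape)  # torch.Size([1, 1, 8, 8])

# Case 2: loading the pruned weights into a new, unpruned definition fails.
fresh = TinyNet(width=32)
try:
    fresh.load_state_dict(pruned.state_dict())
    mismatch_msg = ""
except RuntimeError as err:
    mismatch_msg = str(err)
print("size mismatch" in mismatch_msg)  # True
```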

Karthikaddagadderamesh commented 4 years ago

Hello, thank you for the quick reply. For my research, we plan to train FastDepth from scratch on a custom dataset using the pretrained network.

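We have written an initialisation script that sets the output size to 992 × 992, following this code snippet:

import torch
import models  # FastDepth's models.py

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
checkpoint = torch.load(model_path, map_location=device)
model = models.MobileNetSkipAdd(output_size=992, pretrained=False)
# copy the weights of the full model object stored in the pruned checkpoint
model.load_state_dict(checkpoint["model"].state_dict())
epoch = 0
epochLossesList = []
model.to(device)
model.train()
....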

After initialisation we saved the model: model_path points to the pruned network, and the updated model is saved back to the same path.
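For the training script to find the keys `"epoch"`, `"epochLosses"` and `"model_state_dict"`, the initialisation script has to save them under exactly those names. A minimal sketch of that save-and-reload round trip, assuming a hypothetical `TinyNet` in place of `MobileNetSkipAdd` and a temporary file in place of `model_path`:

```python
import os
import tempfile

import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for MobileNetSkipAdd."""

    def __init__(self, width=20):
        super().__init__()
        self.conv = nn.Conv2d(3, width, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(x)


model = TinyNet(width=20)
epoch = 0
epochLossesList = []

# Hypothetical path; in the thread this is model_path.
model_path = os.path.join(tempfile.mkdtemp(), "init_checkpoint.pth")

# Save with the same keys the training script reads back.
torch.save(
    {
        "epoch": epoch,
        "epochLosses": epochLossesList,
        "model_state_dict": model.state_dict(),
    },
    model_path,
)

# Training-script side: rebuild the definition, then load the weights into it.
checkpoint = torch.load(model_path, map_location="cpu")
restored = TinyNet(width=20)
restored.load_state_dict(checkpoint["model_state_dict"])
restored.train()
print(checkpoint["epoch"], checkpoint["epochLosses"])  # 0 []
```

Saving a `state_dict` rather than the whole module keeps the checkpoint independent of the class's pickled code, but it also means the rebuilt definition must have the same layer shapes as the saved weights.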

Then, in the training script, we load the model for training:

model = models.MobileNetSkipAdd(output_size=992, pretrained=False)
checkpoint = torch.load(model_path, map_location=device)
epoch = checkpoint["epoch"]
epochLossesList = checkpoint["epochLosses"]
model.load_state_dict(checkpoint["model_state_dict"])
model.to(device)
model.train()


This worked well with the unpruned network, but when we did the same for the pruned network the size mismatch occurred. And yes, I updated __init__(...) to correct the size mismatch.
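When a size mismatch like this occurs, it can help to list exactly which parameters disagree before calling `load_state_dict`. A minimal sketch, assuming both the checkpoint and the current model are available as ordinary PyTorch state dicts; `find_shape_mismatches` is a hypothetical helper, not part of FastDepth or PyTorch:

```python
import torch
import torch.nn as nn


def find_shape_mismatches(model, state_dict):
    """Hypothetical helper: return {name: (checkpoint_shape, model_shape)}
    for every entry whose shape differs between checkpoint and model."""
    current = model.state_dict()
    mismatches = {}
    for name, tensor in state_dict.items():
        if name in current and current[name].shape != tensor.shape:
            mismatches[name] = (tuple(tensor.shape), tuple(current[name].shape))
    return mismatches


# Toy example: a "pruned" layer with 20 filters vs. a definition with 32.
pruned_layer = nn.Conv2d(3, 20, kernel_size=3)
full_layer = nn.Conv2d(3, 32, kernel_size=3)

for name, (ckpt_shape, model_shape) in find_shape_mismatches(
    full_layer, pruned_layer.state_dict()
).items():
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
# weight: checkpoint (20, 3, 3, 3) vs model (32, 3, 3, 3)
# bias: checkpoint (20,) vs model (32,)
```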
dwofk commented 4 years ago

@Karthikaddagadderamesh

Thanks for clarifying. Yes, if you are training the pruned network from scratch, modifying the model definition as you have done is a suitable workaround to resolve the mismatch.
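One way to apply this workaround without hand-editing numbers is to make the layer widths constructor arguments and read them off the pruned checkpoint's weight shapes (a conv weight has shape `[out_channels, in_channels, kH, kW]`). A minimal sketch under that assumption, with a hypothetical `TinyNet` standing in for the FastDepth model; the real `MobileNetSkipAdd.__init__` would need a similar change, which is not shown here:

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical model whose layer widths are constructor arguments."""

    def __init__(self, widths=(32, 64)):
        super().__init__()
        w1, w2 = widths
        self.conv1 = nn.Conv2d(3, w1, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(w1, w2, kernel_size=3, padding=1)
        self.head = nn.Conv2d(w2, 1, kernel_size=1)

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = torch.relu(self.conv2(x))
        return self.head(x)


def widths_from_state_dict(state_dict):
    # Output channels of each conv = dimension 0 of its weight tensor.
    return (
        state_dict["conv1.weight"].shape[0],
        state_dict["conv2.weight"].shape[0],
    )


# Pretend this state dict came from a pruned checkpoint.
pruned_state = TinyNet(widths=(20, 41)).state_dict()

widths = widths_from_state_dict(pruned_state)
model = TinyNet(widths=widths)       # definition now matches the checkpoint
model.load_state_dict(pruned_state)  # no size mismatch
model.train()
print(widths)  # (20, 41)
```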

Karthikaddagadderamesh commented 4 years ago

Thank you very much for the reply. One last question about the PyTorch version: are there any problems with training on PyTorch 1.6 instead of 0.4.1?

dwofk commented 4 years ago

Sorry, we have not tried training with PyTorch v1+ and thus can't confirm whether there may be differences/problems.

Karthikaddagadderamesh commented 4 years ago

Thank you