TRI-ML / packnet-sfm

TRI-ML Monocular Depth Estimation Repository
https://tri-ml.github.io/packnet-sfm/
MIT License

Checkpoint cannot be saved #139

Closed: DFLyan closed this issue 3 years ago

DFLyan commented 3 years ago

Hello, when I train the network, the model cannot be saved. The error is shown here: [screenshot] To find out why this error happens, I printed some variables in model_checkpoint.py: [screenshot] The output is: [screenshot] Why is the validation loss 0? Is this what causes the error? I don't know how to solve this problem and would appreciate your help. Thanks.

surfii3z commented 3 years ago

Not sure, but I think the checkpoint callback determines which model to save by looking at the monitored metric.

In my case, setting this config option https://github.com/TRI-ML/packnet-sfm/blob/c03e4bf929f202ff67819340135c53778d36047f/configs/default_config.py#L23 to save_top_k = -1 bypasses the problem, since it then saves all the models.
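To illustrate why save_top_k = -1 avoids the error, here is a minimal, simplified sketch of metric-based top-k checkpointing. It assumes semantics similar to PyTorch Lightning's ModelCheckpoint and is not packnet-sfm's actual model_checkpoint.py; the class name TopKCheckpointer, the method on_validation_end, and the metric name "abs_rel" are hypothetical. With a positive save_top_k, every validation result must contain the monitored key so it can be ranked; in this sketch, save_top_k = -1 keeps every checkpoint and never reads the metric.

```python
from typing import Dict, List


class TopKCheckpointer:
    """Hypothetical, simplified top-k checkpoint tracker (not packnet-sfm's code)."""

    def __init__(self, monitor: str, save_top_k: int, mode: str = "min"):
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.monitor = monitor
        self.save_top_k = save_top_k
        self.mode = mode
        self.best: Dict[int, float] = {}  # epoch -> monitored score of kept checkpoints
        self.saved: List[int] = []        # epochs whose checkpoints are currently kept

    def _worst_epoch(self) -> int:
        # The kept checkpoint with the worst score, i.e. the first to be evicted.
        pick = max if self.mode == "min" else min
        return pick(self.best, key=self.best.get)

    def on_validation_end(self, epoch: int, metrics: Dict[str, float]) -> bool:
        """Return True if the checkpoint for `epoch` is kept."""
        if self.save_top_k == -1:
            # Keep every checkpoint; the monitored metric is never looked up.
            self.saved.append(epoch)
            return True
        if self.save_top_k == 0:
            return False
        # Ranking needs the monitored value; a missing key raises KeyError here.
        current = metrics[self.monitor]
        if len(self.best) < self.save_top_k:
            self.best[epoch] = current
        else:
            worst = self._worst_epoch()
            if self.mode == "min":
                better = current < self.best[worst]
            else:
                better = current > self.best[worst]
            if not better:
                return False
            # Evict the worst kept checkpoint to make room for this one.
            del self.best[worst]
            self.saved.remove(worst)
            self.best[epoch] = current
        self.saved.append(epoch)
        return True


if __name__ == "__main__":
    ckpt = TopKCheckpointer(monitor="abs_rel", save_top_k=2, mode="min")
    for epoch, value in enumerate([0.20, 0.15, 0.18, 0.12]):
        ckpt.on_validation_end(epoch, {"abs_rel": value})
    print(sorted(ckpt.saved))  # [1, 3]
```

Calling on_validation_end with metrics that lack "abs_rel" fails with a KeyError as soon as save_top_k is positive, while the same call on a tracker built with save_top_k=-1 simply keeps the checkpoint.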

VitorGuizilini-TRI commented 3 years ago

Yes, you are trying to save based on a metric that doesn't exist, so setting save_top_k = -1 will work around this issue.
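Since the root cause is a monitored metric name that validation never produces, one option is to fail early with a readable message instead of an obscure error at save time. The helper below is a hypothetical sketch, not part of packnet-sfm; it assumes you can collect the metric names your validation step actually logs, and uses Python's standard difflib.get_close_matches to suggest similar names.

```python
import difflib
from typing import Iterable


def check_monitor(monitor: str, available: Iterable[str]) -> None:
    """Raise ValueError if `monitor` is not among the logged metric names.

    Hypothetical helper, not part of packnet-sfm.
    """
    names = sorted(set(available))
    if monitor in names:
        return
    # Suggest similarly spelled metric names, e.g. a missing suffix.
    hints = difflib.get_close_matches(monitor, names, n=3)
    hint_text = f" Did you mean {hints}?" if hints else ""
    raise ValueError(
        f"Checkpoint monitor '{monitor}' not found in validation metrics; "
        f"available: {names}.{hint_text} "
        "Use one of these, or set save_top_k = -1 to keep every checkpoint."
    )
```

Running this once after the first validation pass (or before training, if the metric names are known in advance) turns a missing metric into an explicit configuration error.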