Closed DFLyan closed 3 years ago
Not sure, but I think it tries to determine which model to save by looking at the metric.
In my case, changing this config option https://github.com/TRI-ML/packnet-sfm/blob/c03e4bf929f202ff67819340135c53778d36047f/configs/default_config.py#L23
to save_top_k = -1
bypasses the problem, since it then saves all the models.
Yes, you are trying to save based on a metric that doesn't exist, so setting save_top_k = -1
will work around this issue.
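For anyone wondering why save_top_k = -1 helps, here is a minimal sketch of the usual top-k checkpoint decision. This is not the packnet-sfm code: the function name `should_save`, its arguments, and the "lower is better" comparison are all assumptions made for illustration. The point is only that with -1 the monitored metric is never looked up, so a missing metric cannot cause a failure.

```python
def should_save(metrics, monitor, save_top_k, best_scores):
    """Hypothetical helper (not from packnet-sfm): decide whether to save a checkpoint.

    metrics     -- dict of metric name -> value logged for this epoch
    monitor     -- name of the metric used to rank checkpoints
    save_top_k  -- -1 saves everything, 0 saves nothing, k > 0 keeps the k best
    best_scores -- scores of the checkpoints kept so far (lower is better here)
    """
    if save_top_k == -1:
        # Save every checkpoint; the monitored metric is never consulted.
        return True
    if save_top_k == 0:
        return False
    if monitor not in metrics:
        # This is the failure mode discussed above: ranking by a metric
        # that was never logged.
        raise KeyError(f"monitored metric '{monitor}' not found in {sorted(metrics)}")
    score = metrics[monitor]
    if len(best_scores) < save_top_k:
        return True
    # Assumed 'min' mode: save only if better than the worst kept score.
    return score < max(best_scores)


# The metric 'loss' is missing, but save_top_k = -1 never looks for it.
print(should_save({"abs_rel": 0.12}, "loss", -1, []))
```

With save_top_k = 5 and the same arguments, this sketch would raise a KeyError instead. The trade-off is disk usage, since every checkpoint is kept.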
Hello, when I train the network, the model cannot be saved. The error is shown as
To find out why this error happens, I printed some variables in the file model_checkpoint.py, and the outputs are
Why is the validation loss 0? And does this cause the error? I have no idea how to solve this problem and would appreciate your help, thanks.