tonytu16 closed 4 years ago
Can you share your .yaml file, or at least the checkpoint part? This error indicates that you are trying to monitor a metric that does not exist.
Hello,
Thank you for your reply! I pulled the repo and ran a simple training test on the kitti_tiny dataset. I put the kitti_tiny folder in packnet-sfm/data/datasets and added a cfg.checkpoint.filepath in default_config.py for checkpoint saving. Those two are the only changes I made. Thank you!
This saving issue happened to me too, and I believe there is some reason behind it. One possibility is that your GPU memory is full. I'm not sure, but try rebooting the system, starting only the training process, and checking whether it saves checkpoints after several epochs.
https://github.com/TRI-ML/packnet-sfm/issues/54
Check this issue too. You did not define the directory for your checkpoints; follow the instructions in the link above.
You are monitoring "loss", which is not a valid metric. Try something like "abs_rel_pp_gt"; it should work.
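To illustrate why monitoring "loss" fails, here is a minimal sketch of the kind of lookup a checkpoint callback performs. It is not the packnet-sfm implementation: the function `resolve_monitor`, the `validation_metrics` dictionary, and the metric values are all hypothetical, and only the key names "loss" and "abs_rel_pp_gt" come from this thread.

```python
def resolve_monitor(metrics, monitor):
    """Return the value of the monitored metric, or fail with a descriptive error.

    `metrics` maps metric names to values produced during validation;
    `monitor` is the name the checkpoint is configured to track.
    """
    if monitor not in metrics:
        available = ", ".join(sorted(metrics))
        raise ValueError(
            f"Checkpoint metric '{monitor}' is not available. "
            f"Available metrics: {available}"
        )
    return metrics[monitor]


# Hypothetical validation output: only depth metrics are logged, no "loss" key.
validation_metrics = {"abs_rel_pp_gt": 0.112, "rmse_pp_gt": 4.87}

# A key that exists among the validation metrics resolves normally.
print(resolve_monitor(validation_metrics, "abs_rel_pp_gt"))

# A key that was never logged raises an error listing the valid choices.
try:
    resolve_monitor(validation_metrics, "loss")
except ValueError as err:
    print(err)
```

Printing the available keys in the error message makes it easy to pick a valid name for the monitor setting.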
Hello, when I tried to save a model to the designated path, I got a "checkpoint metric is not available" error. So I re-pulled the repo and tried training on the KITTI_tiny dataset; the model seems to train properly and I no longer get the "checkpoint metric is not available" error, but I don't see the checkpoint file being saved to the path I designated in line 22 of default_config.py. Could you help me with this? Thank you very much!