Closed zsteve2529 closed 3 years ago
You need to set the checkpoint path in the .yaml file (there are some other options as well), otherwise the checkpoint won't be saved, unfortunately. Let me know if you have any other questions, so we can make sure your next model is properly saved!
@VitorGuizilini-TRI - Thank you very much for the prompt answer.
Perhaps my confusion comes from assuming that training would produce a brand-new checkpoint file?
Are users of packnet-sfm supposed to download the pre-trained models (checkpoint files) and improve on them? I assumed I could produce my own brand-new checkpoint file by training.
Here is my current YAML file. Please advise, and thank you so much. I do not have many GPUs, so training takes a long time, and I need to get it right this time. What I want is to save the training results.
```
checkpoint:
    filepath: '../../data/'
model:
    name: 'SelfSupModel'
    optimizer:
        name: 'Adam'
        depth:
            lr: 0.0002
        pose:
            lr: 0.0002
    scheduler:
        name: 'StepLR'
        step_size: 30
        gamma: 0.5
    depth_net:
        name: 'PackNet01'
        version: '1A'
    pose_net:
        name: 'PoseNet'
        version: ''
    params:
        crop: 'garg'
        min_depth: 0.0
        max_depth: 80.0
datasets:
    augmentation:
        image_shape: (192, 640)
    train:
        batch_size: 4
        dataset: ['KITTI']
        path: ['../../data/datasets/KITTI_raw']
        split: ['data_splits/eigen_zhou_files.txt']
        depth_type: ['velodyne']
        repeat: [2]
    validation:
        dataset: ['KITTI']
        path: ['../../data/datasets/KITTI_raw']
        split: ['data_splits/eigen_val_files.txt',
                'data_splits/eigen_test_files.txt']
        depth_type: ['velodyne']
    test:
        dataset: ['KITTI']
        path: ['../../data/datasets/KITTI_raw']
        split: ['data_splits/eigen_test_files.txt']
        depth_type: ['velodyne']
```
You can definitely train new models from scratch. When I say checkpoint path, I mean the path where the checkpoint will be saved. I agree that the names might be a little confusing; I will look into making this clearer in the future.
You seem to be doing it right: your checkpoint.filepath is set, so that's where new models will be saved. One thing you can try is setting an absolute path instead of a relative one. There are some other options you can use:
checkpoint:
    filepath: '/data/experiments'  # where the models will be saved
    monitor: 'abs_rel_pp_gt'       # which metric is observed
    monitor_index: 0               # which validation dataset the metric is taken from
    mode: 'min'                    # whether the metric is minimized or maximized
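To make the comments on `monitor`, `monitor_index` and `mode` concrete, here is a rough sketch of how such options are typically interpreted. This is not packnet-sfm's actual code: the function names and the shape of the metrics (one dict per validation dataset) are assumptions made purely for illustration.

```python
def monitored_value(metrics_per_dataset, monitor, monitor_index=0):
    """Pick the observed metric from one validation dataset.

    `metrics_per_dataset` is assumed to be a list with one dict of
    metric-name -> value per validation dataset (hypothetical layout).
    """
    return metrics_per_dataset[monitor_index][monitor]


def is_improvement(current, best, mode="min"):
    """Return True if `current` beats `best` under the given mode."""
    if mode not in ("min", "max"):
        raise ValueError(f"mode must be 'min' or 'max', got {mode!r}")
    if best is None:  # nothing recorded yet, so anything is an improvement
        return True
    return current < best if mode == "min" else current > best


def best_epoch(history, mode="min"):
    """Return (epoch index, value) of the best entry in `history`."""
    best_idx, best_val = None, None
    for epoch, value in enumerate(history):
        if is_improvement(value, best_val, mode):
            best_idx, best_val = epoch, value
    return best_idx, best_val
```

With an error metric such as `abs_rel_pp_gt`, lower is better, so `mode: 'min'` is the natural choice; an accuracy-style metric would use `'max'`.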
A good practice is to use KITTI_tiny first and run for one epoch, only to see if a model is saved. If it's working properly then you can start a full training session. I hope this works for you!
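If you want to double-check the path before committing to a long run, something along these lines may help. It is only a sketch using the Python standard library: the helper names are made up for this example, and the `.ckpt` suffix is an assumption about how the saved files are named, so adjust it to whatever actually appears in your output folder.

```python
import os
from pathlib import Path


def resolve_checkpoint_dir(filepath, base_dir=None):
    """Turn a (possibly relative) path into an absolute, writable directory.

    Relative paths are resolved against `base_dir`, or against the current
    working directory if `base_dir` is not given.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    path = Path(filepath).expanduser()
    if not path.is_absolute():
        path = base / path
    path = path.resolve()
    path.mkdir(parents=True, exist_ok=True)  # create it if it is missing
    if not os.access(path, os.W_OK):
        raise PermissionError(f"Cannot write checkpoints to {path}")
    return path


def find_checkpoints(directory, suffix=".ckpt"):
    """List files under `directory` ending in `suffix`, newest first."""
    files = [p for p in Path(directory).rglob(f"*{suffix}") if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)
```

Calling `resolve_checkpoint_dir('../../data/')` from the folder where you launch training shows which absolute directory the relative path really points to, and calling `find_checkpoints(...)` on that directory after the one-epoch test run shows whether anything was written.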
Along with 'abs_rel_pp_gt', what other metrics can be used? What are all the possible strings we can give to checkpoint.monitor? Thanks
Hi All and Authors of this great work,
I have spent 10 days training packnet-sfm on the KITTI dataset using a YAML file (not a checkpoint), and the training was successful. However, I cannot figure out where the train.py script saves the trained models.
Yes, I know it is a simple question and that I should read the code, which I did (I am also not very familiar with Python). So, could someone please point out where the training results are saved?