Hi there, thanks so much for sharing the training code. I tried to train the self-supervised model on the KITTI dataset, but got a strange result after running for a few epochs from a pretrained model: the loss gets smaller during training, but the evaluation error metrics get higher every epoch. Have you ever encountered this situation? @AdrienGaidon-TRI @VitorGuizilini-TRI @spillai
Which pretrained model are you using, and which config file?
@VitorGuizilini-TRI
Pretrained model: PackNet, Self-Supervised, 192x640, KITTI (K)
Config file: train_kitti.yaml
Are you doing single-GPU training, and with what batch size? The learning rates we specify are for 8 GPUs (as mentioned in the paper) and a batch size of 4; you might need to decrease them if you are training on a single GPU.
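For anyone landing here later, a common heuristic for adjusting the learning rate to a different GPU count is to scale it linearly with the effective (global) batch size. This is a sketch of that rule of thumb, not necessarily what this repository implements; the reference values (8 GPUs, batch size 4, base lr 2e-4) come from the discussion above:

```python
# Sketch of linear learning-rate scaling for a different GPU count /
# batch size. The scaling rule is a common heuristic, not necessarily
# what this repository does internally.
def scale_lr(base_lr=2e-4, ref_gpus=8, ref_batch=4, gpus=1, batch=4):
    """Scale lr linearly with the effective (global) batch size."""
    ref_effective = ref_gpus * ref_batch   # 32 samples per step
    effective = gpus * batch               # e.g. 4 samples per step
    return base_lr * effective / ref_effective

print(scale_lr(gpus=1, batch=4))  # 2.5e-05 for single-GPU training
```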
@VitorGuizilini-TRI I use 8 GPUs as well and have been training for more than 10 hours; the average loss is stagnating at around 0.09. I also tried the tiny KITTI dataset first: with lr = 5e-5 the loss decreases significantly and I eventually get a decent result, but the same does not happen on the full KITTI dataset.
We regularly use that model for fine-tuning on KITTI and other datasets, and have never observed that behavior. I will look deeper, but one last question: are you loading the checkpoint through the config file (checkpoint_path), or resuming from the checkpoint itself?
I tried both: training from scratch, and loading the model by manually extracting the state dicts from the pretrained file.
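For reference, this is roughly what I mean by manually extracting the state dict. It is a minimal sketch: the `state_dict` key and the `model.` prefix are assumptions about how the checkpoint was saved and may differ.

```python
import torch

# Load the pretrained checkpoint on CPU; adjust the path to wherever
# you downloaded the model zoo file.
checkpoint = torch.load('PackNet01_MR_selfsup_K.ckpt', map_location='cpu')

# The weights may be stored under a 'state_dict' key (an assumption),
# or the checkpoint may be the state dict itself.
state_dict = checkpoint.get('state_dict', checkpoint)

# If the weights were saved from a wrapper module, strip its prefix
# (the 'model.' prefix is also an assumption).
state_dict = {k.replace('model.', '', 1): v for k, v in state_dict.items()}

# `model` is the network instantiated from this repository.
model.load_state_dict(state_dict, strict=False)
```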
Another question about the semi-supervised model: why did you disable the photometric loss? I think it would be better to combine the photometric loss and the point-cloud loss, since the point cloud is sparse on the re-projection plane.
About fine-tuning pretrained models, I don't know what to say; I just ran an experiment here with the same code from this repository and it looks alright. The loss starts out already stable at ~0.073, and the metrics are also stable at the numbers we report. Try decreasing the learning rate to 0.00005, which is the value at the end of the training session (0.0002/4). Are you running inside our docker?
About the semi-supervised model, we are using the self-supervised photometric loss in addition to the supervised loss. You can set the ratio in the config file; if you set it to 1.0, training is purely supervised.
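Conceptually, the blending works like the sketch below. The function and variable names are illustrative, not the repository's actual API; check the config schema for the real key that controls the ratio.

```python
# Sketch of blending the supervised (point-cloud) and self-supervised
# (photometric) losses with a single ratio, as described above.
def total_loss(photometric_loss, supervised_loss, ratio):
    """ratio = 1.0 -> purely supervised; ratio = 0.0 -> purely self-supervised."""
    return ratio * supervised_loss + (1.0 - ratio) * photometric_loss
```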
Yes, I was running inside the docker, and had only replaced the wandb logger with TensorBoard for visualization. I will keep looking into it. So 0.073 is the best loss you get on your side, right? Thanks for your help.
@yuqi1991 Hi, I'm new to PyTorch. Could you share the code for visualization with TensorBoard?
@MingYang-buaa Hi, since I have already left the project I cannot provide the source code, but you can refer to the tensorboardX usage documentation; it's quite easy to use.
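Something along these lines; a minimal tensorboardX sketch where the log directory, tags, and values are placeholders:

```python
from tensorboardX import SummaryWriter

# Write scalars to ./runs/kitti_selfsup; view them by pointing
# TensorBoard at this directory with `tensorboard --logdir runs`.
writer = SummaryWriter('runs/kitti_selfsup')

for step, loss in enumerate([0.09, 0.085, 0.081]):  # placeholder values
    writer.add_scalar('train/avg_loss', loss, step)

writer.close()
```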