drprojects / DeepViewAgg

[CVPR'22 Best Paper Finalist] Official PyTorch implementation of the method presented in "Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation"

Errors in optimize_parameters() when training from your pretrained models #17

Closed ruomingzhai closed 1 year ago

ruomingzhai commented 1 year ago

Hi, I downloaded your pretrained model to reproduce the whole training process, but it fails with the following error:

    File "/root/DeepViewAgg/torch_points3d/models/base_model.py", line 259, in optimize_parameters
        self._grad_scale.step(self._optimizer)  # update parameters
    AttributeError: 'NoneType' object has no attribute 'step'

It seems that in the optimize_parameters() function of torch_points3d/models/base_model.py, self._grad_scale is None. I thought it might come from the checkpoint.pt file, but apparently it doesn't: it is only initialized in instantiate_optimizers(), as a torch.cuda.amp.GradScaler instance. I am not sure where things went wrong, so I hope I can get some clues from you.
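To make the code path I mean concrete, here is a minimal sketch (my own simplification, not the actual torch_points3d source) of how the two methods interact:

```python
import torch


class BaseModelSketch:
    """Hypothetical simplification of torch_points3d's BaseModel."""

    def __init__(self):
        self._optimizer = None
        self._grad_scale = None  # stays None until instantiate_optimizers()

    def instantiate_optimizers(self, params, mixed_precision=False):
        self._optimizer = torch.optim.SGD(params, lr=0.01)
        # The only place the scaler is ever created.
        self._grad_scale = torch.cuda.amp.GradScaler(enabled=mixed_precision)

    def optimize_parameters(self, loss):
        # If instantiate_optimizers() was never called, e.g. when the model
        # is restored from a checkpoint, the calls below raise
        # AttributeError: 'NoneType' object has no attribute ...
        self._grad_scale.scale(loss).backward()
        self._grad_scale.step(self._optimizer)  # update parameters
        self._grad_scale.update()
```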

drprojects commented 1 year ago

Hi @ruomingzhai,

I think this might be due to the fact that torch-points3d does not handle loading optimizers with differential learning rates well. Indeed, to train the 3D+2D models, some blocks of the model use different learning rates (e.g. the 2D blocks vs the 3D blocks), which can cause issues when the optimizer state is loaded back.
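Here is a minimal sketch of what I mean (the modules and checkpoint key are placeholders, not the actual DeepViewAgg code):

```python
import torch

# Placeholder modules standing in for the 2D and 3D blocks of the model.
blocks_2d = torch.nn.Linear(8, 8)
blocks_3d = torch.nn.Linear(8, 8)

# One parameter group per block, each with its own learning rate.
optimizer = torch.optim.Adam([
    {"params": blocks_2d.parameters(), "lr": 1e-4},  # 2D blocks
    {"params": blocks_3d.parameters(), "lr": 1e-3},  # 3D blocks
])

# Restoring the optimizer state assumes the saved parameter groups match
# the freshly built ones one-to-one; any mismatch makes this call fail:
# optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
```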

As of now, the project does not support loading a pretrained optimizer and scheduler for fine-tuning. If you want to reproduce the training experiments, use scripts/train_kitti360.sh. If you want to run inference with the pretrained weights, use notebooks/kitti360_inference.ipynb.

If you want to fine-tune a pretrained model, you will need to create a new optimizer and scheduler anyway, which should bypass the problem you encountered. For that, you can simply follow the procedure in scripts/train_kitti360.sh, with a few changes, as sketched below.
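Roughly, the idea is to load only the model weights and rebuild the training state from scratch. A minimal sketch (the model, file name, and checkpoint key are placeholders, not the project's actual API; check them against your checkpoint.pt):

```python
import torch

# Placeholder standing in for the actual DeepViewAgg model.
model = torch.nn.Linear(8, 8)

# Keep only the model weights from the checkpoint; deliberately ignore
# any optimizer or scheduler state stored in the file.
ckpt = torch.load("checkpoint.pt", map_location="cpu")
model.load_state_dict(ckpt["model_state_dict"])  # key name is an assumption

# Build a fresh optimizer, scheduler, and GradScaler, so that
# _grad_scale is never None when optimize_parameters() runs.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)
scaler = torch.cuda.amp.GradScaler()
```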

Hope that helps!

drprojects commented 1 year ago

Hi, assuming the last reply addressed the question, I am closing the issue.