Performance not matching the paper values

kshitij3112 commented 2 years ago

Congratulations on such great work. I have tried to replicate the results reported in the paper, but after training for more than 50 epochs I am still not getting similar results. I used a batch size of 3 and otherwise kept everything the same as the config file in the repo. In another issue, you mentioned training for around 50 epochs with the SAP epoch set to 20.

Following are my results. Validation per-class PQ, SQ, RQ and IoU:

class             PQ      SQ      RQ     IoU
car           87.58%  91.26%  95.96%  93.75%
bicycle       27.15%  75.09%  36.16%  18.39%
motorcycle    54.03%  89.00%  60.71%  55.05%
truck         69.47%  90.71%  76.59%  85.95%
bus           48.70%  90.19%  54.00%  43.03%
person        69.89%  90.11%  77.55%  56.81%
bicyclist     87.22%  91.86%  94.95%  76.11%
motorcyclist   1.67%  68.28%   2.44%   0.97%
road          93.33%  93.35%  99.98%  93.31%
parking       25.02%  71.31%  35.08%  44.85%
sidewalk      75.14%  82.17%  91.45%  78.90%
other-ground   0.00%   0.00%   0.00%   0.37%
building      86.55%  90.47%  95.66%  88.56%
fence         14.98%  67.74%  22.11%  46.67%
vegetation    82.53%  85.78%  96.22%  84.80%
trunk         39.25%  71.35%  55.00%  55.20%
terrain       53.21%  73.15%  72.73%  68.09%
pole          53.14%  72.97%  72.82%  60.64%
traffic-sign  50.63%  75.39%  67.16%  39.80%

Current val PQ is 53.657, while the best val PQ is 55.571. Current val mIoU is 57.435.

Could you please advise on how to close this performance gap? It is urgent for our current work, and any help would be really appreciated. Thank you.
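When comparing runs like this, it can help to recompute the summary numbers from the per-class log and see which classes drive the gap. Below is a minimal sketch that parses the flattened `class : PQ% SQ% RQ% IoU%` log line above, checks the standard panoptic-quality identity PQ = SQ × RQ per class, and averages over classes. Assumptions: the regex only fits this particular log layout, and the split into eight "thing" classes follows the SemanticKITTI panoptic convention rather than anything read from the repo's code.

```python
import re
from statistics import mean

# Per-class validation log from this comment, in its flattened
# "class : PQ% SQ% RQ% IoU%" form.
LOG = (
    "car : 87.58% 91.26% 95.96% 93.75% bicycle : 27.15% 75.09% 36.16% 18.39% "
    "motorcycle : 54.03% 89.00% 60.71% 55.05% truck : 69.47% 90.71% 76.59% 85.95% "
    "bus : 48.70% 90.19% 54.00% 43.03% person : 69.89% 90.11% 77.55% 56.81% "
    "bicyclist : 87.22% 91.86% 94.95% 76.11% motorcyclist : 1.67% 68.28% 2.44% 0.97% "
    "road : 93.33% 93.35% 99.98% 93.31% parking : 25.02% 71.31% 35.08% 44.85% "
    "sidewalk : 75.14% 82.17% 91.45% 78.90% other-ground : 0.00% 0.00% 0.00% 0.37% "
    "building : 86.55% 90.47% 95.66% 88.56% fence : 14.98% 67.74% 22.11% 46.67% "
    "vegetation : 82.53% 85.78% 96.22% 84.80% trunk : 39.25% 71.35% 55.00% 55.20% "
    "terrain : 53.21% 73.15% 72.73% 68.09% pole : 53.14% 72.97% 72.82% 60.64% "
    "traffic-sign : 50.63% 75.39% 67.16% 39.80%"
)

# One match per class: the class name followed by four percentages.
PATTERN = re.compile(r"([a-z-]+) : ([\d.]+)% ([\d.]+)% ([\d.]+)% ([\d.]+)%")

# Assumption: these eight are the "thing" (instance) classes, following the
# SemanticKITTI panoptic convention; every other class is treated as "stuff".
THING_CLASSES = {"car", "bicycle", "motorcycle", "truck", "bus",
                 "person", "bicyclist", "motorcyclist"}


def parse(log):
    """Return {class_name: (PQ, SQ, RQ, IoU)}, all in percent."""
    return {m[0]: tuple(float(v) for v in m[1:]) for m in PATTERN.findall(log)}


def pq_factorization_errors(per_class, tol=0.05):
    """Classes where PQ differs from SQ * RQ by more than log rounding explains."""
    return [c for c, (pq, sq, rq, _) in per_class.items()
            if abs(pq - sq * rq / 100) > tol]


def summarize(per_class):
    """Unweighted means over classes: overall PQ, PQ on things/stuff, and mIoU."""
    pq = {c: v[0] for c, v in per_class.items()}
    return {
        "PQ": mean(pq.values()),
        "PQ_things": mean(v for c, v in pq.items() if c in THING_CLASSES),
        "PQ_stuff": mean(v for c, v in pq.items() if c not in THING_CLASSES),
        "mIoU": mean(v[3] for v in per_class.values()),
    }


if __name__ == "__main__":
    per_class = parse(LOG)
    print("PQ != SQ * RQ for:", pq_factorization_errors(per_class))
    for name, value in summarize(per_class).items():
        print(f"{name}: {value:.2f}")
    # The lowest-PQ classes show where most of the gap comes from.
    print("lowest PQ:", sorted(per_class, key=lambda c: per_class[c][0])[:5])
```

On this log, the plain class average reproduces the reported current val PQ of 53.657, and the IoU average comes out at about 57.43 from the two-decimal values. Near-zero classes such as other-ground and motorcyclist pull the mean down far more than small gains on well-performing classes would push it up.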

kaxapatel commented 2 years ago

@kshitij3112 on which dataset did you train?

kshitij3112 commented 2 years ago

@kaxapatel I trained on the SemanticKITTI dataset

edwardzhou130 commented 2 years ago

Yes, you usually only need to train for around 50 epochs to get the best PQ. Did you change any parameters in the config other than the batch size?

kshitij3112 commented 2 years ago

Thanks for your reply. I only changed the SAP epoch to 20, the val iter to 40000, and the batch size to 3. Those are all the changes I made.
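To answer "did you change anything else in the config?" reliably, one option is to diff the config you trained with against the one in the repo. A minimal stdlib sketch, assuming both YAML files have already been loaded into nested dicts (for example with PyYAML's `yaml.safe_load`); the function name is hypothetical, and the section/key names and values in the usage example are made up, not taken from the repo's config.

```python
from typing import Dict, List


def diff_config(reference: Dict, current: Dict, prefix: str = "") -> List[str]:
    """Return one "key: old -> new" line per value that differs between two
    nested config dicts (e.g. two YAML files loaded with yaml.safe_load)."""
    changes = []
    # key=str keeps sorting safe even if YAML produced non-string keys.
    for key in sorted(set(reference) | set(current), key=str):
        path = f"{prefix}{key}"
        ref_val = reference.get(key, "<missing>")
        cur_val = current.get(key, "<missing>")
        if isinstance(ref_val, dict) and isinstance(cur_val, dict):
            # Recurse into nested sections, keeping a dotted key path.
            changes.extend(diff_config(ref_val, cur_val, prefix=f"{path}."))
        elif ref_val != cur_val:
            changes.append(f"{path}: {ref_val!r} -> {cur_val!r}")
    return changes


if __name__ == "__main__":
    # Made-up section/key names and values, for illustration only.
    repo_cfg = {"train": {"batch_size": 2, "sap_epoch": 30}, "model": {"grid": [1, 2]}}
    my_cfg = {"train": {"batch_size": 3, "sap_epoch": 20}, "model": {"grid": [1, 2]}}
    for line in diff_config(repo_cfg, my_cfg):
        print(line)
```

Keys present in only one file show up as `<missing>` on the other side, so renamed or dropped options are caught as well as changed values.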

edwardzhou130 commented 2 years ago

I haven't tried a batch size of 3, so I don't know whether that is the problem. Another thing you can try is disabling the occlusion check in the instance augmentation. The results in the paper were obtained with the bug reported in #3 still present: it effectively skips the occlusion check for all added instances and for the rotation augmentation.

kshitij3112 commented 2 years ago

Just to be precise, do you mean I should set inst_os = False in this line: https://github.com/edwardzhou130/Panoptic-PolarNet/blob/main/configs/SemanticKITTI_model/Panoptic-PolarNet.yaml#L12? And should I train again from the beginning with this setting, or just train for a few additional epochs?

edwardzhou130 commented 2 years ago

You don't need to change inst_os. That bug was fixed in this commit: https://github.com/edwardzhou130/Panoptic-PolarNet/commit/3a72f2380a4e505e191b69da596f521a9d9f1a71. The easiest way is to revert that fix. Alternatively, you can set min_dist to -1 for a similar effect: https://github.com/edwardzhou130/Panoptic-PolarNet/blob/3a72f2380a4e505e191b69da596f521a9d9f1a71/dataloader/instance_augmentation.py#L160. If you have the time and machines to do so, I would suggest retraining with the same config as in the repo.
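Why would a negative threshold act like disabling the check? A minimal sketch, assuming the check boils down to a "keep the candidate only if it is at least min_dist away" comparison. This is a hypothetical, simplified distance gate written for illustration; it is not the code at instance_augmentation.py#L160, and the function name is made up.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float]


def placement_allowed(candidate: Point, placed: Iterable[Point], min_dist: float) -> bool:
    """Hypothetical gate for pasting an instance into a scan.

    Accepts the candidate only if its bird's-eye-view centre is at least
    `min_dist` away from every instance already placed.
    """
    return all(math.dist(candidate, other) >= min_dist for other in placed)


if __name__ == "__main__":
    placed = [(10.0, 0.0)]
    print(placement_allowed((10.5, 0.0), placed, min_dist=2.0))  # too close: rejected
    # Distances are never negative, so min_dist = -1 lets every candidate through,
    # i.e. the check is effectively switched off.
    print(placement_allowed((10.5, 0.0), placed, min_dist=-1))
```

Under this assumption, any comparison of a non-negative distance against -1 always passes, which is consistent with the suggestion that min_dist = -1 reproduces the behavior of the original, unfixed augmentation.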

kshitij3112 commented 2 years ago

Thank you again for your quick replies. I will start training the network and post the results here once it's complete. Have a nice day!

kshitij3112 commented 2 years ago

I just realized something: training starts from the pretrained model. When I run the evaluation on that pretrained model, it already seems to be trained for panoptic segmentation, since the validation PQ values are almost at their final values. So, to train correctly, we should remove the pretrained model. Is that correct?

edwardzhou130 commented 2 years ago

Yes, if there is already a .pt file with the same name, the training script will use it as the pretrained weights. You can either change model_save_path to a new path or rename the previous .pt file.
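Per the reply above, whatever .pt file sits at model_save_path is picked up as pretrained weights, so a fresh run needs that file out of the way. A minimal sketch of the "rename the previous .pt file" option, assuming you want to keep the old weights under a different name; the function name, the timestamp naming scheme and the example path are hypothetical, not part of the repo.

```python
import time
from pathlib import Path
from typing import Optional, Union


def move_aside_checkpoint(model_save_path: Union[str, Path]) -> Optional[Path]:
    """Rename an existing checkpoint so the next run does not pick it up as
    pretrained weights. Returns the backup path, or None if nothing was there."""
    path = Path(model_save_path)
    if not path.exists():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = path.with_name(f"{path.stem}_old_{stamp}{path.suffix}")
    path.rename(backup)
    return backup


if __name__ == "__main__":
    # Hypothetical path; use the model_save_path from your own config.
    moved = move_aside_checkpoint("./Panoptic_SemKITTI.pt")
    if moved:
        print("moved old checkpoint to", moved)
    else:
        print("no checkpoint found, training will start from scratch")
```

Renaming rather than deleting keeps the earlier weights around for comparison; pointing model_save_path at a new file achieves the same result without touching the old one.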

kshitij3112 commented 2 years ago

Hello! Thank you for your support so far. I achieved a PQ of 58.4 on the SemanticKITTI dataset, which is close to the value in the paper. For nuScenes, however, I am getting 59 PQ on the validation set. I know the official dataset differs from the one used in the paper, but I still think the performance gap is large. I am using exactly the same network parameters as for SemanticKITTI. Could you please suggest anything else that could improve the performance? Thanks.

kaxapatel commented 2 years ago

Hi @kshitij3112, I am also trying to work with the nuScenes dataset, but I am still new to point clouds. Could you please share your .py files for nuScenes, for example dataset_nuscenes, train_nuscenes and instance_preprocess_nuscenes.py? I would be really grateful.

kaxapatel commented 2 years ago

@kshitij3112 I really need your help.

edwardzhou130/Panoptic-PolarNet, issue #16