I cannot reproduce the performance that you suggested #3

IamJiyong commented 1 year ago

Thank you for your great work!

I just ran your implementation, but I couldn't reproduce the model performance reported in your paper.

My environment is as follows :

GPU : RTX 4090
CUDA : 11.8
PyTorch : 1.13.0
(I'm using the NVIDIA Container Toolkit)

I followed the commands you provided, but I set batch_size to 4 because of limited GPU memory.

The performance I got is as follows :

(IoU 0.7, 0.5, 0.5 / recall 40 / hard, moderate, easy)

Pretrained PV-RCNN
Car : 87.9537, 76.3818, 71.5516
Ped. : 41.6893, 38.2420, 34.3539
Cyc. : 65.4243, 43.2410, 40.8903

HSSDA (epoch.80)
Car : 87.4405, 75.7870, 71.0323
Ped. : 45.2401, 41.0364, 36.4965
Cyc. : 67.2083, 44.5285, 40.9167

I ran into a similar issue when I tried to run 3DIoUMatch.

So I'm assuming that my environment is the cause.

But there is no problem when I train the model with the fully supervised method (without any semi-supervised pipeline).

Can you help me with this?

IamJiyong commented 1 year ago

I'm sorry, but I don't speak Chinese and I don't know what QQ is.

How long does this experiment need to run for each epoch?

azhuantou commented 1 year ago

Thanks for your interest. Based on the information given so far, I have no idea why these experimental results appear. Could you provide the training log file? @IamJiyong

As for the experiment time, the whole training process (80 epochs) of PV-RCNN needs about 4 hours with 6x A100 GPUs (batch size 60). @dof-pikes

IamJiyong commented 1 year ago

I think I ran the code the wrong way. The first run took about 2 hours in total, but after I rebuilt the environment and ran it again, it takes 23 hours. I'll let you know if the results are still abnormal after this training.
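As a rough sanity check on the run times mentioned so far, the totals can be converted to an average time per epoch. This is only a sketch: it assumes each total covers all 80 epochs and ignores evaluation time, and `minutes_per_epoch` is an illustrative helper name.

```python
def minutes_per_epoch(total_hours: float, epochs: int = 80) -> float:
    """Average wall-clock minutes per epoch, assuming the total covers all epochs."""
    return total_hours * 60.0 / epochs


# Total training times quoted in this thread, in hours.
runs = {
    "6x A100, batch size 60": 4,
    "first attempt (suspected wrong setup)": 2,
    "rerun after rebuilding the environment": 23,
}

for name, hours in runs.items():
    print(f"{name}: {minutes_per_epoch(hours):.2f} min/epoch")
```

Under these assumptions the 4-hour run works out to about 3 minutes per epoch and the 23-hour run to about 17 minutes per epoch.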

IamJiyong commented 1 year ago

2023-05-02 03:09:48,117 INFO **Start evaluation kitti_models/pv_rcnn_ssl(pv_rcnn_002_1)**
2023-05-02 03:55:50,003 INFO * Performance of EPOCH 80 ***
2023-05-02 03:55:50,003 INFO Generate label finished(sec_per_example: 0.0554 second).
2023-05-02 03:55:50,003 INFO recall_roi_0.3: 0.936189
2023-05-02 03:55:50,003 INFO recall_rcnn_0.3: 0.935096
2023-05-02 03:55:50,003 INFO recall_roi_0.5: 0.888112
2023-05-02 03:55:50,003 INFO recall_rcnn_0.5: 0.890734
2023-05-02 03:55:50,003 INFO recall_roi_0.7: 0.638330
2023-05-02 03:55:50,003 INFO recall_rcnn_0.7: 0.678103
2023-05-02 03:56:24,082 INFO Car AP@0.70, 0.70, 0.70:
bbox AP:90.6938, 89.1181, 88.3390
bev AP:90.1095, 87.2880, 85.4988
3d AP:89.0338, 78.4787, 77.1978
Pedestrian AP@0.50, 0.50, 0.50:
bbox AP:1.9447, 2.0404, 2.0384
bev AP:1.9528, 1.7286, 1.6709
3d AP:1.7251, 1.6078, 1.5052
Cyclist AP@0.50, 0.50, 0.50:
bbox AP:87.3605, 67.0018, 65.3337
bev AP:83.9256, 62.7011, 56.2899
3d AP:82.8759, 61.4383, 55.8087

2023-05-02 03:56:24,086 INFO Result is save to /workspace/HSSDA/output/kitti_models/pv_rcnn_ssl/pv_rcnn_002_1/eval/eval_with_train/epoch_80/val
2023-05-02 03:56:24,086 INFO ****Evaluation done.*****
2023-05-02 03:56:24,118 INFO Epoch 80 has been evaluated
2023-05-02 03:56:54,152 INFO **End evaluation kitti_models/pv_rcnn_ssl(pv_rcnn_002_1)**

The mAP for pedestrians is strangely low. I can't figure out why.

azhuantou commented 1 year ago

Sorry for the late reply. It seems that the performance on cars and cyclists is normal, but the low performance on pedestrians is probably due to a high number of incorrect pseudo-labels. I'm not sure why this is happening, but it could be due to the batch size. You can try increasing the INTERVAL setting in cfgs/kitti_models/pv_rcnn_ssl.yaml to a larger value, such as 10 or 20, and see how the experiment results turn out.
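For anyone who prefers to change the setting programmatically rather than by hand, the sketch below locates every INTERVAL key in an already parsed config and overwrites it. The nested dictionary is a hypothetical stand-in: the real layout of pv_rcnn_ssl.yaml and its default INTERVAL value are not shown in this thread, and `find_key_paths` and `set_by_path` are illustrative helper names.

```python
from typing import Any, Iterator, Tuple


def find_key_paths(cfg: Any, key: str, prefix: Tuple = ()) -> Iterator[Tuple]:
    """Yield the path to every occurrence of `key` in nested dicts and lists."""
    if isinstance(cfg, dict):
        for k, v in cfg.items():
            if k == key:
                yield prefix + (k,)
            yield from find_key_paths(v, key, prefix + (k,))
    elif isinstance(cfg, list):
        for i, v in enumerate(cfg):
            yield from find_key_paths(v, key, prefix + (i,))


def set_by_path(cfg: Any, path: Tuple, value: Any) -> None:
    """Overwrite the entry at `path` in place."""
    node = cfg
    for step in path[:-1]:
        node = node[step]
    node[path[-1]] = value


# Hypothetical stand-in for the parsed YAML; the real block names and
# the original INTERVAL value may differ.
cfg = {"MODEL": {"HYPOTHETICAL_BLOCK": {"INTERVAL": 5}}}

paths = list(find_key_paths(cfg, "INTERVAL"))
print(paths)  # inspect where the key lives before changing anything
for path in paths:
    set_by_path(cfg, path, 10)
```

Printing the paths first is deliberate: if the key appears in more than one place, it is safer to check which one is meant before overwriting. With PyYAML installed, `yaml.safe_load` and `yaml.safe_dump` can be used to read and write the actual file.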

IamJiyong commented 1 year ago

Okay, I'll try it. Thank you so much and I'll tell you when the result comes up.

IamJiyong commented 1 year ago

I set the interval to 10 as you suggested and reran the training, but it didn't work. I visualized the results and found that there were too many false positives for pedestrians. I think the problem is in pseudo-label mining, so I'm thinking of excluding pedestrians from GT sampling to check whether this is the cause. What do you think? Here are my training log and the visualized scene.

2023-05-07 00:02:11,134 INFO **Start evaluation kitti_models/pv_rcnn_ssl(pv_rcnn_002_1_v4)**
2023-05-07 00:46:02,836 INFO * Performance of EPOCH 80 ***
2023-05-07 00:46:02,836 INFO recall_roi_0.3: 0.934391
2023-05-07 00:46:02,836 INFO recall_rcnn_0.3: 0.934030
2023-05-07 00:46:02,836 INFO recall_roi_0.5: 0.890050
2023-05-07 00:46:02,836 INFO recall_rcnn_0.5: 0.893655
2023-05-07 00:46:02,836 INFO recall_roi_0.7: 0.647441
2023-05-07 00:46:02,836 INFO recall_rcnn_0.7: 0.685652
2023-05-07 00:46:35,403 INFO Car AP@0.70, 0.70, 0.70:
bbox AP:90.6737, 89.2257, 88.5346
bev AP:90.1903, 87.4166, 86.0930
3d AP:89.0955, 78.5786, 77.2564
Pedestrian AP@0.50, 0.50, 0.50:
bbox AP:13.7083, 11.9690, 11.5980
bev AP:14.4148, 12.3542, 11.7175
3d AP:14.3175, 12.1915, 11.3892
Cyclist AP@0.50, 0.50, 0.50:
bbox AP:88.0522, 72.8587, 66.6439
bev AP:86.5022, 68.6666, 63.0746
3d AP:84.1040, 62.8968, 60.8126

2023-05-07 00:46:35,406 INFO Result is save to /workspace/HSSDA/output/kitti_models/pv_rcnn_ssl/pv_rcnn_002_1_v4/eval/eval_with_train/epoch_80/val
2023-05-07 00:46:35,406 INFO ****Evaluation done.*****
2023-05-07 00:46:35,431 INFO Epoch 80 has been evaluated
2023-05-07 00:47:05,463 INFO **End evaluation kitti_models/pv_rcnn_ssl(pv_rcnn_002_1_v4)**
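To compare runs like the two evaluation logs in this thread without reading them by eye, the 3D AP values for each class can be pulled out with a regular expression. This is a minimal sketch that assumes the log keeps the "Class AP@...: bbox AP:... bev AP:... 3d AP:..." layout shown here; `parse_3d_ap` is an illustrative helper, not part of the HSSDA code.

```python
import re
from typing import Dict, Tuple

# One match per class: class name, then the three "3d AP" values.
# \s in the character classes lets the pattern span line breaks.
AP_PATTERN = re.compile(
    r"(\w+) AP@[\d.,\s]+:"
    r"\s*bbox AP:[\d.,\s]+?"
    r"\s*bev AP:[\d.,\s]+?"
    r"\s*3d AP:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)"
)


def parse_3d_ap(log_text: str) -> Dict[str, Tuple[float, float, float]]:
    """Map each class name to its three 3D AP values, in the order the log prints them."""
    return {
        m.group(1): (float(m.group(2)), float(m.group(3)), float(m.group(4)))
        for m in AP_PATTERN.finditer(log_text)
    }


# Pedestrian lines copied from the two logs in this thread.
run_0502 = ("Pedestrian AP@0.50, 0.50, 0.50: bbox AP:1.9447, 2.0404, 2.0384 "
            "bev AP:1.9528, 1.7286, 1.6709 3d AP:1.7251, 1.6078, 1.5052")
run_0507 = ("Pedestrian AP@0.50, 0.50, 0.50: bbox AP:13.7083, 11.9690, 11.5980 "
            "bev AP:14.4148, 12.3542, 11.7175 3d AP:14.3175, 12.1915, 11.3892")

before = parse_3d_ap(run_0502)["Pedestrian"]
after = parse_3d_ap(run_0507)["Pedestrian"]
for b, a in zip(before, after):
    print(f"{b:.4f} -> {a:.4f}")
```

The same function can be applied to a whole log file read as one string, which returns the Car and Cyclist entries as well.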


azhuantou commented 1 year ago

Sorry for the late reply. I have found that some other people also have the problem of low pedestrian performance. I will recheck the released code when I'm free. Here is my training log; I hope it helps you. log_train_20230327-202350.txt

dof-pikes commented 1 year ago

I also ran into the same low pedestrian performance when reproducing the 2% and 20% setting experiments.

IamJiyong commented 1 year ago

What batch size did you use?

dof-pikes commented 1 year ago

batch size == 10

azhuantou commented 1 year ago

Sorry for the late reply. @dof-pikes @IamJiyong I have fixed some bugs and also validated the performance on pedestrians with a small batch size. You can update the code and give it another try.

dof-pikes commented 1 year ago

Thanks for your work. Could you share what changed in the code between the versions before and after the update? @azhuantou

IamJiyong commented 1 year ago

I think the problem has been solved! Thank you so much! @azhuantou

azhuantou commented 1 year ago

I would recommend using a folder comparison tool such as Meld to compare the two versions. @dof-pikes I will close the issue. Feel free to reopen it if you have any questions.
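Where a graphical tool like Meld is not available (for example on a remote training server), a similar folder comparison can be sketched with Python's standard `filecmp` module. The helper name `collect_changes` is illustrative, and the sketch assumes you have two checkouts of the repository on disk, one from before the update and one from after.

```python
import filecmp
import os


def collect_changes(old_dir, new_dir):
    """Return (changed, only_old, only_new) as lists of relative paths.

    Note: filecmp.dircmp uses a shallow comparison, so files whose type,
    size and modification time all match are treated as identical.
    """
    changed, only_old, only_new = [], [], []

    def walk(dcmp, rel):
        changed.extend(os.path.join(rel, f) for f in dcmp.diff_files)
        only_old.extend(os.path.join(rel, f) for f in dcmp.left_only)
        only_new.extend(os.path.join(rel, f) for f in dcmp.right_only)
        # Recurse into subdirectories that exist on both sides.
        for name, sub in dcmp.subdirs.items():
            walk(sub, os.path.join(rel, name))

    walk(filecmp.dircmp(old_dir, new_dir), "")
    return changed, only_old, only_new
```

Calling `collect_changes("HSSDA_old", "HSSDA_new")`, where both directory names are placeholders, lists the files worth opening in a regular diff viewer.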