Vegeta2020 / SE-SSD

SE-SSD: Self-Ensembling Single-Stage Object Detector From Point Cloud, CVPR 2021.
Apache License 2.0

Results submission #69

Closed RickOnEarth closed 2 years ago

RickOnEarth commented 2 years ago

Hi @Vegeta2020, thank you for your advanced work! I used your code to train the SE-SSD network for 60 epochs from scratch and reproduced the results on the validation set successfully:

Evaluation official_AP_40:
car AP(Average Precision)@0.70, 0.70, 0.70:
bbox AP: 99.58, 95.62, 93.22
bev  AP: 96.76, 91.98, 89.67
3d   AP: 93.94, 86.15, 83.58
aos  AP: 99.56, 95.42, 92.82

But when I submitted the results on the test set to the KITTI benchmark, the submission results are pretty bad:

Car (Detection)       | 96.79 % | 93.40 % | 90.42 %
Car (Orientation)     | 96.72 % | 93.07 % | 89.93 %
Car (3D Detection)    | 88.46 % | 79.46 % | 74.44 %
Car (Bird's Eye View) | 93.01 % | 89.18 % | 84.18 %

I know there is always a gap between evaluation results on the val and test sets, but these results seem too bad. I submitted the results in the following format, one *.txt file per frame:

Car 0.0000 0 1.3052 30.5894 172.1811 233.7252 291.5463 1.6162 1.7451 4.2116 -7.9244 1.6071 11.8189 0.7247 0.4037
Car 0.0000 0 0.5167 150.3878 180.0324 350.4213 245.1339 1.4585 1.6154 4.0298 -8.5218 1.6400 17.3082 0.0649 0.3411
Car 0.0000 0 1.2624 0.0000 182.4527 78.7715 254.9635 1.4427 1.5532 3.5953 -13.7700 1.6827 16.4132 0.5723 0.1571

Do you know the potential reason why the submission results are so bad? Or do you have any advice?
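For reference, this is roughly how I write the result files, a minimal sketch assuming a simple per-frame detection dictionary (the `dets` layout is my own, not SE-SSD's exact output structure). Each test frame gets one .txt file with 16 fields per detection: type, truncated, occluded, alpha, 2D bbox, dimensions (h, w, l), location (x, y, z) in camera coordinates, rotation_y, and score.

```python
import os

# KITTI result format: one object per line, 16 fields:
# type truncated occluded alpha  x1 y1 x2 y2  h w l  x y z  rotation_y score
def write_kitti_result(frame_id, dets, out_dir):
    # NOTE: `dets` is a hypothetical list of dicts, not SE-SSD's exact output.
    os.makedirs(out_dir, exist_ok=True)
    lines = []
    for d in dets:
        x1, y1, x2, y2 = d["bbox2d"]      # 2D box in image coordinates
        h, w, l = d["dims"]               # height, width, length in meters
        x, y, z = d["loc"]                # box bottom center in camera coordinates
        lines.append(
            "Car 0.0000 0 {:.4f} {:.4f} {:.4f} {:.4f} {:.4f} "
            "{:.4f} {:.4f} {:.4f} {:.4f} {:.4f} {:.4f} {:.4f} {:.4f}".format(
                d["alpha"], x1, y1, x2, y2, h, w, l, x, y, z, d["ry"], d["score"]
            )
        )
    # one file per test frame, e.g. 000000.txt
    with open(os.path.join(out_dir, "{:06d}.txt".format(int(frame_id))), "w") as f:
        f.write("\n".join(lines) + "\n")
```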

YiF-chen commented 2 years ago

Hi, the results on the validation set are based only on AP_40; I wonder if the AP_11 result is also 86.xx? My result tested on the validation set is 79.xx based on AP_11.
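For context, AP_11 and AP_40 sample the precision-recall curve at different recall thresholds (11 points including recall 0 vs. 40 points excluding it), so the two numbers are not directly comparable. A minimal sketch of the interpolation, assuming you already have the sorted PR curve for one class and difficulty level (this is not the official KITTI evaluator):

```python
import numpy as np

# Minimal sketch of KITTI-style interpolated AP; `recall` and `precision` are
# assumed to be 1-D numpy arrays describing the PR curve of one class/difficulty.
def interpolated_ap(recall, precision, num_points=40):
    if num_points == 11:                          # AP_11: recall thresholds 0.0, 0.1, ..., 1.0
        thresholds = np.linspace(0.0, 1.0, 11)
    else:                                         # AP_40: 1/40, 2/40, ..., 1.0 (recall 0 excluded)
        thresholds = np.linspace(1.0 / 40, 1.0, 40)
    ap = 0.0
    for t in thresholds:
        mask = recall >= t
        p = precision[mask].max() if mask.any() else 0.0   # interpolated precision at recall >= t
        ap += p / len(thresholds)
    return 100.0 * ap
```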

Vegeta2020 commented 2 years ago

Hi @RickOnEarth, the model seems to degrade or overfit, as its performance is lower than the baseline. I'm not sure whether the problem lies in the pre-trained model or in the training; those are the possible issues I can think of right now. If it's in the training, you may submit an earlier epoch or reduce the number of training epochs to avoid overfitting. Otherwise, you may take more care with the pre-trained model, e.g., train or fine-tune the SSD with the ODIoU loss or SADA.
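A minimal sketch of the "submit an earlier epoch" idea: evaluate each saved checkpoint on the validation split and keep the one with the best moderate 3D AP. `evaluate_on_val` is a placeholder callable for whatever evaluation entry point you use, not an actual SE-SSD function.

```python
import glob
import os

# Hypothetical sketch of picking the best epoch on val instead of the last one;
# `evaluate_on_val` is a placeholder, not an actual SE-SSD function.
def select_best_checkpoint(ckpt_dir, evaluate_on_val):
    best_ckpt, best_ap = None, -1.0
    for ckpt in sorted(glob.glob(os.path.join(ckpt_dir, "epoch_*.pth"))):
        ap_moderate = evaluate_on_val(ckpt)       # e.g. 3D AP at moderate difficulty
        if ap_moderate > best_ap:
            best_ckpt, best_ap = ckpt, ap_moderate
    return best_ckpt, best_ap
```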

RickOnEarth commented 2 years ago

Hi, in my case the result on the validation set using AP_11 is also much lower than the result using AP_40.

RickOnEarth commented 2 years ago

Thank you for your reply. I have no experience with submitting results to the KITTI benchmark. Should I train the model with the separate 3712 training samples or use the whole trainval dataset with 7481 samples? Is it allowed to use the whole trainval dataset? And would the test result be better or not?

Vegeta2020 commented 2 years ago

The test results should be evaluated with the model trained on the trainval set; more data is better.
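A minimal sketch of what "trained on the trainval set" means in practice, assuming the usual KITTI ImageSets layout (train.txt with 3712 frame ids, val.txt with 3769, together the 7481-frame trainval); this just merges the two id lists:

```python
import os

# Minimal sketch, assuming the usual KITTI ImageSets layout
# (train.txt with 3712 frame ids, val.txt with 3769, trainval.txt with all 7481).
def build_trainval(imagesets_dir="ImageSets"):
    ids = []
    for split in ("train.txt", "val.txt"):
        with open(os.path.join(imagesets_dir, split)) as f:
            ids += [line.strip() for line in f if line.strip()]
    ids = sorted(set(ids))
    with open(os.path.join(imagesets_dir, "trainval.txt"), "w") as f:
        f.write("\n".join(ids) + "\n")
    return ids
```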

RickOnEarth commented 2 years ago

Thank you very much for your reply!

Eaphan commented 2 years ago

@RickOnEarth Did you reproduce the performance on the test set with all the trainval data?