ethnhe / PVN3D

Code for "PVN3D: A Deep Point-wise 3D Keypoints Hough Voting Network for 6DoF Pose Estimation", CVPR 2020
MIT License

Can't Reproduce Results #82

Closed demmlert closed 2 years ago

demmlert commented 3 years ago

Hello,

I am currently trying to reproduce your results, but I cannot match the numbers reported in your paper. In particular, the results for the large and extra-large clamp are not even close. I trained with your original settings and also with many different settings. You can see the results of your unmodified code below. Did you do anything differently from the code in this GitHub repository? How did you achieve these results? I would greatly appreciate any insights you can provide.

Cheers, Tobias

Loading valtestset.
Finish loading valtestset.
test_dataset_size:  2949
loading pretrained mdl.
==> Loading from checkpoint 'train_log/ycb/checkpoints/pvn3d_best.pth.tar'
==> Done
{'bn_decay': 0.5,
 'bn_momentum': 0.9,
 'cal_metrics': False,
 'checkpoint': 'train_log/ycb/checkpoints/pvn3d_best.pth.tar',
 'decay_step': 200000.0,
 'epochs': 1000,
 'eval_net': True,
 'lr': 0.01,
 'lr_decay': 0.5,
 'run_name': 'sem_seg_run_1',
 'test': True,
 'weight_decay': 0}
002_master_chef_can
***************add:  81.501960977074
***************adds:     96.2391433054099
***************add(-s):  81.501960977074
003_cracker_box
***************add:  95.02517170604428
***************adds:     96.33354682317912
***************add(-s):  95.02517170604428
004_sugar_box
***************add:  96.70224997174138
***************adds:     97.62350291952767
***************add(-s):  96.70224997174138
005_tomato_soup_can
***************add:  90.57139644393725
***************adds:     95.6836373267824
***************add(-s):  90.57139644393725
006_mustard_bottle
***************add:  97.21889029490261
***************adds:     97.96199977515869
***************add(-s):  97.21889029490261
007_tuna_fish_can
***************add:  88.56426988798162
***************adds:     96.05921747618807
***************add(-s):  88.56426988798162
008_pudding_box
***************add:  96.31857018553097
***************adds:     97.39874824429833
***************add(-s):  96.31857018553097
009_gelatin_box
***************add:  97.08867204377485
***************adds:     98.10859806330711
***************add(-s):  97.08867204377485
010_potted_meat_can
***************add:  86.12085579084162
***************adds:     93.38140702656538
***************add(-s):  86.12085579084162
011_banana
***************add:  92.83769736242964
***************adds:     96.72287335503506
***************add(-s):  92.83769736242964
019_pitcher_base
***************add:  95.82361826571781
***************adds:     97.0729541720329
***************add(-s):  95.82361826571781
021_bleach_cleanser
***************add:  93.01889813787703
***************adds:     96.39534615777723
***************add(-s):  93.01889813787703
024_bowl
***************add:  39.70736335824888
***************adds:     91.43677500137144
***************add(-s):  91.43677500137144
025_mug
***************add:  93.49651109407714
***************adds:     97.01624146020255
***************add(-s):  93.49651109407714
035_power_drill
***************add:  95.49805970932833
***************adds:     96.90840062006336
***************add(-s):  95.49805970932833
036_wood_block
***************add:  45.7634938092764
***************adds:     90.57028351904268
***************add(-s):  90.57028351904268
037_scissors
***************add:  92.97522703157969
***************adds:     96.6938637699124
***************add(-s):  92.97522703157969
040_large_marker
***************add:  89.46817811747023
***************adds:     95.9155988271178
***************add(-s):  89.46817811747023
051_large_clamp
***************add:  49.39259127762066
***************adds:     72.13551509864575
***************add(-s):  72.13551509864575
052_extra_large_clamp
***************add:  28.119535869702954
***************adds:     49.14478483876732
***************add(-s):  49.14478483876732
061_foam_brick
***************add:  67.57602168812072
***************adds:     95.96293464676455
***************add(-s):  95.96293464676455
Average of all object:
***************add:  81.56139204872753
***************adds:     92.60787487748333
***************add(-s):  89.5943104821381
All object (following PoseCNN):
***************add:  83.69727623477716
***************adds:     92.58426460144487
***************add(-s):  88.74808878246616
=== Training Progress ===
acc_rgbd      --- val: 0.9648   
loss          --- val: 0.5299   
loss_rgbd_seg --- val: 0.0900   
loss_kp_of    --- val: 0.3288   
loss_ctr_of   --- val: 0.0211   
loss_target   --- val: 0.5299
aiai84 commented 3 years ago

I have the same problem, and my results are no better than yours. What is the size of your pretrained model for the YCB dataset? Thank you!

demmlert commented 3 years ago

The trained model is 449 MB.

aiai84 commented 3 years ago

Have you found the reason? My results suggest the model is overfitting.

demmlert commented 3 years ago

I have achieved some better results (though not as good as the paper claims), but those were just lucky exceptions. My best guess is that there is a fundamental problem with the loss function on symmetric objects, or even with the keypoint approach itself, since keypoints are not well defined on symmetric objects. In the end, a low loss does not strictly imply a good ADD-S score. You therefore need to save a bunch of non-optimal checkpoints and test them all; if you get lucky, you can match the results...
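The gap between the loss and the score is easy to see from the metric definitions. Here is a minimal sketch of ADD vs. ADD-S (not the repo's evaluation code; `pred_pts` and `gt_pts` are assumed to be the same model points transformed by the estimated and ground-truth poses, respectively):

```python
import numpy as np
from scipy.spatial import cKDTree

def add_metric(pred_pts, gt_pts):
    """ADD: mean distance between corresponding model points."""
    return np.linalg.norm(pred_pts - gt_pts, axis=1).mean()

def adds_metric(pred_pts, gt_pts):
    """ADD-S: mean distance from each predicted point to its closest
    ground-truth point; closest-point matching makes the metric
    invariant to the object's symmetries."""
    dists, _ = cKDTree(gt_pts).query(pred_pts, k=1)
    return dists.mean()
```

For a symmetric object like the clamps, a pose rotated by one of the object's symmetries gives a large ADD but a near-zero ADD-S, so a keypoint-offset loss computed against one fixed set of keypoints can stay high for a pose that ADD-S considers perfectly fine.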

ethnhe commented 3 years ago

Oh, that result is strange: most of the classes achieve comparable or even better results than in our paper, but the two clamp classes are too low to be normal. There must be something wrong. The results in the paper are from a model trained with our internal framework, but I checked that the code in this repo achieves comparable results before I released it. Did you run inference with our released model and get normal results? I suspect there may be something wrong with the data or some setting for these two classes. You can visualize the data of these two classes with python3 -m datasets.ycb.ycb_dataset
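If that module is hard to adapt, a minimal standalone sketch to eyeball the two clamp models is something like the following (the paths are hypothetical; point them at wherever your YCB_Video_Dataset models live):

```python
import open3d as o3d

# Hypothetical paths: adjust to your local YCB_Video_Dataset layout.
for cls in ("051_large_clamp", "052_extra_large_clamp"):
    mesh = o3d.io.read_triangle_mesh(
        f"datasets/ycb/YCB_Video_Dataset/models/{cls}/textured.obj")
    print(cls, "vertices:", len(mesh.vertices))
    o3d.visualization.draw_geometries([mesh], window_name=cls)
```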

> What is the size of your pretrained model for the YCB dataset? Thank you!

The pretrained model we released on OneDrive only saves the model_state without the optim_state, so it's much smaller. The related code to modify is here.
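If you want to slim down your own checkpoints the same way, a minimal sketch (the "model_state"/"optim_state" key names follow the naming above; verify them against your own checkpoint before stripping):

```python
import torch

# Load the full training checkpoint; CPU is fine for re-saving.
ckpt = torch.load("train_log/ycb/checkpoints/pvn3d_best.pth.tar",
                  map_location="cpu")
print(ckpt.keys())  # confirm the key names first

# Keep only the network weights; drop the optimizer state.
torch.save({"model_state": ckpt["model_state"]},
           "train_log/ycb/checkpoints/pvn3d_best_slim.pth.tar")
```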

> My best guess is that there is a fundamental problem with the loss function on symmetric objects, or even with the keypoint approach itself, since keypoints are not well defined on symmetric objects. In the end, a low loss does not strictly imply a good ADD-S score. You therefore need to save a bunch of non-optimal checkpoints and test them all; if you get lucky, you can match the results...

Improving the design of the loss function on symmetric objects could yield even better results, but our current setting should reach the results reported in our paper, and it is a lot better than direct regression approaches, as shown by our ablation study. As for which checkpoint to test, the best checkpoint selected on the validation set usually gives good results.
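A minimal sketch of that selection loop (train_one_epoch and evaluate_adds are placeholder names for your own training step and ADD-S evaluation, not functions from this repo, and selecting on validation ADD-S rather than raw loss is an assumption that follows the discussion above):

```python
import torch

def select_best_checkpoint(model, optimizer, train_loader, val_loader,
                           num_epochs, ckpt_dir="train_log/ycb/checkpoints"):
    best_adds = -1.0
    for epoch in range(num_epochs):
        train_one_epoch(model, train_loader, optimizer)   # placeholder
        val_adds = evaluate_adds(model, val_loader)       # placeholder: mean ADD-S

        # Also keep periodic checkpoints so near-best ones can be re-tested.
        if epoch % 5 == 0:
            torch.save({"model_state": model.state_dict()},
                       f"{ckpt_dir}/epoch_{epoch}.pth.tar")

        # Select on the pose metric rather than on the raw loss.
        if val_adds > best_adds:
            best_adds = val_adds
            torch.save({"model_state": model.state_dict()},
                       f"{ckpt_dir}/pvn3d_best.pth.tar")
    return best_adds
```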

Have you found the reason, my results show that it seems overfit.

Regarding the overfitting concern: we've also tested our model, pre-trained on YCB-Video, with a new camera and without fine-tuning, and compared it against DenseFusion in our robotic grasping demo; it still shows considerably better results. One example is as follows: [image]

LEONHWH commented 2 years ago

@aiai84 I also found that it overfits. Do you have any experience with tuning the hyperparameters?