Unable to Reproduce 12 Epoch Results in the Paper

yeliudev commented 1 year ago

Hi @Jingkang50, thanks for your great work. I ran a 12-epoch training of PSGFormer by modifying the config from 40/60 epochs to 8/12 epochs and setting eval_pan_rels=False, as you suggested. Below are the evaluation results at the 12th epoch, which differ substantially from the 12-epoch results in Table 2 of your paper. Are any further modifications needed to reproduce the results? Thanks.

====================================================================================================
SGG eval:  R @ 20: 0.0941;  R @ 50: 0.1138;  R @ 100: 0.1159;  for mode=sgdet, type=Recall.
SGG eval:  R @ 20: 0.1778;  R @ 50: 0.2223;  R @ 100: 0.2278;  for mode=phrdet, type=Recall.
SGG eval:  R @ 20: nan;  R @ 50: nan;  R @ 100: nan;  for mode=sgdet, type=NoGraphConstraint @ 56 Recall.
SGG eval:  R @ 20: nan;  R @ 50: nan;  R @ 100: nan;  for mode=phrdet, type=NoGraphConstraint @ 56 Recall.
SGG eval:  mR @ 20: 0.0584;  mR @ 50: 0.0785;  mR @ 100: 0.0794;  for mode=sgdet, type=Mean Recall.
over    in front of beside  on  in  attached to hanging from    on back of  falling off going down  painted on  walking on  running on  crossing    standing on lying on    sitting on  flying over jumping over    jumping from    wearing holding carrying    looking at  guiding kissing eating  drinking    feeding biting  catching    picking playing with    chasing climbing    cleaning    playing touching    pushing pulling opening cooking talking to  throwing    slicing driving riding  parked on   driving on  about to hit    kicking swinging    entering    exiting enclosing   leaning on
0.3282  0.0558  0.0445  0.0521  0.0708  0.0394  0.1034  0.0000  0.0000  0.0000  0.0000  0.1905  0.1008  0.0000  0.1953  0.0399  0.0889  0.1840  0.0000  0.0000  0.2012  0.2540  0.0556  0.1662  0.0000  0.0000  0.1023  0.0000  0.0000  0.1667  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.2200  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.1628  0.3581  0.1288  0.2737  0.5926  0.2500  0.0000  0.0000  0.0000  0.0219  0.0000
+--------------+--------+--------------+--------+--------------+--------+
| predicate    | Rec100 | predicate    | Rec100 | predicate    | Rec100 |
+--------------+--------+--------------+--------+--------------+--------+
| over         | 0.3282 | in front of  | 0.0558 | beside       | 0.0445 |
| on           | 0.0521 | in           | 0.0708 | attached to  | 0.0394 |
| hanging from | 0.1034 | on back of   | 0.0000 | falling off  | 0.0000 |
| going down   | 0.0000 | painted on   | 0.0000 | walking on   | 0.1905 |
| running on   | 0.1008 | crossing     | 0.0000 | standing on  | 0.1953 |
| lying on     | 0.0399 | sitting on   | 0.0889 | flying over  | 0.1840 |
| jumping over | 0.0000 | jumping from | 0.0000 | wearing      | 0.2012 |
| holding      | 0.2540 | carrying     | 0.0556 | looking at   | 0.1662 |
| guiding      | 0.0000 | kissing      | 0.0000 | eating       | 0.1023 |
| drinking     | 0.0000 | feeding      | 0.0000 | biting       | 0.1667 |
| catching     | 0.0000 | picking      | 0.0000 | playing with | 0.0000 |
| chasing      | 0.0000 | climbing     | 0.0000 | cleaning     | 0.0000 |
| playing      | 0.2200 | touching     | 0.0000 | pushing      | 0.0000 |
| pulling      | 0.0000 | opening      | 0.0000 | cooking      | 0.0000 |
| talking to   | 0.0000 | throwing     | 0.0000 | slicing      | 0.0000 |
| driving      | 0.1628 | riding       | 0.3581 | parked on    | 0.1288 |
| driving on   | 0.2737 | about to hit | 0.5926 | kicking      | 0.2500 |
| swinging     | 0.0000 | entering     | 0.0000 | exiting      | 0.0000 |
| enclosing    | 0.0219 | leaning on   | 0.0000 | None         | None   |
+--------------+--------+--------------+--------+--------------+--------+
SGG eval:  mR @ 20: 0.1017;  mR @ 50: 0.1474;  mR @ 100: 0.1502;  for mode=phrdet, type=Mean Recall.
SGG eval:  mR @ 20: 0.0000;  mR @ 50: 0.0000;  mR @ 100: 0.0000;  for mode=sgdet, type=NoGraphConstraint @ 56 Mean Recall.
SGG eval:  mR @ 20: 0.0000;  mR @ 50: 0.0000;  mR @ 100: 0.0000;  for mode=phrdet, type=NoGraphConstraint @ 56 Mean Recall.
====================================================================================================
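As a quick sanity check on the log above, the reported sgdet mR@100 can be recomputed from the per-predicate Rec100 values in the table. This is a minimal sketch, assuming mean recall is the unweighted average over all 56 predicates (the evaluator's actual implementation is not shown in this thread); the variable names are illustrative only.

```python
# Per-predicate Rec100 values copied from the sgdet table above, in log order.
rec100 = [
    0.3282, 0.0558, 0.0445, 0.0521, 0.0708, 0.0394, 0.1034, 0.0000,
    0.0000, 0.0000, 0.0000, 0.1905, 0.1008, 0.0000, 0.1953, 0.0399,
    0.0889, 0.1840, 0.0000, 0.0000, 0.2012, 0.2540, 0.0556, 0.1662,
    0.0000, 0.0000, 0.1023, 0.0000, 0.0000, 0.1667, 0.0000, 0.0000,
    0.0000, 0.0000, 0.0000, 0.0000, 0.2200, 0.0000, 0.0000, 0.0000,
    0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.1628, 0.3581, 0.1288,
    0.2737, 0.5926, 0.2500, 0.0000, 0.0000, 0.0000, 0.0219, 0.0000,
]

# Assumption: mean recall is the plain average over predicates.
mean_recall = sum(rec100) / len(rec100)

# Predicates the model never recovers within the top 100 predictions.
num_zero = sum(1 for r in rec100 if r == 0.0)

print(f"predicates: {len(rec100)}")
print(f"mR@100 (recomputed): {mean_recall:.4f}")
print(f"predicates with zero recall: {num_zero}")
```

Under that assumption the recomputed value agrees with the logged mR@100 of 0.0794, and roughly half of the predicates have zero recall, which pulls the mean down.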

Jingkang50 commented 1 year ago

Thank you for your interest in our work. You could check https://github.com/Jingkang50/OpenPSG/issues/33 for a reference log. Potential causes include differences in the configs (num_gpu and lr) and in how the pretrained models are loaded.
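One common way that num_gpu and lr interact is the linear scaling heuristic, where the learning rate is scaled in proportion to the total batch size. The sketch below illustrates that heuristic only; whether OpenPSG's configs were tuned this way is an assumption, and the function name and all numbers in the example are hypothetical.

```python
def scale_lr(base_lr, base_gpus, base_samples_per_gpu, gpus, samples_per_gpu):
    """Scale a learning rate linearly with the total batch size.

    Hypothetical helper: it applies the linear scaling heuristic and is
    not taken from the OpenPSG codebase.
    """
    if min(base_gpus, base_samples_per_gpu, gpus, samples_per_gpu) <= 0:
        raise ValueError("GPU counts and per-GPU batch sizes must be positive")
    base_batch = base_gpus * base_samples_per_gpu  # batch size the lr was tuned for
    batch = gpus * samples_per_gpu                 # batch size actually used
    return base_lr * batch / base_batch


# Hypothetical example: a config tuned for 8 GPUs, rerun on 4 GPUs
# with the same number of samples per GPU.
print(scale_lr(1e-3, base_gpus=8, base_samples_per_gpu=1, gpus=4, samples_per_gpu=1))
```

If the reference log was produced with a different GPU count, comparing total batch sizes in this way may help explain part of a gap before looking at other differences.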

yeliudev commented 1 year ago

Thanks for your reply. I changed the base lr from 1e-3 to 1e-4 (the paper says PSGTR uses 1e-4, but the PSGFormer config uses 1e-3; I'm not sure which is correct), and the results became R@20: 0.1550; R@50: 0.1829; R@100: 0.1845, which are still worse than the reported ones.
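For reference, the changes described so far might look roughly like the following override. This is a sketch under assumptions: the key names follow MMDetection 2.x conventions and may not match the PSGFormer config exactly, "8/12" is read here as an lr step at epoch 8 with 12 epochs in total, and the placement of eval_pan_rels is not shown because it is not given in this thread.

```python
# Hypothetical schedule override in MMDetection 2.x style.
# Key names are assumptions and should be checked against the actual config.

# Base learning rate lowered from 1e-3 to 1e-4, as described above.
optimizer = dict(lr=1e-4)

# Assumed reading of "8/12": decay the lr at epoch 8 (originally 40).
lr_config = dict(policy='step', step=[8])

# Assumed reading of "8/12": train for 12 epochs in total (originally 60).
runner = dict(type='EpochBasedRunner', max_epochs=12)
```

Posting the exact diff of the config actually used would make it easier to compare against the reference log.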

yeliudev commented 1 year ago

Comparing my training log with the ones you uploaded, I noticed that several losses and the HTriMatcher have been modified or removed in your released config. Do these modifications affect performance?

[Image: side-by-side comparison. Left: the released config. Right: the uploaded training log.]

yeliudev commented 1 year ago

@Jingkang50 The 60-epoch schedule cannot reproduce the results either. Could you please share the codebase used to obtain the checkpoints? It seems that the code used to obtain the checkpoints differs from the currently released one. Thanks!

Jingkang50 commented 1 year ago

The released codebase is exactly the one we used to get the final scores (a few participants in the PSG challenges were able to obtain the corresponding results). Are you using the config here? It seems the loss weights align with the left one (the released config). As for the differences in the config, we deleted or changed those lines either because some losses are unused or because they don't matter much.

Jingkang50 / OpenPSG

Unable to Reproduce 12 Epoch Results in the Paper #100