Hi, thank you for your interest in our work. The number of iterations in your configuration is twice that of our original configuration, so I believe the solution is to either: 1) reduce the total number of epochs, as well as the number of epochs for freezing BERT, dropping the contrastive loss, etc.; or 2) accumulate gradients and update the optimizer every two steps, with a normalization term on the losses (e.g., multiply them by 1/2). Note that Charades is the smallest dataset for this task, so a small performance fluctuation is common. I believe a performance gap of less than 0.5 indicates a good reproduction. For further questions, please feel free to comment here.
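A minimal PyTorch-style sketch of the second option, assuming a generic `model`, `optimizer`, and `train_loader` (these names are placeholders, not the repo's actual training loop):

```python
# Sketch of gradient accumulation over 2 steps with loss normalization.
# `model`, `optimizer`, and `train_loader` stand in for the objects
# built by the actual training script.
accum_steps = 2

optimizer.zero_grad()
for step, batch in enumerate(train_loader):
    loss = model(batch)              # combined training loss for this mini-batch
    (loss / accum_steps).backward()  # normalize so accumulated gradients match one larger batch

    if (step + 1) % accum_steps == 0:
        optimizer.step()             # one optimizer update per two mini-batches
        optimizer.zero_grad()
```

This keeps the number of optimizer updates (and the effective batch size per update) close to the original 4-GPU schedule.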
I have adopted your second suggestion ('accumulate gradients and update the optimizer every two steps, with a normalization term on the losses, e.g., multiply by 1/2'). In addition, gradient accumulation is often accompanied by an adjusted learning rate; since the number of accumulation steps is 2, I set learning_rate = original_learning_rate * sqrt(2). With this, I get similar results. Thank you for your help; it has taught me a lot.
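For concreteness, a small snippet of how that square-root scaling might be applied (the base learning rate here is only an illustrative placeholder, not the value from the repo's config):

```python
import math

accum_steps = 2
base_lr = 1e-4                                 # placeholder; use the learning rate from the original config
scaled_lr = base_lr * math.sqrt(accum_steps)   # sqrt scaling rule described above
print(scaled_lr)                               # ~1.41e-4 for this placeholder value
```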
Thank you for proposing a very interesting work. On Charades, since the original number of GPUs is 4 and the original batch size is 48, I set the batch size to 24 on two 3090s to keep the same number of samples per GPU. Other configurations remain the same. However, the scores I get are:
This large gap confuses me. What was your training environment, and if I don't have 4 GPUs, is there a way to reach the scores reported in the paper? Looking forward to your reply.
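For reference, a rough sketch of the batch arithmetic behind this setup (the sample count below is a made-up placeholder, only to illustrate the ratio):

```python
def steps_per_epoch(num_samples, total_batch_size):
    """Optimizer updates per epoch for a given effective (total) batch size."""
    return num_samples // total_batch_size

# Per-GPU load is unchanged (48 / 4 == 24 / 2 == 12 samples per GPU),
# but the effective batch size halves, so updates per epoch roughly double.
num_samples = 12000                       # placeholder training-set size, for illustration only
print(steps_per_epoch(num_samples, 48))   # original 4-GPU config
print(steps_per_epoch(num_samples, 24))   # two-GPU config: ~2x as many updates per epoch
```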