ashkamath / mdetr


pretrain performance #53

Closed. ShoufaChen closed this issue 2 years ago.

ShoufaChen commented 2 years ago

Hi,

Thanks for your great work.

When I tried to reproduce the pre-training performance, I found the results did not match the paper, especially for Refcoco.

Any help would be much appreciated.

|            | GQA AP | Flickr AP | Flickr R@1 | Refcoco AP | Refcoco R@1 | Refcoco+ R@1 | Refcocog R@1 |
|------------|--------|-----------|------------|------------|-------------|--------------|--------------|
| Res101     | 58.9   | 75.6      | 82.5       | 60.3       | 72.1        | 58.0         | 55.7         |
| reproduced | 58.6   | 75.7      | 82.9       | 56.5       | 70.2        | 55.3         | 54.2         |
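
For context, the Refcoco-style R@1 columns are presumably computed the standard referring-expression way: the highest-scoring predicted box counts as a hit if its IoU with the single ground-truth box is at least 0.5 (the Flickr protocol may differ, e.g. merged or any-box matching). A minimal sketch of that metric, not the repository's actual evaluation code, assuming boxes in `[x1, y1, x2, y2]` format:

```python
import numpy as np

def box_iou(box_a, box_b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-6)

def recall_at_1(predictions, ground_truths, iou_thresh=0.5):
    """predictions: list of (boxes, scores) per referring expression;
    ground_truths: list with one ground-truth box per expression."""
    hits = 0
    for (boxes, scores), gt_box in zip(predictions, ground_truths):
        top_box = boxes[int(np.argmax(scores))]  # highest-confidence box
        if box_iou(top_box, gt_box) >= iou_thresh:
            hits += 1
    return hits / len(ground_truths)
```
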
ashkamath commented 2 years ago

Hi, could you please explain your pre-training and fine-tuning steps (epochs trained, learning rate, etc.)? Also, what is the R@1 reported in the third column?

Best, Aishwarya

ShoufaChen commented 2 years ago

@ashkamath Thanks for your reply. I have updated the table above.

Our pre-training steps exactly follow your code, i.e., 40 epochs on 4x8 GPUs with the same learning rate you used.

We have not finished fine-tuning yet. The numbers above are pre-training results, compared against those you report at https://github.com/ashkamath/mdetr/blob/main/.github/pretrain.md#pre-training.

ashkamath commented 2 years ago

I'm not sure what could have caused the discrepancy (maybe a library version, random seed, etc.), but the fact that your Flickr and GQA AP are very close to ours is a good sign: those are the bigger datasets, whereas results on the RefExp datasets can be quite noisy since they are very small. Because the annotation density differs between RefExp and the other datasets (only one box per text vs. many boxes), fine-tuning is essential on those datasets, so hopefully your numbers will match the final version of ours once fine-tuning is done. Also, even in our experiments the ENB3 numbers were worse than R101 before fine-tuning but, as expected, ended up better after fine-tuning. My guess is that the small dataset size explains all of this, but it's hard for me to say why your numbers differ. Would it help if I provide the training logs for this pre-training run?
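
On the library-version/random-seed point: one way to rule out seeding as a factor is to pin all the relevant seeds before training. A minimal PyTorch sketch of DETR-style seeding, shown here as a generic illustration rather than the repo's exact code (the rank offset keeps per-GPU data shuffling different across processes):

```python
import random

import numpy as np
import torch

def fix_seeds(seed: int, rank: int = 0) -> None:
    """Seed Python, NumPy, and PyTorch RNGs; offset by rank so each
    distributed process still gets a different data-loading stream."""
    seed = seed + rank
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # Optional: trade speed for determinism in cuDNN kernels.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

fix_seeds(42, rank=0)
```
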

ShoufaChen commented 2 years ago

Thanks for your reply. I agree with you that it may be caused by the dataset size.

It would be much appreciated if you could provide the training log.

ashkamath commented 2 years ago

Here you go! resnet101_pretraining_log 2.txt

ShoufaChen commented 2 years ago

Thanks @ashkamath .