davidnvq / grit

GRIT: Faster and Better Image-captioning Transformer (ECCV 2022)
181 stars 28 forks

The COCO dataset #35

Open bai-24 opened 1 year ago

bai-24 commented 1 year ago

Dear Author, the number of training images in the COCO dataset is 82783, but the number of training samples in the code is 566435, which greatly increases training time. Why is this done?

davidnvq commented 1 year ago

Thanks for asking. Where did you get the number 566435 in the code? By the way, every image in COCO has 5 captions. I guess 566435 / 5 = 113287 images, which are the (train + restval) images of the Karpathy split.
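For reference, here is a minimal sketch of that arithmetic, assuming the standard Karpathy split annotation file `dataset_coco.json` (the file name and field names are assumptions; the exact annotation file used in this repo may differ):

```python
import json

# Assumed path to the Karpathy split annotations; adjust to your setup.
with open("dataset_coco.json") as f:
    data = json.load(f)

# Training images in the Karpathy split are those marked 'train' or 'restval'.
train_imgs = [img for img in data["images"] if img["split"] in ("train", "restval")]
num_pairs = sum(len(img["sentences"]) for img in train_imgs)

print(len(train_imgs))  # expected: 113287 images
print(num_pairs)        # roughly 5 * 113287 = 566435 (image, caption) pairs
```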

bai-24 commented 1 year ago

Dear Author, ![Uploading image.png…]() The screenshot of the problem is shown above. The training data volume is 566435, but I don't know what it represents.

davidnvq commented 1 year ago

Thanks for reporting. Could you upload the screenshot again? I can't see it (it takes a few seconds for a picture to finish uploading on GitHub).

bai-24 commented 1 year ago

[screenshot attached]

davidnvq commented 1 year ago

I still have no idea why it has 566435 iterations. Can you provide more information about the config.yaml, how many GPUs you used, the batch size, etc.?

davidnvq commented 1 year ago

If you use batch_size = 1, then it may be correct, as there are 566435 (image, caption) pairs.

bai-24 commented 1 year ago

The number of GPUs I used is 1, and batch_size is 4.

davidnvq commented 1 year ago

Then could you check the dataloader yourself, or hard-code batch_size = 4 in your dataloader? I believe that if batch_size = 4, you will have 566435/4 iterations, so something may be wrong here. If possible, could you send me your fork/code? I will check it tomorrow after I finish my work.
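Not the repo's actual code, just a minimal sketch of the check being suggested, using a dummy PyTorch dataset in place of the real captioning dataset: the iterations per epoch reported by the dataloader are the number of (image, caption) pairs divided by `batch_size`, so printing `len(loader)` shows which batch size is actually in effect.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-in for the real captioning dataset: 566435 (image, caption) pairs.
train_dataset = TensorDataset(torch.zeros(566435, 1))

batch_size = 4  # hard-code the value you expect here
loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

# 566435 / 4 -> 141609 iterations per epoch (the last batch is smaller);
# if this prints 566435 instead, the effective batch size is still 1 somewhere.
print(len(loader))
```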

bai-24 commented 1 year ago

I think I found the reason: the batch_size I set in the code was 1. Thank you very much for your help.

Wangdanchunbufuz commented 1 year ago

> The number of GPUs I used is 1, and batch_size is 4.

How long does training take with your settings?