Open Naidala opened 1 year ago
I got a reply: the released checkpoints are trained only on GoldG+SBU+O365+CC3M, with no fine-tuning on COCO.
Hi, have you trained the model yourself? I am trying to reproduce text-box grounded generation on COCO2014, but even after training for 200k iterations (the paper recommends 100k), the generated results do not follow the bounding boxes. I don't know where it goes wrong; could it be that COCO2014 is simply not large enough?
Hello, the paper mentions several models: one trained on COCO, another on LVIS, and a third on GoldG, O365, SBU, and CC3M. As far as I understand, you can download one of the ten released checkpoints and use it with the gligen_inference.py script, without retraining the model.
My question is: on which dataset was each of these checkpoints trained? In particular, the "Box+Text+Image" modality in Generation and Inpainting modes.