gligen/GLIGEN

Open-Set Grounded Text-to-Image Generation

Training Dataset of HF Hub Checkpoint vs. Paper Models #56

Naidala opened this issue 1 year ago

Naidala commented 1 year ago

Hello, the paper mentions several models: one trained on COCO, another on LVIS, and a third on GoldG, O365, SBU, and CC3M. As far as I understand, without retraining the model, you can download one of the ten released checkpoints and use it with the gligen_inference.py script.

My question is: on which dataset were these checkpoints trained? In particular, the "Box+Text+Image" modality in both Generation and Inpainting modes.
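
For anyone arriving here from the HF Hub: besides gligen_inference.py, the checkpoints ported to diffusers can be run through its GLIGEN pipeline. Below is a minimal sketch for the text+box generation case; the checkpoint name `masterful/gligen-1-4-generation-text-box` is one community upload and an assumption here, so substitute whichever checkpoint you actually downloaded.

```python
import torch
from diffusers import StableDiffusionGLIGENPipeline

# Assumed community checkpoint of the text+box generation weights;
# replace with the checkpoint you are actually using.
pipe = StableDiffusionGLIGENPipeline.from_pretrained(
    "masterful/gligen-1-4-generation-text-box", torch_dtype=torch.float16
).to("cuda")

images = pipe(
    prompt="a cat sitting on a wooden table",
    gligen_phrases=["a cat"],                 # one grounding phrase per box
    gligen_boxes=[[0.25, 0.35, 0.75, 0.85]],  # normalized [xmin, ymin, xmax, ymax]
    gligen_scheduled_sampling_beta=0.3,       # fraction of steps with grounding active
    num_inference_steps=50,
).images
images[0].save("grounded_cat.png")
```

The inpainting and image-grounded variants take extra inputs (an inpainting image, reference image embeddings), but the box/phrase interface is the same idea.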

Naidala commented 1 year ago

I got a reply: the released checkpoints are trained only on GoldG + SBU + O365 + CC3M, with no fine-tuning on COCO.

cats-food commented 1 year ago

Hi, have you trained the model yourself? I am trying to reproduce text+box grounded generation on COCO2014, but even after training for 200k iterations (the paper recommends 100k), the results do not follow the bounding boxes. I don't know where it goes wrong; is it because COCO2014 is simply not large enough?