gligen / GLIGEN

Open-Set Grounded Text-to-Image Generation
MIT License

A difference between paper and code for training #60

Open maluyazilation opened 8 months ago

maluyazilation commented 8 months ago

Thanks for your creative and valuable work. While retraining your COCO model, I found a difference between the paper and the code for training.

According to Appendix A (Training Details), for the implementation of classifier-free guidance, captions and grounding tokens are dropped with 10% probability ("We randomly drop caption and grounding tokens with 10% probability for classifier-free guidance."). But in the released code, the probability of dropping the caption is 50%. I'd like to know which one works better for the model. Thanks.

References:

  1. Probability of grounding dropout = 0.1: ldm/modules/diffusionmodules/openaimodel.py, lines 428-429

  2. Probability of caption dropout = 0.5: tsv_dataset.py, line 306; configs/GoldG+SBU+CC3M+O365_box_text.yaml, lines 65, 72, 79, 86, 93
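For context, the dropout described above can be sketched as a small training-time helper. This is a hypothetical illustration, not the actual GLIGEN code: the function name `apply_cfg_dropout` and the null-value conventions (empty caption string, zeroed grounding tokens) are assumptions; both probabilities are exposed as parameters so the paper's 0.1 and the released config's 0.5 can each be tried.

```python
import random

def apply_cfg_dropout(caption, grounding_tokens,
                      caption_drop_prob=0.5, grounding_drop_prob=0.1):
    """Randomly null out conditioning signals for classifier-free guidance.

    Hypothetical sketch: dropping a signal here means replacing it with a
    null value (empty caption / zeroed grounding tokens), so the model also
    learns the unconditional branches used at sampling time.
    """
    if random.random() < caption_drop_prob:
        caption = ""  # null caption -> text-unconditional branch
    if random.random() < grounding_drop_prob:
        grounding_tokens = [0.0] * len(grounding_tokens)  # null grounding
    return caption, grounding_tokens
```

With `caption_drop_prob=1.0` every sample trains the text-unconditional branch; with both probabilities at 0.0 the conditioning always passes through unchanged.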