gligen / GLIGEN

Open-Set Grounded Text-to-Image Generation
MIT License

Questions about details for reproducing, some of them are not noted in the paper. #71

Open maluyazilation opened 7 months ago

maluyazilation commented 7 months ago

The problem is about the FID score on COCO2014CD. My FID is about 13, while the paper reports 5.8. The paper does not give enough detail on this part, so it is really hard to reproduce.

My questions are as follows:

  1. Is the pretrained model downloaded from here? https://ommer-lab.com/files/latent-diffusion/nitro/txt2img-f8-large/model.ckpt

  2. The caption dropping probability is 0.5 in the GitHub config file but 0.1 in the paper. Which one should I use?

  3. As we all know, the LDM resolution is 256. Is GLIGEN fine-tuned at 256 or 512?

  4. During training, is the warm-up schedule cosine or constant? Are 100,000 steps enough for COCO?

  5. In the UNet, only the gated attention layers are left unfrozen, right?

phillipinseoul commented 5 months ago

Are there any updates on this issue? I'm also curious which caption dropping rate to use for training (10% as in the paper or 50% as in the code).
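
For anyone comparing the two settings, here is a minimal sketch of what a caption dropping probability usually means in classifier-free-guidance style training: with probability `drop_prob`, the caption is replaced by an empty string. This is an illustration only, not the repository's code; the function name `maybe_drop_caption` and the example caption are hypothetical.

```python
import random


def maybe_drop_caption(caption, drop_prob, rng=random):
    """Return "" with probability drop_prob, otherwise the caption unchanged.

    Hypothetical helper for illustration; not taken from the GLIGEN code base.
    """
    if not 0.0 <= drop_prob <= 1.0:
        raise ValueError("drop_prob must be in [0, 1]")
    # rng.random() is uniform on [0, 1), so drop_prob=0 never drops
    # and drop_prob=1 always drops.
    return "" if rng.random() < drop_prob else caption


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    captions = ["a dog on a beach"] * 10000
    for p in (0.1, 0.5):  # the paper value and the config value from this thread
        dropped = sum(maybe_drop_caption(c, p, rng) == "" for c in captions)
        print(f"drop_prob={p}: {dropped / len(captions):.3f} of captions dropped")
```

With 0.5, half of the training samples see no text at all, so the two settings give the model very different amounts of caption supervision. Which one the authors actually used for the reported FID is still the open question here.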

Bailey-24 commented 4 months ago

Why does GLIGEN always output 512x512? How can I get 640x480 output?

maluyazilation commented 4 months ago

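GLIGEN is trained on top of two large models, LDM and Stable Diffusion, whose output resolutions are fixed at 256 and 512, respectively. If you want to output 640x480 directly, you would have to find a large image generation model that can output at that resolution, which is difficult. So I suggest post-processing with cropping + rescaling.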
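
A minimal sketch of the crop + rescale post-processing, assuming a center crop to the target aspect ratio followed by a bicubic resize with Pillow. The function names `center_crop_box` and `crop_and_resize` are hypothetical, and the choice of center crop and bicubic interpolation is an assumption, not something GLIGEN prescribes.

```python
def center_crop_box(src_w, src_h, dst_w, dst_h):
    """Largest centered box with the target aspect ratio.

    Returns (left, top, right, bottom) in source pixel coordinates.
    """
    if src_w * dst_h > src_h * dst_w:
        # Source is wider than the target ratio: keep full height, trim width.
        crop_h = src_h
        crop_w = src_h * dst_w // dst_h
    else:
        # Source is taller (or equal): keep full width, trim height.
        crop_w = src_w
        crop_h = src_w * dst_h // dst_w
    left = (src_w - crop_w) // 2
    top = (src_h - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)


def crop_and_resize(img, dst_w, dst_h):
    """Center-crop a PIL image to the target ratio, then resize it."""
    from PIL import Image  # imported lazily so the box math runs without Pillow

    box = center_crop_box(img.width, img.height, dst_w, dst_h)
    return img.crop(box).resize((dst_w, dst_h), Image.BICUBIC)


if __name__ == "__main__":
    # Crop box for turning a 512x512 sample into a 4:3 image.
    print(center_crop_box(512, 512, 640, 480))
```

For a 512x512 sample and a 640x480 target, the box is `(0, 64, 512, 448)`: a 512x384 strip that is then upscaled by 1.25x. Note that the crop discards the top and bottom 64 rows, so grounding boxes placed near those edges may be cut off.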
