
GLIGEN for SD v2 or XL #46

Open · TonyLianLong opened this issue 1 year ago

TonyLianLong commented 1 year ago

The released weights are based on SD v1.4. Are there any plans for v2 and XL support?
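
For reference, the v1.4-based weights can already be used through diffusers' StableDiffusionGLIGENPipeline. A minimal sketch is below; the Hub checkpoint name, prompt, and box coordinates are only illustrative assumptions, not something shipped in this repo.

```python
# Minimal sketch: box-grounded generation with SD v1.4 GLIGEN weights via diffusers.
# The checkpoint name below is an assumption; substitute whichever converted
# GLIGEN checkpoint you actually use.
import torch
from diffusers import StableDiffusionGLIGENPipeline

pipe = StableDiffusionGLIGENPipeline.from_pretrained(
    "masterful/gligen-1-4-generation-text-box",  # assumed Hub checkpoint
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="a birthday cake on a wooden table next to a vase of flowers",
    gligen_phrases=["a birthday cake", "a vase of flowers"],
    # normalized (x_min, y_min, x_max, y_max) boxes, one per grounded phrase
    gligen_boxes=[[0.1, 0.4, 0.5, 0.9], [0.6, 0.3, 0.9, 0.8]],
    gligen_scheduled_sampling_beta=1.0,
    num_inference_steps=50,
).images[0]
image.save("gligen_boxes.png")
```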

zengjie617789 commented 11 months ago

You would have to train it yourself, and the sad part is that you have to retrain every time you want to change the base model, which costs a lot of time.

zjysteven commented 8 months ago

Hi @TonyLianLong, would you be interested in collaborating on extending GLIGEN to v2 and XL? I tried some training on SDXL but unfortunately haven't had luck in getting decent results.
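
In case it helps anyone else attempting this: the layers that actually get trained are GLIGEN's gated self-attention adapters inserted into each transformer block of the UNet. A rough, self-contained sketch of that layer is below (simplified from the paper; the names and dimensions are mine, not this repo's code).

```python
import torch
import torch.nn as nn

class GatedSelfAttention(nn.Module):
    """Simplified GLIGEN-style gated self-attention adapter (sketch).

    Sits between the self-attention and cross-attention layers of a
    transformer block. Visual tokens attend over the concatenation of
    visual and grounding tokens; the update is scaled by tanh(gamma),
    with gamma initialized to zero so the adapter starts as an identity.
    """

    def __init__(self, dim: int, n_heads: int, grounding_dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.ground_proj = nn.Linear(grounding_dim, dim)  # grounding tokens -> visual width
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.gamma = nn.Parameter(torch.zeros(1))          # zero-initialized gate

    def forward(self, visual: torch.Tensor, grounding: torch.Tensor) -> torch.Tensor:
        # visual: (B, N_v, dim), grounding: (B, N_g, grounding_dim)
        n_v = visual.shape[1]
        tokens = torch.cat([visual, self.ground_proj(grounding)], dim=1)
        tokens = self.norm(tokens)
        out, _ = self.attn(tokens, tokens, tokens)
        # keep only the visual positions; gated residual update
        return visual + torch.tanh(self.gamma) * out[:, :n_v]
```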

Acephalia commented 7 months ago

> Hi @TonyLianLong, would you be interested in collaborating on extending GLIGEN to v2 and XL? I tried some training on SDXL but unfortunately haven't had luck in getting decent results.

@zjysteven I had a look into this yesterday myself and quite keen to see if we can make any progress.

zjysteven commented 7 months ago

@Acephalia I made some attempts earlier but didn't have any luck. I've also seen people struggling to reproduce the training even with the original code on SD1.5, so I eventually decided to stop trying. There are other training-free controlled generation methods, such as BoxDiff, which is what I ended up using.
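
Roughly, BoxDiff steers generation at sampling time by constraining the cross-attention maps of the grounded phrases to the user's boxes, with no training involved. A toy illustration of that kind of constraint is below (my own simplification, not BoxDiff's actual code).

```python
import torch

def box_constraint_loss(attn_map: torch.Tensor, box_mask: torch.Tensor, k: int = 10) -> torch.Tensor:
    """Toy version of a BoxDiff-style spatial constraint.

    attn_map: (H, W) cross-attention of image locations to one grounded phrase.
    box_mask: (H, W) binary mask, 1 inside the user-specified box.
    Encourages strong attention inside the box and weak attention outside;
    a BoxDiff-like sampler descends such a loss on the latent at each denoising step.
    """
    inside = attn_map[box_mask.bool()]
    outside = attn_map[~box_mask.bool()]
    loss_in = 1.0 - inside.topk(min(k, inside.numel())).values.mean()
    loss_out = outside.topk(min(k, outside.numel())).values.mean()
    return loss_in + loss_out

# Quick check with random tensors standing in for a real attention map and box.
attn = torch.rand(64, 64)
mask = torch.zeros(64, 64)
mask[16:48, 16:48] = 1.0
print(box_constraint_loss(attn, mask))
```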

Acephalia commented 7 months ago

Yeah, that tracks with what everyone else has said too. The paper seems a little clunky, and I was having real trouble even understanding the training bits. Thanks for taking the time to reply.

rezponze commented 7 months ago

Does this mean GLIGEN for SDXL is a dead end?

zcfrank1st commented 7 months ago

any update?

ynie commented 7 months ago

mhillebrand commented 7 months ago

And what about SD3?

camoody1 commented 6 months ago

> @Acephalia I made some attempts earlier but didn't have any luck. I've also seen people struggling to reproduce the training even with the original code on SD1.5, so I eventually decided to stop trying. There are other training-free controlled generation methods, such as BoxDiff, which is what I ended up using.

What is BoxDiff? I've never heard of it. Would you mind sharing a workflow that makes use of it?

TonyLianLong commented 5 months ago

I just built a codebase called IGLIGEN that I use to train GLIGEN on SD v1.5/2.1. It also supports ModelScope (text-to-video generation). It is trained on the SA-1B dataset, which is only ~300GB after preprocessing and has 11M images. It is more "modern" (i.e., it is based on the diffusers training script) and supports flash attention. The dataset, preprocessing script, and training script are included in the repo. Feel free to contact me with suggestions for this repo.

Repo: https://github.com/TonyLianLong/igligen