360CVGroup / HiCo_T2I

Precise layout controlled image generation
https://360cvgroup.github.io/HiCo_T2I/
image-generation layout-control

👉 HiCo: Hierarchical Controllable Diffusion Model for Layout-to-image Generation

💥 NeurIPS 2024!

Bo Cheng, Yuhang Ma, Liebucha Wu, Shanyuan Liu, Ao Ma, Xiaoyu Wu, Dawei Leng†, Yuhui Yin (†Corresponding Author)


🔥 News

🕓 Schedules

💻 Quick Demos

Image demos can be found on the HiCo project page. Some of them were contributed by the community. You can create your own personalized generations with the inference code below.

🔧 Quick Start

0. Experimental environment

We tested our inference code on a machine with a single 24 GB RTX 3090 GPU and CUDA 12.1.

1. Setup repository and environment

git clone https://github.com/360CVGroup/HiCo_T2I.git
cd HiCo_T2I
conda create -n HiCo python=3.10
conda activate HiCo
pip install -r requirements.txt
cd diffusers
pip install .

2. Prepare the models

git lfs install

# HiCo checkpoint
git clone https://huggingface.co/qihoo360/HiCo_T2I models/controlnet

# Base model: Realistic Vision V5.1 (built on stable-diffusion-v1-5)
git clone https://huggingface.co/krnl/realisticVisionV51_v51VAE models/realisticVisionV51_v51VAE
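
Before running inference, it may help to confirm that both clones landed in the directories named in the commands above. The following is a minimal sketch and not part of the repository: the helper name `missing_model_dirs` is hypothetical, and it only checks that the two directories exist and are non-empty. It does not verify that Git LFS actually downloaded the full weight files.

```python
from pathlib import Path

# Target directories of the two `git clone` commands above.
EXPECTED_MODEL_DIRS = (
    "models/controlnet",
    "models/realisticVisionV51_v51VAE",
)


def missing_model_dirs(root="."):
    """Return the expected model directories that are absent or empty under root.

    Hypothetical helper for illustration; not provided by the HiCo repository.
    """
    missing = []
    for rel in EXPECTED_MODEL_DIRS:
        path = Path(root) / rel
        # A directory counts as present only if it exists and has at least one entry.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    problems = missing_model_dirs()
    if problems:
        print("Missing or empty:", ", ".join(problems))
    else:
        print("Both model directories are in place.")
```

Run it from the repository root, since the paths are relative to the current directory.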

3. Customize your own creation

CUDA_VISIBLE_DEVICES=0 python infer-avg.py

🔥 Train

The JSON structure of the dataset is:

dataset

├──base_info 
│  ├──id
│  ├──width
│  ├──height
│  ├──f_path
├──caption  
├──obj_nums  
├──img_size  
│  ├──H
│  ├──W
├──path_img (f_path)
├──list_bbox_info
│  ├──subcaption
│  ├──coordinates(x1,y1,x2,y2)
│  │......
├──crop_location
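
To make the tree above concrete, here is a minimal sketch of one record built and checked in Python. The key names follow the tree, but everything else is an assumption for illustration: the value types, the example values, the choice of a list of dicts for `list_bbox_info`, pixel coordinates, the rule that `obj_nums` equals the number of boxes, and the helper `check_record`. `crop_location` is left as `None` because its format is not described here, and the loader in `train_hico.py` may expect something different.

```python
import json

# One illustrative record; every value below is made up for the example.
record = {
    "base_info": {"id": 0, "width": 512, "height": 512, "f_path": "images/0.jpg"},
    "caption": "a dog and a cat on a sofa",
    "obj_nums": 2,
    "img_size": {"H": 512, "W": 512},
    "path_img": "images/0.jpg",  # the tree notes this is the same as f_path
    "list_bbox_info": [
        # Assumed layout: one dict per object, coordinates as (x1, y1, x2, y2) in pixels.
        {"subcaption": "a brown dog", "coordinates": [32, 200, 240, 480]},
        {"subcaption": "a grey cat", "coordinates": [280, 220, 480, 470]},
    ],
    "crop_location": None,  # format not documented here; placeholder only
}

REQUIRED_KEYS = (
    "base_info", "caption", "obj_nums", "img_size",
    "path_img", "list_bbox_info", "crop_location",
)


def check_record(rec):
    """Return a list of problems found in one record (empty if none).

    Hypothetical sanity check; not part of the HiCo training code.
    """
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in rec]
    if problems:
        return problems
    height, width = rec["img_size"]["H"], rec["img_size"]["W"]
    boxes = rec["list_bbox_info"]
    if rec["obj_nums"] != len(boxes):
        problems.append("obj_nums does not match the number of boxes")
    for i, box in enumerate(boxes):
        x1, y1, x2, y2 = box["coordinates"]
        # Each box must have positive area and lie inside the image.
        if not (0 <= x1 < x2 <= width and 0 <= y1 < y2 <= height):
            problems.append(f"box {i} is out of bounds or degenerate")
    return problems


if __name__ == "__main__":
    print(check_record(record))
    print(json.dumps(record, indent=2))
```

A check like this, run over the whole dataset before training, catches swapped corner coordinates and boxes that extend past the image.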

Then you can start training:

accelerate launch train_hico.py

BibTeX

@misc{cheng2024hicohierarchicalcontrollablediffusion,
      title={HiCo: Hierarchical Controllable Diffusion Model for Layout-to-image Generation}, 
      author={Bo Cheng and Yuhang Ma and Liebucha Wu and Shanyuan Liu and Ao Ma and Xiaoyu Wu and Dawei Leng and Yuhui Yin},
      year={2024},
      eprint={2410.14324},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.14324}, 
}

License

This project is licensed under the Apache License (Version 2.0).