
🚀 Dimba: Transformer-Mamba Diffusion Models

This repo contains PyTorch model definitions, pre-trained weights and inference/sampling code for our paper Transformer-Mamba Diffusion Models. You can find more visualizations on our project page.

TL;DR: Dimba is a new text-to-image diffusion model that employs a hybrid architecture combining Transformer and Mamba elements, thus capitalizing on the advantages of both architectural paradigms.

some generated cases.

1. Environments

2. Download Models

The models reported in the paper can be downloaded directly as follows (upload in progress):

| Model | #Params | url |
|---|---|---|
| t5 | 4.3B | huggingface |
| vae | 80M | huggingface |
| Dimba-L-512 | 0.9B | huggingface |
| Dimba-L-1024 | 0.9B | - |
| Dimba-L-2048 | 0.9B | - |
| Dimba-G-512 | 1.8B | - |
| Dimba-G-1024 | 1.8B | - |
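
As a rough guide to how much memory the checkpoints listed above need, the parameter counts can be converted into weight sizes. This is only a back-of-the-envelope sketch: the function name `weight_memory_gb` is hypothetical, the 2 bytes per parameter (fp16/bf16) is an assumption since the storage precision is not stated here, and activations and sampler state are not counted.

```python
def weight_memory_gb(num_params, bytes_per_param=2):
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes).

    bytes_per_param=2 assumes fp16/bf16 storage (an assumption);
    use 4 for fp32.
    """
    return num_params * bytes_per_param / 1e9


# Parameter counts taken from the model table (rounded as reported).
param_counts = {
    "t5": 4.3e9,
    "vae": 80e6,
    "Dimba-L": 0.9e9,
    "Dimba-G": 1.8e9,
}

for name, n in param_counts.items():
    print(f"{name}: ~{weight_memory_gb(n):.2f} GB of weights at 2 bytes/param")
```

Under these assumptions the text encoder dominates the weight footprint, so actual peak memory during sampling will be higher than the Dimba weights alone suggest.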

The dataset used for quality tuning to enhance aesthetic performance can be downloaded as follows:

| Dataset | Size | url |
|---|---|---|
| Quality tuning | 600k | huggingface |

3. Inference

We include an inference script that samples images from a Dimba model according to textual prompts. It supports the DDIM and DPM-Solver sampling algorithms. You can run the script as:

python scripts/inference.py \
--image_size 512 \
--model_version dimba-l \
--model_path /path/to/model \
--txt_file asset/examples.txt \
--save_path /path/to/save/results
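
For readers unfamiliar with DDIM sampling, the core update can be sketched in a few lines. This is a generic, minimal illustration of the deterministic DDIM step (eta = 0) on a flat list of floats, not the code in scripts/inference.py; the function name `ddim_step` and its arguments are hypothetical, and the noise prediction would in practice come from the Dimba network conditioned on the text prompt.

```python
import math


def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0).

    x_t            : current noisy sample, as a flat list of floats
    eps_pred       : predicted noise for x_t (same length as x_t)
    alpha_bar_t    : cumulative noise-schedule product at the current step
    alpha_bar_prev : cumulative product at the step being moved to
    """
    sqrt_ab_t = math.sqrt(alpha_bar_t)
    sqrt_one_minus_ab_t = math.sqrt(1.0 - alpha_bar_t)
    sqrt_ab_prev = math.sqrt(alpha_bar_prev)
    sqrt_one_minus_ab_prev = math.sqrt(1.0 - alpha_bar_prev)

    x_prev = []
    for x, eps in zip(x_t, eps_pred):
        # Estimate the clean sample implied by the noise prediction.
        x0_pred = (x - sqrt_one_minus_ab_t * eps) / sqrt_ab_t
        # Re-noise that estimate to the noise level of the target step.
        x_prev.append(sqrt_ab_prev * x0_pred + sqrt_one_minus_ab_prev * eps)
    return x_prev
```

A sampler repeats this step over a decreasing sequence of timesteps, calling the model once per step to obtain `eps_pred`. DPM-Solver follows the same overall loop but uses a higher-order update, which typically needs fewer steps.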

4. Training

We provide a training script for Dimba in scripts/train.py. This script can be used for fine-tuning with different settings. You can run the script as:

python -m torch.distributed.launch --nnodes=4 --nproc_per_node=8 \
    --master_port=1234 scripts/train.py \
    configs/dimba_xl2_img512.py \
    --work-dir outputs

5. BibTeX

    title={Dimba: Transformer-Mamba Diffusion Models}, 
    author={Zhengcong Fei and Mingyuan Fan and Changqian Yu and Debang Li and Youqiang Zhang and Junshi Huang},

6. Acknowledgments

The codebase is based on the awesome PixArt, Vim, and DiS repos.

The Dimba paper is polished with ChatGPT using prompt.