<img alt="DiT Tutorial" src="https://github.com/user-attachments/assets/2b0deffa-0181-4676-b79f-fec9b12d8326" width="400">
*Trained for 200 epochs*
This repository implements DiT (Diffusion Transformer) in PyTorch. It provides code for training and inference of an autoencoder and an unconditional DiT on the CelebHQ dataset. The code is very similar to the official DiT implementation, apart from a few changes.
Activate your conda environment, clone the repo, and install the requirements:

```
conda activate <environment_name>
git clone https://github.com/explainingai-code/DiT-PyTorch.git
cd DiT-PyTorch
pip install -r requirements.txt
```
For the perceptual (LPIPS) loss, place the VGG weights at `models/weights/v0.1/vgg.pth`.
For setting up on CelebHQ, simply download the images from the official CelebAMask-HQ repo and add them to the `data` directory. Ensure the directory structure is the following:

```
DiT-PyTorch
-> data
 -> CelebAMask-HQ
  -> CelebA-HQ-img
   -> *.jpg
```
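As a quick sanity check of the layout above (this helper is not part of the repo; it only mirrors the expected paths), you can count the downloaded images:

```python
from pathlib import Path

def count_images(root: str) -> int:
    """Count *.jpg files under the expected CelebA-HQ image directory.
    `root` is wherever DiT-PyTorch was cloned."""
    img_dir = Path(root) / "data" / "CelebAMask-HQ" / "CelebA-HQ-img"
    if not img_dir.is_dir():
        return 0  # layout above not found
    return len(list(img_dir.glob("*.jpg")))

# Example: count_images(".") from the repo root should report 30000
# images for the full CelebA-HQ dataset.
```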
`config/celebhq.yaml` - Configuration used for the CelebHQ dataset. It allows you to play with the different components of DiT and the autoencoder.

Important configuration parameters:
- `autoencoder_acc_steps`: for accumulating gradients when the image size is too large and a large batch size can't be used.
- `save_latents`: enable this to save the latents during inference of the autoencoder; that way DiT training will be faster.

The repo provides training and inference for CelebHQ (unconditional DiT).
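As a rough illustration of what gradient accumulation does (a generic sketch, not the repo's actual training loop; `acc_steps` stands in for `autoencoder_acc_steps`):

```python
def train_with_accumulation(micro_batch_grads, acc_steps):
    """Generic sketch of gradient accumulation: sum gradients over
    `acc_steps` micro-batches, then take one optimizer step, giving an
    effective batch size of micro_batch_size * acc_steps."""
    accumulated = 0.0
    optimizer_steps = 0
    for i, grad in enumerate(micro_batch_grads, start=1):
        accumulated += grad  # in PyTorch, loss.backward() adds into .grad buffers
        if i % acc_steps == 0:
            # in PyTorch: optimizer.step(); optimizer.zero_grad()
            optimizer_steps += 1
            accumulated = 0.0
    return optimizer_steps

# 8 micro-batches with acc_steps=4 -> 2 optimizer steps
print(train_with_accumulation([0.1] * 8, acc_steps=4))  # 2
```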
For working on your own dataset:
- Create your own config file (use `celebhq.yaml` for guidance)
- Create your own dataset class (use `celeb_dataset.py` for guidance)
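A custom dataset class can be sketched as below. This is a hypothetical, stdlib-only illustration; `celeb_dataset.py` in the repo is the actual reference, and a real version would subclass `torch.utils.data.Dataset` and return transformed image tensors:

```python
from pathlib import Path

class CustomImageDataset:
    """Hypothetical sketch of a dataset class that indexes image files.
    A real implementation would load each image and apply transforms."""

    def __init__(self, im_path, ext="*.jpg"):
        self.images = sorted(Path(im_path).glob(ext))

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        # A real version would return a transformed image tensor here.
        return self.images[index]
```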
Once the config and dataset are set up, train the autoencoder first and set up the dataset accordingly:
- Run `python -m tools.train_vae --config config/celebhq.yaml` to train the autoencoder with the desired config file.
- Ensure `save_latent` is `True` in the config, then run `python -m tools.infer_vae --config config/celebhq.yaml` to generate reconstructions and save the latents with the right config file.
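The speed-up from saving latents comes from reading precomputed latents from disk during DiT training instead of re-encoding every image each epoch. A generic sketch of that caching pattern (the file naming, pickle format, and function names here are hypothetical, not the repo's actual on-disk format):

```python
import pickle
from pathlib import Path

def load_or_encode_latent(img_name, latent_dir, encode_fn):
    """Return a cached latent if one was saved during autoencoder
    inference, otherwise encode the image and cache the result."""
    cache = Path(latent_dir) / f"{img_name}.pkl"
    if cache.exists():
        with open(cache, "rb") as f:
            return pickle.load(f)  # fast path: no encoder forward pass
    latent = encode_fn(img_name)   # slow path: run the autoencoder encoder
    cache.parent.mkdir(parents=True, exist_ok=True)
    with open(cache, "wb") as f:
        pickle.dump(latent, f)
    return latent
```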
For training the unconditional DiT, ensure the right dataset is used in `train_vae_dit.py`:
- Run `python -m tools.train_vae_dit --config config/celebhq.yaml` to train the unconditional DiT with the right config.
- Run `python -m tools.sample_vae_dit --config config/celebhq.yaml` to generate images with the trained DiT.

Outputs will be saved according to the configuration present in the yaml files.
For every run, a folder named after the `task_name` key in the config will be created.

During training of the autoencoder, the following outputs will be saved:
- Checkpoints in the `task_name` directory
- Sample reconstructions in `task_name/vae_autoencoder_samples`

During inference of the autoencoder, the following outputs will be saved:
- Reconstructions in the `task_name` directory
- Latents in `task_name/vae_latent_dir_name`, if mentioned in the config

During training and inference of the unconditional DiT, the following outputs will be saved:
- Checkpoints in the `task_name` directory
- Samples in `task_name/samples/*.png`. The final decoded generated image will be `x0_0.png`, and images from `x0_999.png` down to `x0_1.png`
will be the latent image predictions of the denoising process from T=999 to T=1. The generated image is at T=0.

Citation:

```
@misc{peebles2023scalablediffusionmodelstransformers,
      title={Scalable Diffusion Models with Transformers},
      author={William Peebles and Saining Xie},
      year={2023},
      eprint={2212.09748},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2212.09748},
}
```