
Cloth virtual try-on using Stable Diffusion

I experimented with training a Stable Diffusion image generation model for the cloth virtual try-on task.

Problem

State-of-the-art solutions [1] use GAN architectures and mostly don't seem to work well on out-of-dataset photos.

Here's an example of running the GAN model from the HR-VITON paper with pre-trained weights from the paper's GitHub repository.

[Figure: person image from the training dataset · custom cloth image · GAN try-on result]

Diffusion model results

I trained the Stable Diffusion Inpainting model on two garments:

[Figure: diffusion try-on results for the two garments]

The model accurately reproduced the details of the gray sweater and adjusted the lighting of the photo. However, it did not reproduce the sweatshirt's print exactly, though it came close. I couldn't resolve this issue with the current version of Stable Diffusion (2.1).

Training details

I took the Dreambooth inpainting training script and modified it to apply custom cloth masks and prompts.

I used 1,000 training steps, a learning rate of 5e-6, no regularization images, and also trained the text encoder.
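
For context, the stock diffusers Dreambooth inpainting script samples random masks during training; the modification swaps in the hand-made cloth masks instead. A minimal sketch of the mask preparation, assuming the tensor conventions of that script (the function name and normalization mirror its prepare_mask_and_masked_image helper):

import numpy as np
import torch
from PIL import Image

def prepare_mask_and_masked_image(image: Image.Image, mask: Image.Image):
    # Image in [-1, 1], mask in {0, 1}, as the inpainting UNet expects.
    image_np = np.array(image.convert("RGB")).astype(np.float32) / 127.5 - 1.0
    mask_np = np.array(mask.convert("L")).astype(np.float32) / 255.0
    mask_np = (mask_np > 0.5).astype(np.float32)             # binarize

    image_t = torch.from_numpy(image_np).permute(2, 0, 1)    # (3, H, W)
    mask_t = torch.from_numpy(mask_np)[None]                 # (1, H, W)

    # Zero out the cloth region; the model learns to paint it back in.
    masked_image = image_t * (1.0 - mask_t)
    return mask_t, masked_image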

[Figure: person image · hand-drawn person mask · diffusion try-on results, A-pose]

As you can imagine, it's pretty inconvenient to hand-draw a mask. So, I came up with a way to generate it automatically; maybe I'll share it later.
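
In the meantime, a pre-trained clothes-segmentation model can produce a similar mask; this is a rough sketch, not necessarily the approach I used above, and the checkpoint id and label index here are assumptions:

import numpy as np
import torch
from PIL import Image
from transformers import AutoImageProcessor, SegformerForSemanticSegmentation

MODEL_ID = "mattmdjaga/segformer_b2_clothes"  # public clothes-segmentation checkpoint (assumed)
UPPER_CLOTHES = 4                             # label id for "Upper-clothes" in that checkpoint (assumed)

processor = AutoImageProcessor.from_pretrained(MODEL_ID)
model = SegformerForSemanticSegmentation.from_pretrained(MODEL_ID)

person = Image.open("person.jpg").convert("RGB")
inputs = processor(images=person, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits           # (1, num_labels, H/4, W/4)

# Upsample logits back to the photo size and keep the clothes class.
logits = torch.nn.functional.interpolate(
    logits, size=person.size[::-1], mode="bilinear", align_corners=False
)
mask = (logits.argmax(dim=1)[0] == UPPER_CLOTHES).numpy().astype(np.uint8) * 255
Image.fromarray(mask).save("person_mask.png")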

[Figure: person image · auto-generated person mask · diffusion try-on results, complex pose]

The model was able to generate clothes correctly even in a complex pose.

How to run training yourself

Check the Dreambooth repo for reference.

Install dependencies with pip install -r requirements.txt.

Create a directory of cloth images and a directory of the corresponding masks.

Note that my training script generates prompts from the image file names. For example: “t-shirt (1).jpg” → “t-shirt”.
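
For illustration, that mapping can be as simple as stripping the extension and any trailing duplicate index; a sketch (the actual script may differ):

import re
from pathlib import Path

def prompt_from_filename(path: str) -> str:
    # "t-shirt (1).jpg" -> "t-shirt": drop the extension and a
    # trailing " (n)" duplicate index.
    stem = Path(path).stem
    return re.sub(r"\s*\(\d+\)$", "", stem).strip()

assert prompt_from_filename("t-shirt (1).jpg") == "t-shirt"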

Run training:

accelerate launch train_dreambooth_inpaint_my_prompts.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-2-inpainting"  \
  --instance_data_dir=$IMAGES_DIR \
  --instance_masks_data_dir=$MASKS_DIR \
  --output_dir=$OUTPUT_WEIGHTS_DIR \
  --train_text_encoder \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 --gradient_checkpointing \
  --use_8bit_adam \
  --learning_rate=2e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=1000 \
  --checkpointing_steps=500
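
After training, the weights in $OUTPUT_WEIGHTS_DIR load into the standard diffusers inpainting pipeline. A minimal inference sketch (paths, prompt, and sizes are placeholders):

import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

# Load the fine-tuned weights written to --output_dir above.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "path/to/output_weights_dir",  # $OUTPUT_WEIGHTS_DIR
    torch_dtype=torch.float16,
).to("cuda")

person = Image.open("person.jpg").convert("RGB").resize((512, 512))
mask = Image.open("person_mask.png").convert("L").resize((512, 512))

result = pipe(
    prompt="t-shirt",              # the prompt the garment was trained under
    image=person,
    mask_image=mask,
    num_inference_steps=50,
).images[0]
result.save("try_on.png")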

Conclusion

Clothes with simple shapes and textures work best today, while logos and prints are very hard to reproduce correctly with Stable Diffusion 2.1.

Despite these limitations, generative models will definitely unlock photorealistic try-on!

DM me if you're interested in collaborating on this challenge.