Zheng-Chong / CatVTON

CatVTON is a simple and efficient virtual try-on diffusion model with 1) Lightweight Network (899.06M parameters in total), 2) Parameter-Efficient Training (49.57M trainable parameters) and 3) Simplified Inference (< 8G VRAM at 1024×768 resolution).

Any tips on speeding up the image generation? #76

Open JaypTookMyJayp opened 3 weeks ago

JaypTookMyJayp commented 3 weeks ago

My CatVTON setup takes about 12 seconds to generate an image on an A100 GPU. That is, of course, with CatVTON's models already loaded. I'm using 512 × 768 images for both the clothes and the model, so the inputs are not large.

I am using fp16 precision. bf16 gives bad quality, so I can't go lower. I did try torch.compile, but it didn't help at all. I'm using 25 steps and a CFG of 2.8, which seems to be the lowest I can go without hurting quality.

How can I turn this into a ~5 second process?

Or, at the least, could I find out which stage of CatVTON actually takes the most time?

Zheng-Chong commented 2 weeks ago

It is difficult to substantially reduce the model's inference time with a training-free strategy.
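
On the question of which stage takes the most time, one way to find out is to time each stage separately. Below is a minimal sketch of such a timer, not part of CatVTON itself. `StageTimer` and the stage names (`vae_encode`, `denoising_step`, `vae_decode`) are hypothetical, and the `time.sleep` calls are placeholders for the real pipeline calls. On a GPU, the `sync` argument would typically be `torch.cuda.synchronize`, since CUDA kernels run asynchronously and unsynchronized timings can be misleading.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named stage (hypothetical helper)."""

    def __init__(self, sync=None):
        # sync: optional zero-argument callable run before each clock read.
        # Assumption: on CUDA you would pass torch.cuda.synchronize here.
        self.sync = sync
        self.totals = {}

    @contextmanager
    def stage(self, name):
        if self.sync is not None:
            self.sync()
        start = time.perf_counter()
        try:
            yield
        finally:
            if self.sync is not None:
                self.sync()
            elapsed = time.perf_counter() - start
            # Repeated stages (e.g. each denoising step) are summed.
            self.totals[name] = self.totals.get(name, 0.0) + elapsed

    def report(self):
        """Return (name, seconds, share_of_total) tuples, slowest first."""
        total = sum(self.totals.values()) or 1.0
        rows = sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True)
        return [(name, secs, secs / total) for name, secs in rows]


if __name__ == "__main__":
    timer = StageTimer()  # on GPU: StageTimer(sync=torch.cuda.synchronize)

    # Placeholder workloads standing in for the real pipeline stages.
    with timer.stage("vae_encode"):
        time.sleep(0.01)
    for _ in range(25):  # 25 steps, as in the question above
        with timer.stage("denoising_step"):
            time.sleep(0.002)
    with timer.stage("vae_decode"):
        time.sleep(0.01)

    for name, secs, share in timer.report():
        print(f"{name:16s} {secs:.3f}s {share:6.1%}")
```

If the denoising loop turns out to dominate, the step count and CFG setting are the main levers. If encoding, decoding, or preprocessing dominates, those stages are the place to look first.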