CatVTON is a simple and efficient virtual try-on diffusion model with 1) a lightweight network (899.06M parameters in total), 2) parameter-efficient training (49.57M trainable parameters), and 3) simplified inference (< 8 GB VRAM at 1024×768 resolution).
My CatVTON setup takes about 12 seconds to generate an image on an A100 GPU, and that is with the CatVTON models already loaded. I'm using 512×768 images for both the clothing image and the person (model) image, so the inputs are not large.
I'm using fp16 precision; bf16 gives noticeably worse quality, so I can't go lower on precision. I tried torch.compile, but it didn't help at all. I'm using 25 steps and a CFG scale of 2.8, which seems to be the lowest I can go without losing quality.
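A rough time budget, assuming the 25-step denoising loop dominates the 12 seconds and every step costs about the same. In typical classifier-free guidance implementations each step evaluates the denoiser for both the conditional and unconditional inputs (often batched into one call), so 25 steps is roughly 50 denoiser passes. `overhead_s` is a hypothetical fixed cost outside the loop (preprocessing, VAE encode/decode) that would have to be measured; the function names are illustrative only.

```python
def per_step_seconds(total_s: float, steps: int, overhead_s: float = 0.0) -> float:
    """Average cost of one denoising step, assuming a fixed non-loop overhead."""
    return (total_s - overhead_s) / steps


def max_steps_for_budget(total_s: float, steps: int, target_s: float,
                         overhead_s: float = 0.0) -> int:
    """Largest step count that fits in target_s at the current per-step cost."""
    per_step = per_step_seconds(total_s, steps, overhead_s)
    return int((target_s - overhead_s) // per_step)


def required_speedup(total_s: float, target_s: float) -> float:
    """Overall speedup needed if the step count stays the same."""
    return total_s / target_s


if __name__ == "__main__":
    print(per_step_seconds(12.0, 25))                          # about 0.48 s per step
    print(max_steps_for_budget(12.0, 25, 5.0))                 # 10 steps
    print(max_steps_for_budget(12.0, 25, 5.0, overhead_s=1.0)) # 9 steps
    print(required_speedup(12.0, 5.0))                         # 2.4x at 25 steps
```

Under these assumptions, reaching ~5 s means either cutting to about 10 steps or making each step roughly 2.4× faster while keeping 25 steps.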
How can I get this down to about 5 seconds?
Or, at the very least, which part of the CatVTON pipeline actually takes up the most time?
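One way to see which stage dominates is to wrap each stage of the pipeline in a timer and compare totals. This is a minimal sketch, not part of CatVTON: the stage names are hypothetical, and where the `with` blocks go depends on CatVTON's pipeline code, which is not shown here. Because CUDA kernels run asynchronously, passing `torch.cuda.synchronize` as `sync` on a GPU keeps time from being charged to whichever later stage happens to block.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Callable, Dict, Optional


class StageTimer:
    """Accumulates wall-clock time per named stage (illustrative helper)."""

    def __init__(self, sync: Optional[Callable[[], None]] = None) -> None:
        # On GPU, pass sync=torch.cuda.synchronize so queued kernels finish
        # before each timestamp is taken.
        self.sync = sync
        self.totals: Dict[str, float] = defaultdict(float)

    @contextmanager
    def stage(self, name: str):
        if self.sync is not None:
            self.sync()  # drain work queued by earlier stages
        start = time.perf_counter()
        try:
            yield
        finally:
            if self.sync is not None:
                self.sync()  # wait for this stage's kernels to finish
            self.totals[name] += time.perf_counter() - start

    def report(self) -> str:
        """One line per stage, slowest first, with its share of the total."""
        total = sum(self.totals.values()) or 1.0
        lines = []
        for name, secs in sorted(self.totals.items(), key=lambda kv: -kv[1]):
            lines.append(f"{name:<16}{secs:8.3f} s  {100 * secs / total:5.1f}%")
        return "\n".join(lines)


if __name__ == "__main__":
    # Dummy stages standing in for hypothetical pipeline steps.
    timer = StageTimer()  # on GPU: StageTimer(sync=torch.cuda.synchronize)
    with timer.stage("vae_encode"):
        time.sleep(0.01)
    for _ in range(3):
        with timer.stage("denoise_step"):
            time.sleep(0.02)
    with timer.stage("vae_decode"):
        time.sleep(0.01)
    print(timer.report())
```

Running a few warm-up generations before timing also matters, especially with torch.compile, since the first calls include compilation.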