Zheng-Chong / CatVTON

CatVTON is a simple and efficient virtual try-on diffusion model with 1) a Lightweight Network (899.06M parameters in total), 2) Parameter-Efficient Training (49.57M trainable parameters), and 3) Simplified Inference (< 8 GB VRAM for 1024×768 resolution).

[AMD GPU] Slower speed and 40G VRAM for app #44

Open · xalteropsx opened this issue 2 weeks ago

xalteropsx commented 2 weeks ago

I am using bf16, and it occupies more than 40 GB of VRAM and runs very slowly.
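
For reference, a minimal check of what the process itself has allocated (assuming a PyTorch build where the torch.cuda API maps to the AMD GPU, as in ROCm builds):

```python
import torch

# On ROCm builds of PyTorch the torch.cuda namespace maps to HIP,
# so these calls also report AMD GPU memory.
print(torch.cuda.is_available())      # is a GPU visible to torch?
print(torch.cuda.get_device_name(0))  # which GPU torch sees
print(f"{torch.cuda.memory_allocated(0) / 1024**3:.1f} GiB allocated")  # tensors held by this process
print(f"{torch.cuda.memory_reserved(0) / 1024**3:.1f} GiB reserved")    # cached by the allocator
```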

Zheng-Chong commented 2 weeks ago

Running the Gradio app takes only about 8 GB of VRAM in our tests. Your description of the issue is not detailed enough; can you provide more details so I can look into the problem further?

xalteropsx commented 2 weeks ago

> Running the Gradio app takes only about 8 GB of VRAM in our tests. Your description of the issue is not detailed enough; can you provide more details so I can look into the problem further?

Sorry for the late reply; I had a slight fever yesterday. I will do it now and provide a sample video plus the changes I made. It should take 10-30 minutes.

xalteropsx commented 2 weeks ago

(screenshot) As you can see from the screenshot, I cloned your current repo and changed this line:

default="runwayml/stable-diffusion-inpainting" - > default="benjamin-paine/stable-diffusion-v1-5-inpainting"

It is hard to record a video with the GPU consuming this much; it even freezes my display. I will reduce the resolution to bring the usage down, since 1024×768 takes nearly 42 GB. Let me check what resolution fits in about 20 GB of VRAM so I can show you proof.
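
As a rough back-of-the-envelope estimate (assuming VRAM scales roughly linearly with pixel count, which understates attention layers that scale quadratically):

```python
# Scale the observed 42 GB at 1024x768 by the pixel-count ratio.
observed_gb = 42
base_px = 1024 * 768
for w, h in [(768, 576), (512, 384)]:
    est_gb = observed_gb * (w * h) / base_px
    print(f"{w}x{h}: ~{est_gb:.0f} GB")  # 768x576 ≈ 24 GB, 512x384 ≈ 10 GB
```

By that estimate, something between 768×576 and 512×384 should land near the 20 GB target.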

xalteropsx commented 2 weeks ago

@Zheng-Chong https://drive.google.com/file/d/13q1EpdWZ9lJ2PDBA2Bq9snyEpa2wlsjM/view?usp=sharing

Zheng-Chong commented 2 weeks ago

According to the information you provided, you are using an AMD GPU. Machine learning tasks usually rely on an NVIDIA GPU for CUDA acceleration; otherwise they run very slowly. In addition, your 46.5 GB of VRAM is likely partly occupied by other applications; you could check that in the process manager.
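
One way to separate the two is to compare device-wide usage with what torch itself holds (sketch, again assuming a torch build that exposes the AMD GPU through the torch.cuda API):

```python
import torch

# Device-wide view: includes memory held by *other* processes.
free, total = torch.cuda.mem_get_info(0)
print(f"device: {(total - free) / 1024**3:.1f} / {total / 1024**3:.1f} GiB in use")

# Process view: only tensors allocated by this script.
print(f"this process: {torch.cuda.memory_allocated(0) / 1024**3:.1f} GiB allocated")
```

If the device-wide number is much larger than what this process allocates, other applications are holding the VRAM.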

xalteropsx commented 2 weeks ago

You can download the video and check it; it is impossible that the VRAM is being consumed by other applications. If you want proof, I can find a remote screen-sharing service or set up a Google Meet so I can share my screen with you. Don't worry about the slow speed; all I want is lower VRAM usage, if possible.

Zheng-Chong commented 2 weeks ago

I have checked the video you sent, and there is indeed about 40 GB of memory usage. I suspect the reason is that AMD GPUs do not support low-precision or mixed-precision inference, or that the current code cannot effectively utilize AMD GPUs, leading to increased memory usage and slow speed. However, I have too little experience with machine learning on AMD GPUs to suggest how to reduce the high memory usage. Perhaps you can ask the relevant communities whether there are methods to cut memory consumption and accelerate the process. I have changed the title of this issue so that others can see it and offer assistance.
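
If you want something to try in the meantime, here is a sketch of the standard diffusers memory reducers (assumption: this mirrors a plain diffusers inpainting pipeline; CatVTON wraps its own pipeline class, so the exact entry point will differ):

```python
import torch
from diffusers import StableDiffusionInpaintPipeline

# Sketch only; verify each method exists in your diffusers version.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "benjamin-paine/stable-diffusion-v1-5-inpainting",
    torch_dtype=torch.float16,  # try float32 if half precision misbehaves on your GPU
)
pipe = pipe.to("cuda")  # ROCm builds of torch also use the "cuda" device string

pipe.enable_attention_slicing()  # compute attention in slices -> lower peak VRAM
pipe.enable_vae_slicing()        # decode the VAE in slices

# Check whether the device claims bf16 support at all:
print(torch.cuda.is_bf16_supported())
```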

xalteropsx commented 2 weeks ago

I will try to investigate, and maybe also ask someone who uses AMD about this. I was using ZLUDA torch on Windows; I will check on Linux with pure torch and let you know the feedback.
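
For what it's worth, one quick way to tell which backend a given torch build actually uses (ZLUDA presents itself as CUDA, while ROCm reports a HIP version):

```python
import torch

print("torch:", torch.__version__)
print("hip:", torch.version.hip)    # set on ROCm builds, None otherwise
print("cuda:", torch.version.cuda)  # set on CUDA (and ZLUDA) builds, None on ROCm
```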