Closed: Lesteriax closed this 3 months ago
Hi there, thanks for the report. Are you on main or dev_upstream branch? Can you send me the commit where the high vram load didn't happen?
Hey @Panchovix, after extensive git checkouts without being able to reproduce it, I figured the issue was on my end, caused by the command-line arguments.
The issue happened when using these arguments: --always-gpu --disable-nan-check --disable-xformers --attention-pytorch
The arguments I used previously, which worked perfectly, were: --xformers --always-gpu --disable-nan-check --cuda-malloc --cuda-stream --pin-shared-memory
Not sure if this information helps you, but I was able to reproduce the issue by switching between these two sets of arguments.
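For reference, written out as full launch lines (assuming the usual webui-user.bat setup; other flags and paths omitted), the two sets would be:
Problematic: set COMMANDLINE_ARGS=--always-gpu --disable-nan-check --disable-xformers --attention-pytorch
Working: set COMMANDLINE_ARGS=--xformers --always-gpu --disable-nan-check --cuda-malloc --cuda-stream --pin-shared-memory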
Thank you
Maybe something related to SDPA or another cross-attention optimization is causing that issue, but glad you could figure it out. I'll keep it noted in case it happens again.
Thanks for the update!
I have this issue and was about to post a new issue about it. When I use a Pony model and a LoRA, the VRAM usage shoots up and everything slows way down. I tried the exact same parameters and models on the latest Auto1111 and this does not happen. I can only reproduce it with Pony models, which seems... strange? It's 100% reproducible though.
Fails with https://civitai.com/models/458760/bemypony and any Lora. Works fine without a Lora, or with https://civitai.com/models/299933?modelVersionId=638622 with or without a Lora.
Positive: score_9, score_8_up, score_7_up, score_6_up, a medium closeup color portrait photo of mwlexi wearing a bra on a greek island
Cmd line: set COMMANDLINE_ARGS=--api --xformers --always-gpu --disable-nan-check --cuda-stream --pin-shared-memory --cuda-malloc
This is with the dev-upstream branch.
@mweldon
It is interesting that it happens only on Pony models. What GPU do you have? If it has 12GB of VRAM, I think --pin-shared-memory and --always-gpu do more harm than good in this case, since they use a lot more VRAM to avoid moving the model around (and A1111 doesn't have equivalent args for these).
Out of curiosity, do you get that issue on the main branch? dev_upstream also has somewhat different model management that comes from the Comfy upstream changes.
Removing --always-gpu fixes it. Thanks.
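For anyone else hitting this, the launch line that works now (same as above, just without --always-gpu) would be:
set COMMANDLINE_ARGS=--api --xformers --disable-nan-check --cuda-stream --pin-shared-memory --cuda-malloc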
Also, I noticed that Comfy has the same issue, so I wonder if there's some command-line flag I need to change there too.
Checklist
What happened?
I noticed after the last git pull that when I load any LoRA, my GPU memory usage doubles compared to, say, 2 days ago. I usually load two models in VRAM and used to generate normally with no issues, but after the last git pull I started getting out-of-memory errors.
I will provide 3 pictures below to show how the GPU memory loads:
1- A fresh start: 1 model loaded, no LoRAs used (here I can generate, and GPU memory usage reverts back to normal automatically after cleanup)
2- After generating with 1 LoRA added: notice that GPU memory usage has doubled and did not revert back
3- After removing the LoRA: GPU memory usage reverted back, dropping by a lot
Steps to reproduce the problem
.
What should have happened?
.
What browsers do you use to access the UI ?
No response
Sysinfo
.
Console logs
Additional information
No response