Closed: RichyRich515 closed this issue 2 months ago
cmd args: --xformers --listen --enable-insecure-extension-access --port 7861 --always-gpu --disable-nan-check --cuda-malloc --cuda-stream --pin-shared-memory
Hi there, thanks for the report.
I can't replicate this. I have 24GB of VRAM on an RTX 4090 (same as you), and when using pin shared memory, changing models (for example in an X/Y/Z grid) unloads the current model and then loads the next one. If it were going to OOM, it would already do so with 2 models.
Note that with pin shared memory it keeps about 2GB as a baseline each time, so it sits at 9-11GB after changing a few models; inference uses some additional VRAM and then settles back around 7-9GB.
Also, between the commit you mention and today there was a big bug fix for model management; it wasn't affecting 24GB VRAM users too much, but the old code was managing VRAM really badly for low-VRAM GPUs.
This is what I tried with --always-gpu, --cuda-malloc, --cuda-stream and --pin-shared-memory (a small VRAM-check sketch follows these steps):
Just after launching the UI
Changing 1 model (before the next model loads, so the current model is unloaded and the next one is being read from disk to load into the GPU)
Model changed (loaded into the GPU)
Changing to another model (before the next model loads; you can notice it uses a bit more VRAM)
Another model loaded
Again, changing to another model
Another model loaded
From this point onward, the VRAM usage when unloading/loading stays the same.
Changing to another model (as you can see, it tops out at 6GB)
Another model loaded
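If someone wants to check the same numbers at each step, here is a minimal sketch using plain torch from the webui's venv (generic torch usage, not reForge's own model management code):

```python
import torch

def report_vram(tag: str) -> None:
    # Tensors currently held by torch vs. memory cached by its allocator.
    allocated = torch.cuda.memory_allocated() / 1024**3
    reserved = torch.cuda.memory_reserved() / 1024**3
    # Driver-level view of the whole GPU (closest to what nvidia-smi shows).
    free, total = (x / 1024**3 for x in torch.cuda.mem_get_info())
    print(f"[{tag}] allocated={allocated:.1f}GB reserved={reserved:.1f}GB "
          f"used={total - free:.1f}GB / {total:.1f}GB")

# e.g. call report_vram("after checkpoint switch") before and after changing models
```

The driver-level figure (total minus free) is the one that should line up with the 6-11GB numbers above, since it also counts whatever lives outside torch's caching allocator.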
The log is:
Unloading first loaded model: normals\xlmodels\mikoshiPony_v10.safetensors [b1480f9dc0]...
Loading model normals\xlmodels\rasRealAnimeScreencap_v10.safetensors [ae41e754be] (1 of 1)
Loading weights [ae41e754be] from G:\Stable difussion\stable-diffusion-webui-reForge\models\Stable-diffusion\normals\xlmodels\rasRealAnimeScreencap_v10.safetensors
WARNING:root:clip missing: ['clip_l.text_projection', 'clip_l.logit_scale', 'clip_g.text_projection', 'clip_g.transformer.text_model.embeddings.position_ids']
To load target model SDXL
Begin to load 1 model
Moving model(s) has taken 0.01 seconds
Loading VAE weights specified in settings: G:\Stable difussion\stable-diffusion-webui-reForge\models\VAE\sdxl_vae_fixedfp16.safetensors
To load target model SDXLClipModel
Begin to load 1 model
Moving model(s) has taken 0.47 seconds
Model normals\xlmodels\rasRealAnimeScreencap_v10.safetensors [ae41e754be] loaded in 122.8s (unload first loaded model if necessary (pinned): 2.9s, load weights from disk: 0.6s, forge load real models: 117.8s, load textual inversion embeddings: 0.8s, calculate empty prompt: 0.5s).
Unloading first loaded model: normals\xlmodels\rasRealAnimeScreencap_v10.safetensors [ae41e754be]...
Loading model normals\xlmodels\reweikPonyxl_v012.safetensors [15c6059154] (1 of 1)
Loading weights [15c6059154] from G:\Stable difussion\stable-diffusion-webui-reForge\models\Stable-diffusion\normals\xlmodels\reweikPonyxl_v012.safetensors
WARNING:root:clip missing: ['clip_l.text_projection', 'clip_l.logit_scale', 'clip_g.text_projection', 'clip_g.transformer.text_model.embeddings.position_ids']
To load target model SDXL
Begin to load 1 model
Moving model(s) has taken 0.02 seconds
Loading VAE weights specified in settings: G:\Stable difussion\stable-diffusion-webui-reForge\models\VAE\sdxl_vae_fixedfp16.safetensors
To load target model SDXLClipModel
Begin to load 1 model
Moving model(s) has taken 0.37 seconds
Model normals\xlmodels\reweikPonyxl_v012.safetensors [15c6059154] loaded in 25.6s (unload first loaded model if necessary (pinned): 2.7s, load weights from disk: 0.6s, forge load real models: 21.0s, load VAE: 0.1s, load textual inversion embeddings: 0.8s, calculate empty prompt: 0.4s).
Unloading first loaded model: normals\xlmodels\reweikPonyxl_v012.safetensors [15c6059154]...
Loading model normals\xlmodels\susamixPonyV02_v10.safetensors [81721f5112] (1 of 1)
Loading weights [81721f5112] from G:\Stable difussion\stable-diffusion-webui-reForge\models\Stable-diffusion\normals\xlmodels\susamixPonyV02_v10.safetensors
WARNING:root:clip missing: ['clip_l.text_projection', 'clip_l.logit_scale', 'clip_g.text_projection', 'clip_g.transformer.text_model.embeddings.position_ids']
To load target model SDXL
Begin to load 1 model
Moving model(s) has taken 0.02 seconds
Loading VAE weights specified in settings: G:\Stable difussion\stable-diffusion-webui-reForge\models\VAE\sdxl_vae_fixedfp16.safetensors
To load target model SDXLClipModel
Begin to load 1 model
Moving model(s) has taken 0.49 seconds
Model normals\xlmodels\susamixPonyV02_v10.safetensors [81721f5112] loaded in 25.9s (unload first loaded model if necessary (pinned): 2.8s, load weights from disk: 0.5s, forge load real models: 21.2s, load textual inversion embeddings: 0.8s, calculate empty prompt: 0.5s).
Unloading first loaded model: normals\xlmodels\susamixPonyV02_v10.safetensors [81721f5112]...
Loading model normals\xlmodels\tPonynai3_v6.safetensors [2b493af7c1] (1 of 1)
Loading weights [2b493af7c1] from G:\Stable difussion\stable-diffusion-webui-reForge\models\Stable-diffusion\normals\xlmodels\tPonynai3_v6.safetensors
WARNING:root:clip missing: ['clip_l.text_projection', 'clip_l.logit_scale', 'clip_g.text_projection', 'clip_g.transformer.text_model.embeddings.position_ids']
To load target model SDXL
Begin to load 1 model
Moving model(s) has taken 0.02 seconds
Loading VAE weights specified in settings: G:\Stable difussion\stable-diffusion-webui-reForge\models\VAE\sdxl_vae_fixedfp16.safetensors
To load target model SDXLClipModel
Begin to load 1 model
Moving model(s) has taken 0.46 seconds
Model normals\xlmodels\tPonynai3_v6.safetensors [2b493af7c1] loaded in 31.6s (unload first loaded model if necessary (pinned): 2.5s, load weights from disk: 0.5s, forge load real models: 27.2s, load VAE: 0.1s, load textual inversion embeddings: 0.7s, calculate empty prompt: 0.5s)
This behavior mostly happens when using cuda stream and pin shared memory.
It is different from how OG Forge (before the experimental updates) behaved, since I had to make some modifications as well to support multiple checkpoints.
You can try without --pin-shared-memory and it should use less VRAM, since that option relies on a somewhat experimental part of the torch implementation. You can also try without --always-gpu.
Do you have a log when you got OOM?
There isn't much of anything in the log when I hit the OOM; it just brings my computer to a near halt and I have to slowly navigate over to close the terminal.
At this point, if I pressed generate, it would push into shared memory when the VAE decode occurs (see the peak-memory sketch after the log below):
To load target model SDXL
Begin to load 1 model
Reuse 1 loaded models
unload clone 0 False
Moving model(s) has taken 0.00 seconds
100%|████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:01<00:00, 6.89it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 32/32 [00:12<00:00, 2.51it/s]
Loading model snowpony_v10.safetensors [d6f941b46b] (2 of 2)███████████████████████████| 32/32 [00:12<00:00, 2.66it/s]
Loading weights [d6f941b46b] from C:\Users\Richard\Desktop\stable-diffusion-webui\models\Stable-diffusion\snowpony_v10.safetensors
WARNING:root:clip missing: ['clip_l.text_projection', 'clip_l.logit_scale', 'clip_g.text_projection', 'clip_g.transformer.text_model.embeddings.position_ids']
To load target model SDXL
Begin to load 1 model
Moving model(s) has taken 0.02 seconds
Loading VAE weights specified in settings: C:\Users\Richard\Desktop\stable-diffusion-webui\models\VAE\sdxl_vae-fp16fix.safetensors
To load target model SDXLClipModel
Begin to load 1 model
Moving model(s) has taken 0.59 seconds
Model snowpony_v10.safetensors [d6f941b46b] loaded in 5.8s (forge load real models: 4.7s, load VAE: 0.3s, calculate empty prompt: 0.7s).
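If it helps to confirm that the VAE decode is the step that spikes, torch's peak-memory counters can bracket it; a rough, self-contained sketch where the small conv layer and random latents are stand-ins rather than reForge's actual objects:

```python
import torch

# Stand-ins so the sketch runs on its own; in practice these would be the
# actual VAE module and the latents produced by sampling.
vae = torch.nn.Conv2d(4, 3, kernel_size=3, padding=1).cuda().half()
latents = torch.randn(1, 4, 128, 128, device="cuda", dtype=torch.half)

torch.cuda.reset_peak_memory_stats()
baseline = torch.cuda.memory_allocated()

with torch.no_grad():
    image = vae(latents)  # stands in for the real VAE decode

peak = torch.cuda.max_memory_allocated()
print(f"decode step needed up to {(peak - baseline) / 1024**3:.2f}GB over the baseline")
```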
With the same settings as before, here are my steps, with images:
From the start:
Generate 1 image
Changed checkpoint
Tried to generate an image: OOM, and the computer is very slow
Close the terminal
Now without --always-gpu --pin-shared-memory:
Generate 1 image
Load 2nd checkpoint (VRAM seems to be okay)
Generate image on checkpoint 2
Load 3rd checkpoint (still okay)
I did a little more testing, and I think I've pinpointed the --always-gpu flag as the offender in my setup.
It is strange that you cannot reproduce it, and that it does not occur in the older revision.
Thanks for the update. Yes, I think the --always-gpu flag causes more issues than anything else. Pin shared memory should do a similar job but without the obscene VRAM usage.
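For context (and only as far as I understand the flag), pin shared memory builds on torch's pinned, page-locked host memory, which is what allows faster asynchronous host-to-GPU copies; a minimal, generic illustration of the torch mechanism, not reForge's actual implementation:

```python
import torch

cpu_weights = torch.randn(1024, 1024)       # ordinary pageable host tensor
pinned_weights = cpu_weights.pin_memory()   # page-locked host copy

# Pinned memory allows non_blocking (asynchronous) copies to the GPU, which is
# why keeping weights pinned speeds up model moves, at the cost of host RAM
# that the OS can no longer page out.
gpu_weights = pinned_weights.to("cuda", non_blocking=True)
```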
I could replicate it while updating the backend further toward comfy upstream.
Since (I think) it is not related to reForge itself but to how comfy defines this flag, I'm going to close the issue for now. If you run into a problem again, feel free to re-open it.
Checklist
What happened?
When I switch the checkpoint, it does not unload the previous one; this happens on both main and dev_upstream. I tried a clean install of main and a new venv on dev_upstream as well.
Using the unload model button in the settings menu does nothing to the VRAM usage either.
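For what it's worth, even once a model really is dropped, the VRAM shown by external tools only falls after torch's caching allocator releases its blocks; a generic sketch of that, with a small layer standing in for a loaded checkpoint (this is not reForge's unload path):

```python
import gc
import torch

model = torch.nn.Linear(4096, 4096).cuda()   # stand-in for a loaded checkpoint

del model                   # drop the last reference
gc.collect()                # make sure Python actually frees it
torch.cuda.empty_cache()    # return cached blocks to the driver

print(f"reserved after empty_cache: {torch.cuda.memory_reserved() / 1024**3:.2f}GB")
```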
Possibly related: https://github.com/Panchovix/stable-diffusion-webui-reForge/issues/114 https://github.com/Panchovix/stable-diffusion-webui-reForge/issues/10
Steps to reproduce the problem
What should have happened?
What browsers do you use to access the UI?
Mozilla Firefox, Other
Sysinfo
sysinfo-2024-08-11-01-13.json
Console logs
Additional information
I reverted back to commit 365c6d482d332896bc91e28d5dd63fccd0b19b29, since that is what I was using before I tried updating, and it does not have this issue. So the problem was introduced somewhere between that commit and now.