Panchovix / stable-diffusion-webui-reForge

GNU Affero General Public License v3.0
350 stars · 17 forks

[Bug]: Checkpoint not unloading when changing in WebUI, causing OOM #119

Closed · RichyRich515 closed this 2 months ago

RichyRich515 commented 2 months ago

Checklist

What happened?

When I switch the checkpoint, it does not unload the previous one. This happens on both main and dev_upstream; I tried a clean install of main and a new venv on dev_upstream as well.

Using the unload-model button in the settings menu also does nothing to the VRAM usage.

Possibly related: https://github.com/Panchovix/stable-diffusion-webui-reForge/issues/114 https://github.com/Panchovix/stable-diffusion-webui-reForge/issues/10
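For what it's worth, the symptom is consistent with a dangling reference in the model cache: if the switch path loads the new checkpoint before every reference to the old one has been dropped, both stay resident. A minimal pure-Python sketch of that failure mode (all names here are hypothetical, not actual reForge code):

```python
import gc

class Checkpoint:
    """Stand-in for a loaded model; counts live instances."""
    alive = 0

    def __init__(self, name):
        self.name = name
        Checkpoint.alive += 1

    def __del__(self):
        Checkpoint.alive -= 1

class ModelCache:
    """Toy cache illustrating leaky vs. correct checkpoint switching."""

    def __init__(self):
        self.current = None
        self._history = []  # bug: stray references keep old checkpoints alive

    def switch_leaky(self, name):
        if self.current is not None:
            self._history.append(self.current)  # old weights never released
        self.current = Checkpoint(name)

    def switch_fixed(self, name):
        self.current = None  # drop the reference before loading the next one
        gc.collect()         # so the old weights can actually be freed
        self.current = Checkpoint(name)

cache = ModelCache()
cache.switch_leaky("ckpt_a")
cache.switch_leaky("ckpt_b")
print(Checkpoint.alive)  # 2 -- the old checkpoint is still resident

del cache
gc.collect()

cache = ModelCache()
cache.switch_fixed("ckpt_a")
cache.switch_fixed("ckpt_b")
print(Checkpoint.alive)  # 1 -- only the current checkpoint is resident
```

The real leak would of course hold GPU tensors rather than toy objects, but the pattern (loading before releasing) produces exactly the "dedicated memory grows on every switch" behavior described above.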

Steps to reproduce the problem

  1. Launch WebUI
  2. Generate image
  3. Note Dedicated GPU memory (in task manager)
  4. Change the current Checkpoint from the dropdown selector
  5. Observe Dedicated GPU memory increase
  6. Attempt to generate image
  7. Observe Shared GPU memory increase

What should have happened?

  1. Launch WebUI
  2. Generate image
  3. Note Dedicated GPU memory (in task manager)
  4. Change the current Checkpoint from the dropdown selector
  5. Observe Dedicated GPU memory stay the same
  6. Attempt to generate image
  7. Observe Shared GPU memory stay the same

What browsers do you use to access the UI?

Mozilla Firefox, Other

Sysinfo

sysinfo-2024-08-11-01-13.json

Console logs

To load target model SDXL
Begin to load 1 model
Reuse 1 loaded models
unload clone 0 False
Moving model(s) has taken 0.00 seconds
100%|████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:01<00:00,  7.36it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 32/32 [00:12<00:00,  2.62it/s]
Unloading first loaded model: snowpony_v10.safetensors [d6f941b46b]...█████████████████| 32/32 [00:12<00:00,  2.99it/s]
Loading model autismmixSDXL_autismmixConfetti.safetensors [ac006fdd7e] (1 of 1)
Loading weights [ac006fdd7e] from C:\Users\Richard\Desktop\stable-diffusion-webui\models\Stable-diffusion\autismmixSDXL_autismmixConfetti.safetensors
WARNING:root:clip missing: ['clip_l.text_projection', 'clip_l.logit_scale', 'clip_g.text_projection', 'clip_g.transformer.text_model.embeddings.position_ids']
To load target model SDXL
Begin to load 1 model
Moving model(s) has taken 0.03 seconds
Loading VAE weights specified in settings: C:\Users\Richard\Desktop\stable-diffusion-webui\models\VAE\sdxl_vae-fp16fix.safetensors
To load target model SDXLClipModel
Begin to load 1 model
Moving model(s) has taken 0.44 seconds
Model autismmixSDXL_autismmixConfetti.safetensors [ac006fdd7e] loaded in 11.2s (unload first loaded model if necessary (pinned): 5.6s, forge load real models: 4.7s, load VAE: 0.2s, calculate empty prompt: 0.5s).
Unloading first loaded model: autismmixSDXL_autismmixConfetti.safetensors [ac006fdd7e]...
Loading model az_aaaautismPonyFinetune_aaaaReStart.safetensors [44eb8c883d] (1 of 1)
Loading weights [44eb8c883d] from C:\Users\Richard\Desktop\stable-diffusion-webui\models\Stable-diffusion\az_aaaautismPonyFinetune_aaaaReStart.safetensors
WARNING:root:clip missing: ['clip_l.text_projection', 'clip_l.logit_scale', 'clip_g.text_projection', 'clip_g.transformer.text_model.embeddings.position_ids']

Additional information

I reverted back to commit 365c6d482d332896bc91e28d5dd63fccd0b19b29, as that is what I was using before I tried updating, and it did not have this issue. So the regression was introduced between then and now.

RichyRich515 commented 2 months ago

cmd args: --xformers --listen --enable-insecure-extension-access --port 7861 --always-gpu --disable-nan-check --cuda-malloc --cuda-stream --pin-shared-memory

Panchovix commented 2 months ago

Hi there, thanks for the report.

I can't replicate this. I have 24GB VRAM on an RTX 4090 (same as you), and when using pin shared memory, changing models (for example in an X/Y/Z grid) unloads the current model and then loads the next one. If it were going to OOM, it would do so with 2 models.

The thing is, with pin shared memory it keeps about 2GB as a baseline each time, so it sits at 9-11GB after changing a few models; when doing inference it should use some VRAM and settle back to 7-9GB.

Also, between the commit you mention and today there was a big bug fix for model management. It didn't affect 24GB VRAM users much, but the old code was managing VRAM really badly for low-VRAM GPUs.
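In other words, whether a checkpoint switch fits comes down to an eviction check along the lines of the following sketch (hypothetical code, not the actual comfy/reForge model manager): a 24GB card has enough headroom that nothing needs to be evicted, while a low-VRAM card must unload the previous checkpoint first.

```python
def models_to_unload(loaded, free_mb, needed_mb):
    """Pick which loaded models to evict (oldest first) so that a new
    model of `needed_mb` fits into `free_mb` of remaining VRAM.
    `loaded` is a list of (name, size_mb) in load order."""
    evict = []
    for name, size_mb in loaded:      # oldest first
        if free_mb >= needed_mb:
            break                     # enough room already
        evict.append(name)
        free_mb += size_mb            # unloading returns its VRAM
    return evict

loaded = [("previous_ckpt", 6500)]   # one SDXL-sized checkpoint resident
# 24GB card with ~17GB free: the new checkpoint fits, nothing is evicted.
print(models_to_unload(loaded, free_mb=17000, needed_mb=6500))  # []
# 8GB card with ~1.2GB free: the old checkpoint must be unloaded first.
print(models_to_unload(loaded, free_mb=1200, needed_mb=6500))   # ['previous_ckpt']
```

This is why a bug in the eviction path can go unnoticed on 24GB cards while badly hurting low-VRAM GPUs.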

This is what I tried with --always-gpu, --cuda-malloc, --cuda-stream, --pin-shared-memory

Just when launching the UI:

(image)

Changing 1 model (before loading the next model, so the model is unloaded and the next one is read from disk into the GPU):

(image)

Model changed (loaded into GPU):

(image)

Changing to another model (before loading the next model; you can notice it uses a bit more VRAM):

(image)

Another model loaded:

(image)

Again, changing to another model:

(image)

Another model loaded:

(image)

From this point onwards, the VRAM usage when unloading/loading stays the same.

Changing to another model (as you can notice, it tops out at 6GB):

(image)

Another model loaded:

(image)

The log is:

Unloading first loaded model: normals\xlmodels\mikoshiPony_v10.safetensors [b1480f9dc0]...
Loading model normals\xlmodels\rasRealAnimeScreencap_v10.safetensors [ae41e754be] (1 of 1)
Loading weights [ae41e754be] from G:\Stable difussion\stable-diffusion-webui-reForge\models\Stable-diffusion\normals\xlmodels\rasRealAnimeScreencap_v10.safetensors
WARNING:root:clip missing: ['clip_l.text_projection', 'clip_l.logit_scale', 'clip_g.text_projection', 'clip_g.transformer.text_model.embeddings.position_ids']
To load target model SDXL
Begin to load 1 model
Moving model(s) has taken 0.01 seconds
Loading VAE weights specified in settings: G:\Stable difussion\stable-diffusion-webui-reForge\models\VAE\sdxl_vae_fixedfp16.safetensors
To load target model SDXLClipModel
Begin to load 1 model
Moving model(s) has taken 0.47 seconds
Model normals\xlmodels\rasRealAnimeScreencap_v10.safetensors [ae41e754be] loaded in 122.8s (unload first loaded model if necessary (pinned): 2.9s, load weights from disk: 0.6s, forge load real models: 117.8s, load textual inversion embeddings: 0.8s, calculate empty prompt: 0.5s).
Unloading first loaded model: normals\xlmodels\rasRealAnimeScreencap_v10.safetensors [ae41e754be]...
Loading model normals\xlmodels\reweikPonyxl_v012.safetensors [15c6059154] (1 of 1)
Loading weights [15c6059154] from G:\Stable difussion\stable-diffusion-webui-reForge\models\Stable-diffusion\normals\xlmodels\reweikPonyxl_v012.safetensors
WARNING:root:clip missing: ['clip_l.text_projection', 'clip_l.logit_scale', 'clip_g.text_projection', 'clip_g.transformer.text_model.embeddings.position_ids']
To load target model SDXL
Begin to load 1 model
Moving model(s) has taken 0.02 seconds
Loading VAE weights specified in settings: G:\Stable difussion\stable-diffusion-webui-reForge\models\VAE\sdxl_vae_fixedfp16.safetensors
To load target model SDXLClipModel
Begin to load 1 model
Moving model(s) has taken 0.37 seconds
Model normals\xlmodels\reweikPonyxl_v012.safetensors [15c6059154] loaded in 25.6s (unload first loaded model if necessary (pinned): 2.7s, load weights from disk: 0.6s, forge load real models: 21.0s, load VAE: 0.1s, load textual inversion embeddings: 0.8s, calculate empty prompt: 0.4s).
Unloading first loaded model: normals\xlmodels\reweikPonyxl_v012.safetensors [15c6059154]...
Loading model normals\xlmodels\susamixPonyV02_v10.safetensors [81721f5112] (1 of 1)
Loading weights [81721f5112] from G:\Stable difussion\stable-diffusion-webui-reForge\models\Stable-diffusion\normals\xlmodels\susamixPonyV02_v10.safetensors
WARNING:root:clip missing: ['clip_l.text_projection', 'clip_l.logit_scale', 'clip_g.text_projection', 'clip_g.transformer.text_model.embeddings.position_ids']
To load target model SDXL
Begin to load 1 model
Moving model(s) has taken 0.02 seconds
Loading VAE weights specified in settings: G:\Stable difussion\stable-diffusion-webui-reForge\models\VAE\sdxl_vae_fixedfp16.safetensors
To load target model SDXLClipModel
Begin to load 1 model
Moving model(s) has taken 0.49 seconds
Model normals\xlmodels\susamixPonyV02_v10.safetensors [81721f5112] loaded in 25.9s (unload first loaded model if necessary (pinned): 2.8s, load weights from disk: 0.5s, forge load real models: 21.2s, load textual inversion embeddings: 0.8s, calculate empty prompt: 0.5s).
Unloading first loaded model: normals\xlmodels\susamixPonyV02_v10.safetensors [81721f5112]...
Loading model normals\xlmodels\tPonynai3_v6.safetensors [2b493af7c1] (1 of 1)
Loading weights [2b493af7c1] from G:\Stable difussion\stable-diffusion-webui-reForge\models\Stable-diffusion\normals\xlmodels\tPonynai3_v6.safetensors
WARNING:root:clip missing: ['clip_l.text_projection', 'clip_l.logit_scale', 'clip_g.text_projection', 'clip_g.transformer.text_model.embeddings.position_ids']
To load target model SDXL
Begin to load 1 model
Moving model(s) has taken 0.02 seconds
Loading VAE weights specified in settings: G:\Stable difussion\stable-diffusion-webui-reForge\models\VAE\sdxl_vae_fixedfp16.safetensors
To load target model SDXLClipModel
Begin to load 1 model
Moving model(s) has taken 0.46 seconds
Model normals\xlmodels\tPonynai3_v6.safetensors [2b493af7c1] loaded in 31.6s (unload first loaded model if necessary (pinned): 2.5s, load weights from disk: 0.5s, forge load real models: 27.2s, load VAE: 0.1s, load textual inversion embeddings: 0.7s, calculate empty prompt: 0.5s)

This behavior mostly happens when using cuda stream and pin shared memory.

It is different from how OG Forge (before the experimental updates) behaved, since I had to make some modifications to support multiple checkpoints.

You can try without --pin-shared-memory and it should use less VRAM, since that option is a bit experimental in the torch implementation. You can also try without --always-gpu.
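A quick way to A/B test this is to strip the experimental flags from the launch arguments and compare VRAM behavior before and after. A small sketch using the flag list posted earlier in the thread (the helper is hypothetical, not part of reForge):

```python
# Flags under suspicion in this thread; everything else is kept as-is.
EXPERIMENTAL = {"--always-gpu", "--pin-shared-memory", "--cuda-stream"}

def strip_experimental(args):
    """Return the launch args minus the experimental memory flags."""
    return [a for a in args if a not in EXPERIMENTAL]

args = ("--xformers --listen --enable-insecure-extension-access "
        "--port 7861 --always-gpu --disable-nan-check --cuda-malloc "
        "--cuda-stream --pin-shared-memory").split()
print(" ".join(strip_experimental(args)))
# --xformers --listen --enable-insecure-extension-access --port 7861 --disable-nan-check --cuda-malloc
```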

Do you have a log when you got OOM?

RichyRich515 commented 2 months ago

There isn't much of anything in the log when I OOM; it just brings my computer to a near halt and I have to slowly navigate over to close the terminal.

At this point, if I pressed generate, it would push into shared memory when the VAE decode occurs:

To load target model SDXL
Begin to load 1 model
Reuse 1 loaded models
unload clone 0 False
Moving model(s) has taken 0.00 seconds
100%|████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:01<00:00,  6.89it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 32/32 [00:12<00:00,  2.51it/s]
Loading model snowpony_v10.safetensors [d6f941b46b] (2 of 2)███████████████████████████| 32/32 [00:12<00:00,  2.66it/s]
Loading weights [d6f941b46b] from C:\Users\Richard\Desktop\stable-diffusion-webui\models\Stable-diffusion\snowpony_v10.safetensors
WARNING:root:clip missing: ['clip_l.text_projection', 'clip_l.logit_scale', 'clip_g.text_projection', 'clip_g.transformer.text_model.embeddings.position_ids']
To load target model SDXL
Begin to load 1 model
Moving model(s) has taken 0.02 seconds
Loading VAE weights specified in settings: C:\Users\Richard\Desktop\stable-diffusion-webui\models\VAE\sdxl_vae-fp16fix.safetensors
To load target model SDXLClipModel
Begin to load 1 model
Moving model(s) has taken 0.59 seconds
Model snowpony_v10.safetensors [d6f941b46b] loaded in 5.8s (forge load real models: 4.7s, load VAE: 0.3s, calculate empty prompt: 0.7s).

(image)

With the same settings as before, here are my steps with images:

From start: (image)

Generated 1 image: (image)

Changed checkpoint: (image)

Tried to generate an image; OOM and the computer is very slow: (image)

Closed the terminal: (image)

RichyRich515 commented 2 months ago

Now without --always-gpu --pin-shared-memory:

(image)

Generated 1 image: (image)

Loaded 2nd checkpoint (VRAM seems okay): (image)

Generated an image on checkpoint 2: (image)

Loaded 3rd checkpoint (still okay): (image)

RichyRich515 commented 2 months ago

I did a little more testing, and I think I've pinpointed the --always-gpu flag as the offender in my setup. It is strange that you cannot reproduce it, and that it does not occur in the older revision.

Panchovix commented 2 months ago

Thanks for the update. Yes, I think the --always-gpu flag causes more issues than anything else. Pin shared memory should do similar work without the excessive VRAM usage.

I was able to replicate it while updating the backend further toward comfy upstream.

Since (I think) it is not related to reForge itself but to how comfy defines this flag, I'm going to close the issue for now. If you run into a problem again, feel free to re-open the issue.