Mikubill / sd-webui-controlnet

WebUI extension for ControlNet
GNU General Public License v3.0
17.07k stars, 1.96k forks

[Known issue] VRAM leak for PuLID #2862

Closed: huchenlei closed this 6 months ago

huchenlei commented 6 months ago

See discussion here: https://github.com/ToTheBeginning/PuLID/issues/16

There is a VRAM leak in the current PuLID implementation in sd-webui-controlnet; the implementation in the main PuLID repo does not show it. Transferring the issue here.

inspire-boy commented 6 months ago

This is still not solved, haha. Recently PuLID has been appearing more and more often in social media articles, and many evaluations consider it better than InstantID. It seems to have broad prospects. Hopefully it will soon be stable and easy to use in the WebUI.

Thanks for the great work!

huchenlei commented 6 months ago

I submitted a temporary fix. It seems the unload logic for the PuLID preprocessor has some issues; by preventing these unloads, at least we are no longer leaking VRAM.

huchenlei commented 6 months ago

Sorry for the delayed fix. I have been prioritizing work on various IC-Light extensions.

inspire-boy commented 6 months ago

Thanks for providing this!

inspire-boy commented 6 months ago

Salute!

inspire-boy commented 6 months ago

I just tested the latest ControlNet and the VRAM leak seems to persist: when I switched to the third reference image, the baseline VRAM rose from 4G to 6G, triggering an OOM.

VRAM grows by more than 1G each time I switch the input reference image. All reference images were resized to at most 768*768.

Environment:

  • ControlNet commit hash: 04024b4c82c27ff5a90bbc0401e9f72c7236d564
  • GPU: 3060-12G
  • Platform: Win10
  • WebUI version: 1.6.0

huchenlei commented 6 months ago

What A1111 config do you have? According to my testing,

inspire-boy commented 6 months ago

What A1111 config do you have? According to my testing,

  • SDXL loaded: ~8.5G VRAM
  • SDXL loaded + PuLID preprocessors loaded: 11.5G VRAM
  • SDXL PuLID inference: 15.8G VRAM peak
  • VRAM goes down to 11.5G after inference
  • Changing the reference image does not further increase steady-state VRAM consumption. It remains at 11.5G.
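Working through the differences between these readings shows what a non-leaking run looks like: a one-time cost for loading the preprocessors, a transient spike during inference, and a return to the same plateau. A small Python sketch using only the numbers listed above (the SDXL baseline is approximate, and the labels are shorthand):

```python
# VRAM readings (GB) reported above for SDXL + PuLID.
readings = [
    ("SDXL loaded", 8.5),
    ("SDXL + PuLID preprocessors loaded", 11.5),
    ("peak during PuLID inference", 15.8),
    ("after inference", 11.5),
    ("after changing the reference image", 11.5),
]


def deltas(rs):
    """Pair each reading after the first with its change from the previous one."""
    return [(label, round(v - prev, 2)) for (_, prev), (label, v) in zip(rs, rs[1:])]


for label, d in deltas(readings):
    print(f"{label:>36}: {d:+.1f} GB")
```

The last delta is the one that matters for this issue: in the reports elsewhere in the thread it is positive (roughly 1G or more per new reference image) instead of zero.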

inspire-boy commented 6 months ago

Okay, thank you. I'll run the exact test again tonight and check the VRAM data.

inspire-boy commented 6 months ago

Supplementary info (my webui-user.bat):

@echo off

set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=--xformers --api --no-half-vae --disable-nan-check --medvram-sdxl --skip-version-check --skip-python-version-check --skip-torch-cuda-test --skip-install

call webui.bat

inspire-boy commented 6 months ago

I started testing in a new environment (3090, 24G). Here is how my VRAM changed: the whole 24G of VRAM was used up by the 13th switch to a new image (every image is different), growing by about 1.3G each time. I have confirmed that I'm on the latest version of ControlNet, so I don't know what went wrong...

I hope you can check the information I provided when you have time; I can also provide my own reference images if necessary.

inspire-boy commented 6 months ago

image

inspire-boy commented 6 months ago

Here is my reproduction process; hopefully it helps you locate the problem.

Environment:

  • Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
  • WebUI version: v1.6.0, commit hash: 5ef669de080814067961f28357256e8fe27544f4
  • Launch arguments: --xformers --api --no-half-vae --disable-nan-check --medvram-sdxl --skip-version-check --skip-python-version-check --skip-torch-cuda-test --skip-install
  • ControlNet commit hash: 3b4eedd90fe8ebcac5363f586157d36dcd9a513f
  • System: Windows 10
  • System memory: 47G
  • GPU: 3060/12G

Step 1:

Start webui-user.bat. VRAM after startup: 3.5/12GB

Step 2:

Generate an image with IP-Adapter using these parameters (the same for all later steps):

portrait,cinematic,wolf ears,white hair
Negative prompt: blurry
Steps: 4, Sampler: Euler a, CFG scale: 1.2, Seed: 42, Size: 768x1024, Model hash: e0d996ee00, VAE hash: 63aeecb90f, Clip skip: 2, ENSD: 31337, ControlNet 0: "Module: ip-adapter-auto, Model: ip-adapter_pulid_sdxlfp16 [d86d05ea], Weight: 0.8, Resize Mode: Crop and Resize, Processor Res: 512, Threshold A: 0.5, Threshold B: 0.5, Guidance Start: 0.0, Guidance End: 1.0, Pixel Perfect: False, Control Mode: Balanced", Eta: 0.2, Version: v1.6.0

Reference image 0: peak VRAM 11.6/12GB, VRAM after finishing 4.6/12GB

Step 3:

Use another reference image (image 1): peak VRAM 11.7/12GB, VRAM after finishing 5.8/12GB

Step 4:

Use another reference image (image 2): peak VRAM 11.8/12GB, VRAM after finishing 6.7/12GB

Step 5:

Use another reference image (image 3): VRAM OOM

OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 12.00 GiB total capacity; 10.65 GiB already allocated; 0 bytes free; 11.28 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Time taken: 5.6 sec. A: 10.65 GB, R: 11.28 GB, Sys: 12.0/12 GB (100.0%)

Summary

As we can see, a single generation is fine, but the baseline VRAM keeps increasing as I keep switching to different input reference images. It looks as if each previous reference is cached and never cleaned up during subsequent generations. I hope this makes the problem easy to troubleshoot. Thank you very much for your work!
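The baseline growth in these steps can be quantified from the "finished" readings alone, ignoring the peaks. A minimal Python sketch using the 3060-12G numbers reported above (after startup, then after reference images 0, 1 and 2):

```python
# VRAM left after each step on the 3060-12G, in GB, from the steps above:
# after startup, then after reference images 0, 1 and 2.
finished = [3.5, 4.6, 5.8, 6.7]

# Growth between consecutive post-run readings (rounded to hide float noise).
growth = [round(b - a, 2) for a, b in zip(finished, finished[1:])]
mean_growth = round(sum(growth) / len(growth), 2)

print("per-run growth (GB):", growth)
print("mean growth (GB/run):", mean_growth)
```

This works out to roughly 1G per new reference image, in the same range as the "more than 1G at a time" observation on the 3060 and the ~1.3G per switch seen on the 3090.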

akk-123 commented 6 months ago

@huchenlei The VRAM leak still exists.

Daming-TF commented 6 months ago

I also found a memory leak when running the webui API. After debugging, I found that each call to 'preprocessor.cached_call()' (https://github.com/Mikubill/sd-webui-controlnet/blob/main/scripts/controlnet.py#L242) uses extra memory.
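This observation fits the pattern suggested in the summary above: something keyed on the input keeps a result alive for every distinct reference image. The sketch below is purely hypothetical and is not the extension's cached_call code; the class and function names are invented for illustration, and each cached result is a fake 1 GB stand-in. It contrasts a cache that never evicts with one bounded to the most recent input:

```python
"""Hypothetical illustration of a per-input cache that is never evicted.

None of these names come from sd-webui-controlnet; they only model the
reported symptom that every new reference image keeps more memory alive.
"""
from collections import OrderedDict


class UnboundedCache:
    """Keeps every result forever, keyed by input."""

    def __init__(self, compute):
        self.compute = compute
        self.store = {}

    def __call__(self, key):
        if key not in self.store:
            self.store[key] = self.compute(key)
        return self.store[key]


class BoundedCache:
    """Keeps at most `maxsize` results, evicting the least recently used."""

    def __init__(self, compute, maxsize=1):
        self.compute = compute
        self.maxsize = maxsize
        self.store = OrderedDict()

    def __call__(self, key):
        if key in self.store:
            self.store.move_to_end(key)  # mark as most recently used
        else:
            self.store[key] = self.compute(key)
            if len(self.store) > self.maxsize:
                self.store.popitem(last=False)  # drop the oldest entry
        return self.store[key]


def fake_embedding_gb(image_id):
    """Stand-in for a preprocessor result; returns its pretend size in GB."""
    return 1.0


def resident_gb(cache):
    """Total pretend size of everything the cache still holds."""
    return sum(cache.store.values())


unbounded = UnboundedCache(fake_embedding_gb)
bounded = BoundedCache(fake_embedding_gb, maxsize=1)
for image_id in ["ref0", "ref1", "ref2"]:
    unbounded(image_id)
    bounded(image_id)

print(resident_gb(unbounded))  # 3.0: one entry per distinct reference image
print(resident_gb(bounded))    # 1.0: only the latest reference is kept
```

In PyTorch terms, dropping the last Python reference to a tensor lets the caching allocator reuse its memory; torch.cuda.empty_cache() additionally releases unused cached blocks back to the driver, which is what external monitors such as Task Manager report.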

huchenlei commented 6 months ago

Can you all try running the standard repro test case by following the instructions in https://github.com/Mikubill/sd-webui-controlnet/issues/2891 ?

inspire-boy commented 6 months ago

@huchenlei Thank you for adding the test script. I just ran it, and the result matches my manual run above: the VRAM left after each run (X/12GB) keeps growing until it finally OOMs. Since the peak VRAM is almost always close to the card's limit regardless of the baseline, we should focus on how the steady-state VRAM changes after each run. I've tested on both the 3060 and the 3090; the 3090 (24G) takes more than 13 runs to hit OOM, so if you have 24GB of VRAM you will need more runs to reproduce it.

On the local 3060-12G, OOM appears after 3 runs.

On the 3090-24G, OOM appears after 13 runs.

Thank you again for your work; I hope the above results help you locate the problem.
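The check described in this comment (ignore the peak, watch the VRAM left after each run) can be scripted. A minimal sketch, assuming you can call a generation function and read VRAM from Python: `generate` and `read_vram_gb` are hypothetical placeholders you would supply. In a PyTorch process, `read_vram_gb` could wrap `torch.cuda.memory_allocated()` or `torch.cuda.memory_reserved()` (both return bytes). The demo at the bottom uses simulated numbers, not measurements.

```python
"""Steady-state VRAM leak check (hypothetical harness, simulated demo)."""
from typing import Callable, Iterable, List


def steady_state_readings(
    generate: Callable[[str], None],
    read_vram_gb: Callable[[], float],
    reference_images: Iterable[str],
) -> List[float]:
    """Record a baseline, then the VRAM left after each generation."""
    readings = [read_vram_gb()]  # baseline before the first run
    for image in reference_images:
        generate(image)
        readings.append(read_vram_gb())  # level left behind, not the peak
    return readings


def looks_like_leak(readings: List[float], tolerance_gb: float = 0.1) -> bool:
    """Return True if VRAM left after each run keeps rising by more than tolerance_gb.

    The baseline-to-first-run delta is skipped because the first run may
    load models (such as the PuLID preprocessors) once.
    """
    growth = [b - a for a, b in zip(readings[1:], readings[2:])]
    return bool(growth) and all(d > tolerance_gb for d in growth)


# Simulated demo: these numbers are made up, not measurements.
vram = {"gb": 3.5}


def leaky_generate(image: str) -> None:
    vram["gb"] += 1.2  # pretend each new reference image leaves 1.2 GB behind


def steady_generate(image: str) -> None:
    pass  # pretend VRAM returns to the same level after every run


images = ["ref0", "ref1", "ref2"]
print(looks_like_leak(steady_state_readings(leaky_generate, lambda: vram["gb"], images)))   # True
print(looks_like_leak(steady_state_readings(steady_generate, lambda: vram["gb"], images)))  # False
```

Skipping the first delta keeps a one-time model load from being mistaken for a leak; only growth that repeats on every new reference image is flagged.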

rkfg commented 6 months ago

The main leak seems to be fixed, but something is still left behind after using PuLID: about 3GB more VRAM stays occupied. Usually around 7.5GB is allocated by the main process; after using PuLID once it goes to 10.8GB and is never deallocated. Is there any way to fix this?

inspire-boy commented 6 months ago

When I change the reference image, the occupied VRAM increases further. Have you noticed this in your environment?

rkfg commented 6 months ago

I only tried with one image; I'll probably try with multiple later. But so far it's still unusable because it leaves something in VRAM, which, for example, makes the program OOM when I use hires fix, right after denoising, during the VAE decoding phase I believe. So I have to restart it to get the memory back.

inspire-boy commented 6 months ago

I created a new issue; you can follow it here: https://github.com/Mikubill/sd-webui-controlnet/issues/2905