Closed: huchenlei closed this issue 6 months ago.
This still isn't solved, haha. Recently PuLID has been appearing more and more often in self-media articles, and many reviews consider it better than InstantID, so it should have fairly broad prospects. Hopefully it will soon be stable and easy to use in the WebUI.
Thanks for the great work!
I submitted a temp fix. It seems like the unload logic for the PuLID preprocessor has some issues. By preventing these unloads, at least we are not leaking VRAM.
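For reference, roughly what such a guard could look like, as a hypothetical sketch only; the names below (PULID_KEYS, loaded_preprocessors, unload_preprocessors) are made up for illustration and are not the actual extension code:

```python
import torch

# Hypothetical registry of PuLID preprocessor components (illustrative names).
PULID_KEYS = ("pulid_eva_clip", "pulid_face_encoder")

def unload_preprocessors(loaded_preprocessors: dict) -> None:
    """Move cached preprocessor models off the GPU, but keep PuLID resident.

    The leak was observed around the PuLID unload path, so skipping it trades
    a fixed amount of VRAM for not leaking more on every generation.
    """
    for name in list(loaded_preprocessors):
        if name in PULID_KEYS:
            continue  # temp fix: do not unload PuLID components
        loaded_preprocessors[name].to("cpu")
        del loaded_preprocessors[name]
    torch.cuda.empty_cache()  # return the freed cached blocks to the driver
```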
Sorry for the delayed fix. I am prioritizing working on various IC-Light extensions.
Thank you for providing this!
salute!
Having just tested the latest ControlNet, the VRAM leak seems to persist: when I switched to the third reference image, the baseline VRAM grew from 4GB to 6GB, triggering an OOM.
VRAM grows by more than 1GB each time the input reference image is switched; all reference images were limited to 768x768.
Environment:
- ControlNet commit hash: 04024b4c82c27ff5a90bbc0401e9f72c7236d564
- GPU: 3060 (12GB)
- Platform: Win10
- WebUI version: 1.6.0
What A1111 config do you have? According to my testing:
- SDXL loaded: ~8.5G VRAM
- SDXL loaded + PuLID preprocessors loaded: 11.5G VRAM
- SDXL PuLID inference: 15.8 G VRAM peak
- VRAM goes down to 11.5G after inference
- Changing the reference image does not further increase steady-state VRAM consumption. It remains at 11.5G.
Okay, thank you. I'll run the exact same test again tonight and check the VRAM numbers.
Supplementary content
@echo off
set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=--xformers --api --no-half-vae --disable-nan-check --medvram-sdxl --skip-version-check --skip-python-version-check --skip-torch-cuda-test --skip-install
call webui.bat
I started a test in a new environment: a 3090 with 24GB. The following is how my VRAM changed: the whole 24GB is used up by the 13th switch to a new image (every image is different), increasing by about 1.3GB each time. I have confirmed I am on the latest version of ControlNet, so I don't know what went wrong...
I hope you can check the information I provided in your spare time; I can provide my reference images if necessary.
Here is my reproduction process; hopefully it helps you locate the problem.
Environment:
- Python: 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
- WebUI: v1.6.0, commit hash 5ef669de080814067961f28357256e8fe27544f4
- Launch arguments: --xformers --api --no-half-vae --disable-nan-check --medvram-sdxl --skip-version-check --skip-python-version-check --skip-torch-cuda-test --skip-install
- ControlNet commit hash: 3b4eedd90fe8ebcac5363f586157d36dcd9a513f
- System: Windows 10
- Memory: 47GB
- GPU: 3060 (12GB)
- Start webui-user.bat; VRAM after startup: 3.5/12GB
- Generate an image with the following params (the same for every subsequent run): portrait, cinematic, wolf ears, white hair; Negative prompt: blurry; Steps: 4, Sampler: Euler a, CFG scale: 1.2, Seed: 42, Size: 768x1024, Model hash: e0d996ee00, VAE hash: 63aeecb90f, Clip skip: 2, ENSD: 31337, ControlNet 0: "Module: ip-adapter-auto, Model: ip-adapter_pulid_sdxl_fp16 [d86d05ea], Weight: 0.8, Resize Mode: Crop and Resize, Processor Res: 512, Threshold A: 0.5, Threshold B: 0.5, Guidance Start: 0.0, Guidance End: 1.0, Pixel Perfect: False, Control Mode: Balanced", Eta: 0.2, Version: v1.6.0
- Reference image 0: peak VRAM 11.6/12GB, VRAM after generation 4.6/12GB
- Reference image 1: peak VRAM 11.7/12GB, VRAM after generation 5.8/12GB
- Reference image 2: peak VRAM 11.8/12GB, VRAM after generation 6.7/12GB
- Reference image 3: OOM. OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 12.00 GiB total capacity; 10.65 GiB already allocated; 0 bytes free; 11.28 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF. Time taken: 5.6 sec. A: 10.65 GB, R: 11.28 GB, Sys: 12.0/12 GB (100.0%)
As you can see, a single generation is fine; the problem is that the baseline VRAM keeps increasing as the input reference image keeps changing. It looks as if the previous reference has been cached and not cleaned up during subsequent generations. I hope this makes the problem easy to troubleshoot, and thank you very much for your work!
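In case it helps with troubleshooting, here is a rough sketch of how the allocated/reserved numbers above (the A:/R: readout) could be logged after each generation from a small debugging script; it assumes nothing beyond torch and is not part of the extension:

```python
import torch

def log_vram(tag: str) -> None:
    """Print allocated/reserved CUDA memory, matching the A:/R: readout above."""
    allocated = torch.cuda.memory_allocated() / 1024**3
    reserved = torch.cuda.memory_reserved() / 1024**3
    print(f"[{tag}] allocated: {allocated:.2f} GiB, reserved: {reserved:.2f} GiB")

# Calling log_vram() once after startup and once after each generation with a
# new reference image should show "allocated" stepping up by roughly 1 GiB per
# image, which is the growth described in this report.
```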
@huchenlei The VRAM leak still exists.
I also found a memory leak when running the WebUI API. After debugging, I found that each call to preprocessor.cached_call() (https://github.com/Mikubill/sd-webui-controlnet/blob/main/scripts/controlnet.py#L242) uses extra memory.
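To illustrate the kind of pattern that would explain this, here is a sketch only; it is not the actual cached_call implementation, and everything except the cached_call name is made up for illustration:

```python
import torch

class IllustrativeCache:
    """Toy example of a per-image result cache kept on the GPU."""

    def __init__(self, model):
        self.model = model
        self.cache: dict[int, torch.Tensor] = {}

    def cached_call(self, image: torch.Tensor) -> torch.Tensor:
        key = hash(image.detach().cpu().numpy().tobytes())
        if key not in self.cache:
            # Every new reference image adds another CUDA tensor that is never
            # released, so steady-state VRAM grows with each distinct image.
            self.cache[key] = self.model(image.cuda())
        return self.cache[key]

    def cached_call_single_slot(self, image: torch.Tensor) -> torch.Tensor:
        # A bounded variant: keep only the most recent result and drop the rest
        # before computing a new one, so changing the reference image does not
        # accumulate old embeddings.
        key = hash(image.detach().cpu().numpy().tobytes())
        if key not in self.cache:
            self.cache.clear()
            torch.cuda.empty_cache()
            self.cache[key] = self.model(image.cuda())
        return self.cache[key]
```

If cached_call behaves like the first variant for PuLID results, switching reference images between generations would match the roughly 1GB-per-image growth reported above.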
Can you all try to run the standard repro test case following the instructions in https://github.com/Mikubill/sd-webui-controlnet/issues/2891?
@huchenlei Thank you for adding the test script. I just ran it, and the result is the same as the manual run above: the finished VRAM (X/12GB) keeps growing between runs and finally OOMs. Since the peak VRAM is almost always near the card's limit regardless of what the minimum is, we should focus on the change in stable VRAM after each run. I've tested on both the 3060 and the 3090; it takes more than 13 runs for the 3090 (24GB) to OOM, so if you have 24GB of VRAM you need more runs to reproduce it.
- On my local 3060 (12GB), the OOM appears after 3 runs.
- On the 3090 (24GB), the OOM appears after 13 runs.
Thank you again for your work; I hope the above results help you locate the problem.
The main leak seems to be fixed, but there's still something left after using PuLID: about 3GB more VRAM stays occupied. Usually I have around 7.5GB allocated by the main process; after using PuLID once it goes to 10.8GB and never deallocates. Any way to fix this?
When I change the reference image, the occupied VRAM increases further. Did you notice that in your environment?
I only tried with one image; I'll probably try with multiple later. But so far it's still unusable because it leaves something in my VRAM, which for example causes the program to OOM if I use hires fix, right after denoising, during the VAE decoding phase I believe. So I have to restart it to get the memory back.
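One way to check whether that residual ~3GB is held by live module references (rather than allocator caching that torch.cuda.empty_cache() could release) is to enumerate the CUDA tensors that are still reachable from Python; a rough diagnostic sketch, not tied to any extension internals:

```python
import gc
from collections import Counter

import torch

def summarize_cuda_tensors(top: int = 20) -> None:
    """Group reachable CUDA tensors by shape/dtype and print the largest ones.

    If PuLID encoder weights are still referenced after generation, they show
    up here, and empty_cache() alone will not return that memory.
    """
    sizes: Counter = Counter()
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj) and obj.is_cuda:
                sizes[(tuple(obj.shape), str(obj.dtype))] += obj.element_size() * obj.nelement()
        except Exception:
            continue  # some objects raise on attribute access during inspection
    for (shape, dtype), nbytes in sizes.most_common(top):
        print(f"{nbytes / 1024**2:9.1f} MiB  {dtype}  {shape}")
```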
I created a new issue; you can follow it there: https://github.com/Mikubill/sd-webui-controlnet/issues/2905
See discussion here: https://github.com/ToTheBeginning/PuLID/issues/16
There is a VRAM leak for PuLID in the current implementation in sd-webui-controlnet. The main repo's implementation does not show it. Transferring the issue here.