realistic VRAM usage | memory exhaustion with two ipadapters in a row

abcnorio commented 3 weeks ago

Hello together,

can someone give me a realistic view on ipadapter-plus VRAM usage?

I use a rtx 3060 12 GB VRAM, on Debian stable, conda env. Everything works fine, but using

sdxl checkpoint
two ipadapters in a row (with faceid, but: what can be pushed to CPU is pushed to CPU and not GPU)
reference image 1024x1024, same true for empty latent image
using ipadapter pipeline to prevent redundant loadings

...and observing with 'nvtop' the VRAM usage while running the workflow shows this already exhausts the memory. Running the workflow twice or adding something like "rescale cfg" node after second ipadapter and before the ksampler and the workflow crashes with memory overflow error ("allocation device error"). This is a reliable observation on my system. Speed itself is fine compared to other activities with comfyui.

Questions:

1 - Assumed this is normal? ie. just not enough GPU VRAM?

2 - Any chance - except for buying a bigger GPU or switching to SD15 - to handle that? Using SD15 prevents memory exhaustion but does not create such good images like sdxl checkpoints do. The different quality between sdxl and SD15 is obvious.

3 - Any workarounds? Something like 'tiles' (as it is used for upscaling) won't work, right?

4- Any chance to clear the cache in-between? Probably not, the models want to be used, right?

5- Reducing the size of the reference image won't work as well as it seems to be related to sdxl, right?

Thanks for an answer!

cubiq commented 3 weeks ago

SDXL is certainly more demanding but 12gb should be plenty enough. Maybe there's something else at play but it's hard to say without a full diagnostic. Start by posting the startup comfyui message

abcnorio commented 3 weeks ago

Thanks, here are the starting messages:

comfyui_startingmessages.txt

It complains about upgrading 'inference' but I am pretty cautious normally before doing anything (experienced too many breaks of comfyui + plugins after upgrading packages).

...and here is nvtop output along with the workflow:

ipadapter_#4_2024-06-08 11-45-05

It does not crash every time, here - see screenshot - it went well, but it is obvious that the memory is at its limits.

cubiq commented 3 weeks ago

mh I would probably try to upgrade to at least python 3.11. you can try to disable xformers (some nodes might need it, but just as a test).

also disable the nodes that you don't use often

abcnorio commented 3 weeks ago

ok will do that, thanks, requires time to come back as I have to clone the conda env - let's see whether with conda env export, new python env 3.11, re-import it works. And will start with a fresh comfyui under python 3.11 and then add only ipadapters (all models etc. are symlinked + hardlinked, so that's easy to attain). And then try again and observe 'nvtop'.

abcnorio commented 3 weeks ago

Hallo,

did a fresh comfyui-from-scratch under python 3.11 with no-xformers and only a minimal amount of nodes to get the workflow going. No change, the process of VRAM consumption stays exactly the same.

Does anyone uses a 12 GB VRAM nvidia card like the 3060 and can confirm my finding? Just want to be sure not to have overseen anything. Or - the VRAM usage should also be the same for cards with higher amounts of VRAM, see screenshot from previous posting - or does the following works for someone with less than 12 GB VRAM?

sdxl model like thinkdiffusion
reference image = 1024x1024, target resolution = the same
instantID + 2x ipadapters before ksampler + image preparation (crop, resize, ...)
VRAM consumption should be near 12 GB VRAM when the ksampler starts

Thanks.

cubiq commented 3 weeks ago

what version of protobuf do you have?

abcnorio commented 3 weeks ago

$ conda list|grep protobuf protobuf 4.25.3 pypi_0 pypi

conda_env_python311.txt

all packages installed via 'pip install -r requirements.txt' etc. ie. no manual install

bidlake commented 3 weeks ago

I also have an Nvidia Geforce 3060 with 12 GB Vram and had a similar problem. After I updated Python to version 3.11.8, deactivated Xformers and set pytorch cross attention, I hardly have any problems anymore. This means that I can now run five IP adapters in a sequence at the standard SDXL resolution. And maybe it helps to enlarge the swap file, i.e. the virtual memory. I have set mine to 80 GB.

abcnorio commented 3 weeks ago

Thanks, that's a good hint - will create a larger swap file + use the following two cmd switches?

- -use-pytorch-cross-attention
--disable-xformers

Python is already on 3.11.9.

cubiq commented 3 weeks ago

upgrade protobuf, thank me later

PS: there are some libraries that rely on the old version (eg: mediapipe)

abcnorio commented 2 weeks ago

Thanks @cubiq, but did not bring the same result as for @bidlake.

So did:

updated protobuf to 5.27.1
started with and without "-use-pytorch-cross-attention --disable-xformers" - doesn't make any difference
added swap file (100G) without any effect (not touched at all)

.json file: example_ipadapter-memoryissue_workflow_v1.json

Pls see attached .json workflow. At least now it is possible to run:

instantID + 3 ipadapters
if another ipadapter or rescale-cfg etc. is enabled (see json), it crashes with memory error
disabling instantID and enabling another ipadapter crashes too (so CPU usage seems to be ok for instantID)

@bidlake - maybe you can be so kind to run the json workflow on your computer + enable all nodes (pls ignore the workflow as it is may not make much sense...), input is a a quadratic portrait + pls tell me whether it runs for you. If it does, can you pls post:

'conda list' with version numbers? Assumed is your comfyui call is similar to 'python main.py' + use-crossattention/disable-xformers.
What *nix OS are you on? Assumed official nvidia drivers from repo? I use Debian trixie, nvidia-smi output at the end below. But as I use conda env only the nvidia drivers should have some influence.
If you have a workflow at hand that works for you, can you post it please so I can try it here? Thanks a lot!

So assumed that's the reality?

Anyway - here are the error messages:

Error occurred when executing RescaleCFG:

Allocation on device

File "/ComfyUI/execution.py", line 151, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/execution.py", line 81, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/execution.py", line 74, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/comfy_extras/nodes_model_advanced.py", line 208, in patch
m = model.clone()
^^^^^^^^^^^^^
File "/ComfyUI/comfy/model_patcher.py", line 90, in clone
n.model_options = copy.deepcopy(self.model_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 146, in deepcopy
y = copier(x, memo)
^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 146, in deepcopy
y = copier(x, memo)
^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 146, in deepcopy
y = copier(x, memo)
^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 146, in deepcopy
y = copier(x, memo)
^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 271, in _reconstruct
state = deepcopy(state, memo)
^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 146, in deepcopy
y = copier(x, memo)
^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 146, in deepcopy
y = copier(x, memo)
^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 206, in _deepcopy_list
append(deepcopy(a, memo))
^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 146, in deepcopy
y = copier(x, memo)
^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 271, in _reconstruct
state = deepcopy(state, memo)
^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 146, in deepcopy
y = copier(x, memo)
^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 297, in _reconstruct
value = deepcopy(value, memo)
^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 271, in _reconstruct
state = deepcopy(state, memo)
^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 146, in deepcopy
y = copier(x, memo)
^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 297, in _reconstruct
value = deepcopy(value, memo)
^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 271, in _reconstruct
state = deepcopy(state, memo)
^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 146, in deepcopy
y = copier(x, memo)
^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 297, in _reconstruct
value = deepcopy(value, memo)
^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 271, in _reconstruct
state = deepcopy(state, memo)
^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 146, in deepcopy
y = copier(x, memo)
^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 297, in _reconstruct
value = deepcopy(value, memo)
^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/copy.py", line 153, in deepcopy
y = copier(memo)
^^^^^^^^^^^^
File "/miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/site-packages/torch/nn/parameter.py", line 59, in __deepcopy__
result = type(self)(self.data.clone(memory_format=torch.preserve_format), self.requires_grad)

torch.cuda.OutOfMemoryError: Allocation on device

protobuf version (now):

$ pip show protobuf
Name: protobuf
Version: 5.27.1
Summary: 
Home-page: https://developers.google.com/protocol-buffers/
Author: protobuf@googlegroups.com
Author-email: protobuf@googlegroups.com
License: 3-Clause BSD License
Location: /miniconda3/envs/cuisdxl-trixie-311/lib/python3.11/site-packages
Requires: 
Required-by: mediapipe, onnx, onnxruntime, onnxruntime-gpu

comfyui startup messages:

$python main.py --use-pytorch-cross-attention --disable-xformers
[START] Security scan
[DONE] Security scan
## ComfyUI-Manager: installing dependencies done.
** ComfyUI startup time: 2024-06-10 09:52:30.847832
** Platform: Linux
** Python version: 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0]
** Python executable: /miniconda3/envs/cuisdxl-trixie-311/bin/python3
** Log path: /ComfyUI/comfyui.log

Prestartup times for custom nodes:
   0.0 seconds: /ComfyUI/custom_nodes/rgthree-comfy
   0.5 seconds: /ComfyUI/custom_nodes/ComfyUI-Manager

Total VRAM 12044 MB, total RAM 64095 MB
pytorch version: 2.3.1+cu121
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3060 : cudaMallocAsync
VAE dtype: torch.bfloat16
Using pytorch cross attention
------------------------------------------
Comfyroll Studio v1.76 :  175 Nodes Loaded
------------------------------------------
** For changes, please see patch notes at https://github.com/Suzie1/ComfyUI_Comfyroll_CustomNodes/blob/main/Patch_Notes.md
** For help, please see the wiki at https://github.com/Suzie1/ComfyUI_Comfyroll_CustomNodes/wiki
------------------------------------------
### Loading: ComfyUI-Manager (V2.37.1)
### ComfyUI Revision: 2229 [6cd8ffc4] | Released on '2024-06-08'
### Loading: ComfyUI-Impact-Pack (V5.11.4)
### Loading: ComfyUI-Impact-Pack (Subpack: V0.5)
[Impact Pack] Wildcards loading done.
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/alter-list.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/model-list.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/custom-node-list.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/github-stats.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/extension-node-map.json
WAS Node Suite: OpenCV Python FFMPEG support is enabled
WAS Node Suite Warning: `ffmpeg_bin_path` is not set in `/ComfyUI/custom_nodes/was-node-suite-comfyui/was_suite_config.json` config file. Will attempt to use system ffmpeg binaries if available.
WAS Node Suite: Finished. Loaded 213 nodes successfully.

    "Art is the bridge that connects imagination to reality." - Unknown

[Crystools INFO] Crystools version: 1.12.0
[Crystools INFO] CPU: 12th Gen Intel(R) Core(TM) i5-12400 - Arch: x86_64 - OS: Linux 6.7.12-amd64
[Crystools INFO] GPU/s:
[Crystools INFO] 0) NVIDIA GeForce RTX 3060
[Crystools INFO] NVIDIA Driver: 535.161.08
[comfyui_controlnet_aux] | INFO -> Using ckpts path: /ComfyUI/custom_nodes/comfyui_controlnet_aux/ckpts
[comfyui_controlnet_aux] | INFO -> Using symlinks: False
[comfyui_controlnet_aux] | INFO -> Using ort providers: ['CUDAExecutionProvider', 'DirectMLExecutionProvider', 'OpenVINOExecutionProvider', 'ROCMExecutionProvider', 'CPUExecutionProvider', 'CoreMLExecutionProvider']
DWPose: Onnxruntime with acceleration providers detected

[rgthree] Loaded 39 magnificent nodes.
[rgthree] Will use rgthree's optimized recursive execution.

### Loading: ComfyUI-Inspire-Pack (V0.75.2)

Import times for custom nodes:
   0.0 seconds: /ComfyUI/custom_nodes/websocket_image_save.py
   0.0 seconds: /ComfyUI/custom_nodes/ComfyUI_yanc
   0.0 seconds: /ComfyUI/custom_nodes/ComfyUI_essentials
   0.0 seconds: /ComfyUI/custom_nodes/comfyui_controlnet_aux
   0.0 seconds: /ComfyUI/custom_nodes/ComfyUI_IPAdapter_plus
   0.0 seconds: /ComfyUI/custom_nodes/ComfyUI-Manager
   0.0 seconds: /ComfyUI/custom_nodes/ComfyUI-Inspire-Pack
   0.1 seconds: /ComfyUI/custom_nodes/rgthree-comfy
   0.1 seconds: /ComfyUI/custom_nodes/ComfyUI-Impact-Pack
   0.3 seconds: /ComfyUI/custom_nodes/ComfyUI_Comfyroll_CustomNodes
   0.6 seconds: /ComfyUI/custom_nodes/ComfyUI_InstantID
   1.1 seconds: /ComfyUI/custom_nodes/was-node-suite-comfyui
   2.2 seconds: /ComfyUI/custom_nodes/ComfyUI-Crystools

Starting server

To see the GUI go to: http://127.0.0.1:8188

nvidia-smi output

$nvidia-smi 
Mon Jun 10 10:12:07 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.08             Driver Version: 535.161.08   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060        Off | 00000000:01:00.0 Off |                  N/A |
|  0%   43C    P8              18W / 170W |  10981MiB / 12288MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1635      G   /usr/lib/xorg/Xorg                            4MiB |
|    0   N/A  N/A    266717      C   python3                                   10968MiB |
+---------------------------------------------------------------------------------------+

bidlake commented 2 weeks ago

I have tried the workflow and it works for me. I had the same error message on the third run. Even if it sounds a bit strange, I have had a similar Cuda error message a few times when working with the IP adapters. At some point, out of frustration, I clicked on Queue-Prompt again and the generation continued anyway. During the first run with the bottom IP adapter set to bypass, it needed 9 GB of Vram. With all IP adapters active in the workflow it was: 15.24 GB VRam (shared memory).

Operating system: Windows 10 Nvidia driver: Driver version: 552.22 python main.py: --windows-standalone-build --use-pytorch-cross-attention

The obvious differences here are that I'm using Windows and that I'm not running ComfyUI via Conda.

I haven't experimented that much with the InstantID stuff yet. But I have a little Youtube channel called: Show, don't tell. You can download the workflows I show in the videos from my website: alienate.de. There are some with IP adapters. The work that Matteo (developer) does with the IP-Adapter-Plus is just great, by the way.

Good luck getting it to work!

bidlake commented 2 weeks ago

And only IPAdapter Unified Loaders are used in your workflow. This means that you don't actually need the extra Clip Vision Loader node. This reduces my vram usage to 7.93 GB without the bottom IP adapter and 11.57 GB with all IP adapters. Every gigabyte counts! :)

abcnorio commented 2 weeks ago

@bidlake - thanks for the hint with unified loader, so I have misunderstood that - assumed it chooses either way, good you mention it. No need to run things redundantly. Will run that again tomorrow. So all in all this clarifies slowly :-).

How do you do shared memory with a dedicated GPU so you can digest 15GB? I thought shared memory is only with internal GPUs and comryUI does not support multiple-GPUs (I have another used 3050 with 8GB in the computer but cannot use it along with the 3060 to share VRAM between the GPUs).

post addon: just re-read about the nodes:

"clip_vision, this is optional if using any of the Unified loaders. "

-> good, seems I misunderstood the "optional" term that it means "redundant loading of clip vision" like it is the case for the rest. Assumed it won't hurt but as you write it seems it does. Will remove that asap and re-run.

@cubiq - thanks for you support, very appreciated, great work! Just did instantID with two ipadapters for this trial here, and "accidentially" the face transfer worked flawlessly, really impressive (different angle of the head, slightly different expression based on prompt, etc.).

bidlake commented 2 weeks ago

As far as I understand it, shared memory means that beyond the limit of 12 GB Vram, the slower RAM memory is also used. That's why last year, when I started experimenting with Stable Diffusion, I bought some more RAM in addition to the RTX 3060. But you shouldn't have any problems with that either, with your 64GB of RAM.

In the nvidia control panel under manage 3D settings, there is an option called Cuda system memory fallback policy. You can set this to: Prefer system memory fallback. At least in Windows 10, this means that CUDA can access the RAM in these situations.

And yes, this is a learning process. Above all, it obviously has a lot to do with trial and error. Really detailed user manuals for all these nodes are simply not to be found. I once spent a whole afternoon looking for them, but couldn't find anything that could answer my questions. But I can also understand that all the open source enthusiasts who build all this great stuff don't want to write detailed texts. It's all just very complex and basically quite new territory.

cubiq commented 2 weeks ago

-> good, seems I misunderstood the "optional" term that it means "redundant loading of clip vision" like it is the case for the rest. Assumed it won't hurt but as you write it seems it does. Will remove that asap and re-run.

it is "optional" as there are some scenarios where you might want to change the clipvision. It is redundant if you use the same clip vision model.

abcnorio commented 2 weeks ago

@cubiq - thanks for explanations.

@bidlake - tried all what you mentioned (better: all what is possible), removed clipvision completely, but without any better results. I think that's just reality, and I am not unhappy - the great support here from you and @cubiq allows me to use instantID + 3x ipadapters now, and for image generations maybe I need only 2x ipadapters. So till I find money to get a bigger GPU that's just fine, nothing to complain! The increase in memory usage while loading the various models is congruent - the increase behaves like a rectangle-over-time and shows a definite association to all single ipadapters-to-be-loaded, so it does not look like something serious goes wrong. Python and libs are updated, calls for ComfyUI changed back and forth, so nothing left to do.

Regarding GPU memory sharing - this is not really available in the nix world. Just read on the dev forum of nvidia where quite some nix people complained that this is available for win but not for nix. The driver version here doesn't have it at all, the newest win versions have some new feature to prevent AI/ML models to crash the system due to memory exhaustion called "memory fallback". Maybe I will try some minimal-win10-VM-via-proxmox/kvm-and-GPU-passthrough and see how it behaves under windows. Unfortunately that won't be a headless system but that's ok - win does not run headless. Using a VM may become relevant anyway - you surely read about the recent malware-spreading via comfyUI plugins on reddit and people discuss docker for comfyUI. However, from a nix point that's not secure enough and requires experts to really harden it against malicious code (apparmor profiles & CO) - not to mention that you still use a browser and if comfyUI is compromised, it is like visiting a malicious webpage. So you need a jailed browser, etc. All that is very unfortunate.

cubiq / ComfyUI_IPAdapter_plus

realistic VRAM usage | memory exhaustion with two ipadapters in a row #592