asagi4 / comfyui-prompt-control

ComfyUI nodes for prompt editing and LoRA control

[BUG] `Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!` #52

Open DrJKL opened 1 month ago

DrJKL commented 1 month ago


Describe the bug

`Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!`

A CPU/CUDA device-mismatch error occurs when running ComfyUI with the `--gpu-only` flag, using the AIO node together with the BlenderNeko Cutoff integration.

Hard to reproduce since it seems to happen when running close to the VRAM limit.

I recognize this might need to be addressed in the other repo, but figured I'd raise it here first. I hit this intermittently; I'll update this issue if I find more details that could help.

Error stack (paths pruned):

```
Error occurred when executing PromptControlSimple:

Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

  File "X:\ComfyUI\execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "X:\ComfyUI\execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "X:\ComfyUI\execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "X:\ComfyUI\custom_nodes\comfyui-prompt-control\prompt_control\node_aio.py", line 32, in apply
    pos_cond = pos_filtered = control_to_clip_common(clip, pos_sched, lora_cache, cond_cache)
  File "X:\ComfyUI\custom_nodes\comfyui-prompt-control\prompt_control\node_clip.py", line 674, in control_to_clip_common
    cond = encode(c)
  File "X:\ComfyUI\custom_nodes\comfyui-prompt-control\prompt_control\node_clip.py", line 650, in encode
    cond_cache[cachekey] = do_encode(clip, prompt, schedules.defaults, schedules.masks)
  File "X:\ComfyUI\custom_nodes\comfyui-prompt-control\prompt_control\node_clip.py", line 570, in do_encode
    cond, pooled = encode_prompt(clip, prompt, style, normalization)
  File "X:\ComfyUI\custom_nodes\comfyui-prompt-control\prompt_control\node_clip.py", line 337, in encode_prompt
    return encode_regions(clip, tokens, regions, style, normalization)
  File "X:\ComfyUI\custom_nodes\comfyui-prompt-control\prompt_control\node_clip.py", line 246, in encode_regions
    (r,) = finalize_clip_regions(
  File "X:\ComfyUI\custom_nodes\ComfyUI_Cutoff\cutoff.py", line 225, in finalize_clip_regions
    base_embedding_full, pool = encode_from_tokens(clip, base_weighted_tokens, token_normalization, weight_interpretation, True)
  File "X:\ComfyUI\custom_nodes\ComfyUI_Cutoff\cutoff.py", line 192, in encode_from_tokens
    embs_l, _ = advanced_encode_from_tokens(tokenized['l'],
  File "X:\ComfyUI\custom_nodes\ComfyUI_Cutoff\adv_encode.py", line 203, in advanced_encode_from_tokens
    embs, pooled = from_masked(unweighted_tokens, weights, word_ids, base_emb, length, encode_func)
  File "X:\ComfyUI\custom_nodes\ComfyUI_Cutoff\adv_encode.py", line 107, in from_masked
    pooled = (pooled - pooled_start) * (ws - 1)
```

To Reproduce

I need a node that just fills the VRAM... I tried a workflow that uses 4 different checkpoints, but that fails with a different error instead:

```
Error occurred when executing CheckpointLoaderSimple:

Allocation on device 0 would exceed allowed memory. (out of memory)
```

Expected behavior

Prompt control conditioning with cutoff applied, no errors.

asagi4 commented 1 month ago

Does this happen only when you use the cutoff integration? I don't think my nodes do much shuffling of CLIP tensors between the GPU and CPU (unless it happens accidentally somewhere), so in this case it might be that you're getting a CPU tensor from, e.g., the cutoff node while everything else is on the GPU, or vice versa.
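
For illustration only (this is not code from either repo, and it assumes a CUDA device is available), this is how that kind of mismatch surfaces:

```python
import torch

# Mixing a GPU tensor and a CPU tensor in a single op raises the same
# RuntimeError reported in the traceback above.
a = torch.ones(4, device="cuda")  # on the GPU
b = torch.ones(4)                 # torch.ones/torch.tensor default to the CPU
try:
    _ = (a - b) * 2
except RuntimeError as e:
    print(e)  # Expected all tensors to be on the same device, ... cuda:0 and cpu!
```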

Looking at the code, I think the problem might be the line just before where the error occurs:

```
ws = torch.tensor(ws).reshape(-1,1).expand(pooled_start.shape)
```

That doesn't seem to specify the device to use for the tensor. Can you test what happens if you add a `ws = ws.to(pooled)` or something equivalent after that line?
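
Roughly, the change in `ComfyUI_Cutoff/adv_encode.py` (`from_masked`) would look like this; the variable names are taken from the traceback, and this is an untested sketch rather than a verified patch:

```python
ws = torch.tensor(ws).reshape(-1, 1).expand(pooled_start.shape)
ws = ws.to(pooled)  # move (and cast) ws to the same device/dtype as `pooled`
pooled = (pooled - pooled_start) * (ws - 1)
```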

DrJKL commented 1 month ago

I'll try it next time I can get it to trigger :-)