Stuck at BlurMask Node - Githubissues

gseth commented 6 months ago

This is great. However, it only works when I bypass the BlurMaskNode. There is no error in the command prompt; it just gets stuck, and RAM (not VRAM) usage climbs to 100% after 2-3 minutes. I have to close cmd and relaunch.

I am on a 4090. Can you tell me what I am doing wrong here? Thank you!

Edit: Here is what happens: it works with blur mask values from 3 up to 5. At 6, it starts giving errors and Comfy crashes with a "reconnecting" message in the floating menu. Nothing appears in the command prompt.

Acly commented 6 months ago

You're not the only one: https://github.com/Acly/krita-ai-diffusion/issues/401 But it seems to happen only on some systems, and I can't reproduce it myself. If you can share your Python and torch versions, it might help. Note that this usually runs on the CPU, like most image operations.
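
For anyone unsure how to collect those versions, the following is a minimal sketch using only the standard library. The helper name `version_report` is hypothetical, and the script assumes it is run with the same Python interpreter that ComfyUI uses; otherwise it reports a different installation.

```python
import platform
from importlib import metadata


def version_report(packages=("torch",)):
    """Return the Python version and the installed version of each named package."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            # Reads the installed distribution's metadata without importing the package.
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


print(version_report())
```

Reading the version from package metadata avoids importing torch itself, which can be useful when the import or the first operation is what hangs.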

gseth commented 6 months ago

Hi, thanks, I checked that thread.

Python 3.11.7 Torch version: 2.2.0+cu121

I am running an AMD 7950X CPU.

I downgraded PyTorch to 2.1.1 and it worked.

It doesn't work with 2.2.0.

gseth commented 6 months ago

So basically, I made some changes. I am not a coder and have zero Python coding knowledge; this was done with the help of ChatGPT.

It advised moving the function from the CPU to the GPU. I am not sure how optimized this is, but have a look.

nodes.py

class MaskedBlur:
    @classmethod
    def INPUT_TYPES(s):
        return {
            "required": {
                "image": ("IMAGE",),
                "mask": ("MASK",),
                "blur": ("INT", {"default": 255, "min": 3, "max": 8191, "step": 1}),
                "falloff": ("INT", {"default": 0, "min": 0, "max": 8191, "step": 1}),
            }
        }

    RETURN_TYPES = ("IMAGE",)
    CATEGORY = "inpaint"
    FUNCTION = "fill"

    def fill(self, image: Tensor, mask: Tensor, blur: int, falloff: int):
        # Run on the GPU when CUDA is available; otherwise stay on the CPU.
        device = 'cuda' if torch.cuda.is_available() else 'cpu'
        image, mask = to_torch(image, mask, device=device)

        blur = make_odd(blur)
        falloff = min(make_odd(falloff), blur - 2)

        original = image.clone()
        alpha = mask.floor()
        if falloff > 0:
            erosion = binary_erosion(alpha, falloff)
            alpha = alpha * gaussian_blur(erosion, falloff)
        alpha = alpha.repeat(1, 3, 1, 1)

        image = gaussian_blur(image, blur)
        image = original + (image - original) * alpha
        return (to_comfy(image),)

util.py

def to_torch(image: Tensor, mask: Tensor | None = None, device=None):
    if len(image.shape) == 3:
        image = image.unsqueeze(0)
    image = image.permute(0, 3, 1, 2)  # BHWC to BCHW
    if mask is not None:
        if len(mask.shape) == 2:
            mask = mask.unsqueeze(0).unsqueeze(0)  # HW to BCHW
        elif len(mask.shape) == 3:
            mask = mask.unsqueeze(1)  # BHW to BCHW
    if device:
        image = image.to(device)
        if mask is not None:
            mask = mask.to(device)
    return image, mask

After this I got an error in the nodes.py file located here: ComfyUI_windows_portable\ComfyUI

File "Q:\ComfyUI\ComfyUI_windows_portable\ComfyUI\nodes.py", line 415, in encode
    pixels[:,:,:,i] *= m
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

It then advised me to add the following:

m = m.to(pixels.device)

before

pixels[:,:,:,i] *= m

(@line 415)

This is under "class InpaintModelConditioning:"

After this it works perfectly.
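
The fix described above amounts to moving the mask onto the same device as the pixels before the in-place multiply. A minimal sketch of that pattern follows; `scale_channels_by_mask` is a hypothetical helper written for illustration, not ComfyUI's actual code, and it assumes a BHWC image tensor with a BHW mask.

```python
import torch


def scale_channels_by_mask(pixels: torch.Tensor, m: torch.Tensor) -> torch.Tensor:
    """Multiply the first 3 channels of a BHWC tensor by a BHW mask, in place."""
    # Move the mask to wherever the pixels live; a no-op if the devices already match.
    m = m.to(pixels.device)
    for i in range(3):
        pixels[:, :, :, i] *= m
    return pixels


pixels = torch.ones(1, 2, 2, 3)
mask = torch.tensor([[[1.0, 0.0], [0.0, 1.0]]])
result = scale_channels_by_mask(pixels, mask)
```

Without the `.to(pixels.device)` call, a mask left on the CPU and pixels on `cuda:0` would raise the same "Expected all tensors to be on the same device" error shown in the traceback.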

Please advise whether the code is correct. If not, could you please update it appropriately so that everything runs on the GPU and not the CPU?

Acly commented 6 months ago

It runs on CPU intentionally. While moving it to GPU may be a workaround, it doesn't really explain why it fails on CPU.

All image preprocessing typically runs on the CPU in ComfyUI. Some things are actually cheaper to do there, and going back and forth between CPU and GPU has a noticeable cost, so you usually only do it for heavy work.

A blur is kind of borderline heavy work though, so it could boil down to similar performance (depending on your CPU and GPU speed).

gseth commented 6 months ago

It works with torch downgraded to 2.1.1.

Does that help?

Acly commented 6 months ago

It does create the suspicion that it's a bug in torch 2.2.0. But because it seems to happen for only a few people, and I can't reproduce it myself at all even with 2.2.0, it is difficult to investigate further or report it.

stepahin commented 4 months ago

Hi, I'm having the same problem. I seem to have the latest, fully updated ComfyUI. Is there any solution easier than replacing the lines of code above?

Acly commented 4 months ago

As a workaround, set the environment variable ONEDNN_MAX_CPU_ISA=AVX2 when launching ComfyUI.
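
One way to apply this without editing launch scripts is a small Python wrapper that starts ComfyUI in a child process with the variable set. This is only a sketch: `build_env` and `launch` are hypothetical helper names, and the default script name `main.py` is an assumption about your install's entry point.

```python
import os
import subprocess
import sys


def build_env(max_isa="AVX2", base=None):
    """Return a copy of the environment with the oneDNN CPU instruction set capped."""
    env = dict(os.environ if base is None else base)
    env["ONEDNN_MAX_CPU_ISA"] = max_isa
    return env


def launch(script="main.py", extra_args=()):
    """Run the given script with the current interpreter and the capped environment."""
    # "main.py" is an assumed entry point; pass your own path if it differs.
    cmd = [sys.executable, script, *extra_args]
    return subprocess.run(cmd, env=build_env(), check=False)
```

Calling `launch()` from the ComfyUI folder would then start it with the variable in place. The variable is set on a copy of the environment, so the parent shell is left unchanged.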

gseth commented 4 months ago

THANK YOU!!

yuyukongkong commented 4 months ago

THANK YOU!

Acly commented 4 months ago

The stable version of torch 2.3.0 has been released; this should no longer be an issue there.

Acly / comfyui-inpaint-nodes

Stuck at BlurMask Node #15