Error in sampling stage- possibly related to MPS backend

jwooldridge234 commented 5 months ago

Getting this error:

File "/Users/jackwooldridge/ComfyUI/execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "/Users/jackwooldridge/ComfyUI/execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "/Users/jackwooldridge/ComfyUI/execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "/Users/jackwooldridge/ComfyUI/nodes.py", line 1344, in sample
    return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
  File "/Users/jackwooldridge/ComfyUI/nodes.py", line 1314, in common_ksampler
    samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
  File "/Users/jackwooldridge/ComfyUI/comfy/sample.py", line 37, in sample
    samples = sampler.sample(noise, positive, negative, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "/Users/jackwooldridge/ComfyUI/comfy/samplers.py", line 755, in sample
    return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "/Users/jackwooldridge/ComfyUI/comfy/samplers.py", line 657, in sample
    return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
  File "/Users/jackwooldridge/ComfyUI/comfy/samplers.py", line 644, in sample
    output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
  File "/Users/jackwooldridge/ComfyUI/comfy/samplers.py", line 623, in inner_sample
    samples = sampler.sample(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
  File "/Users/jackwooldridge/ComfyUI/comfy/samplers.py", line 534, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
  File "/Users/jackwooldridge/ComfyUI/venv/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/jackwooldridge/ComfyUI/comfy/k_diffusion/sampling.py", line 137, in sample_euler
    denoised = model(x, sigma_hat * s_in, **extra_args)
  File "/Users/jackwooldridge/ComfyUI/comfy/samplers.py", line 272, in __call__
    out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
  File "/Users/jackwooldridge/ComfyUI/comfy/samplers.py", line 610, in __call__
    return self.predict_noise(*args, **kwargs)
  File "/Users/jackwooldridge/ComfyUI/comfy/samplers.py", line 613, in predict_noise
    return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
  File "/Users/jackwooldridge/ComfyUI/comfy/samplers.py", line 258, in sampling_function
    out = calc_cond_batch(model, conds, x, timestep, model_options)
  File "/Users/jackwooldridge/ComfyUI/comfy/samplers.py", line 216, in calc_cond_batch
    output = model_options['model_function_wrapper'](model.apply_model, {"input": input_x, "timestep": timestep_, "c": c, "cond_or_uncond": cond_or_uncond}).chunk(batch_chunks)
  File "/Users/jackwooldridge/ComfyUI/custom_nodes/ComfyUI-ELLA/ella.py", line 67, in __call__
    self.model_sampling.timestep(timestep_[i]),
IndexError: index 1 is out of bounds for dimension 0 with size 1

Seems to be related to this function:

def __call__(self, apply_model, kwargs: dict):
        input_x = kwargs["input"]
        timestep_ = kwargs["timestep"]
        c = kwargs["c"]
        cond_or_uncond = kwargs["cond_or_uncond"]  # [0|1]

        time_aware_encoder_hidden_states = []
        self.ella.to(device=self.load_device)
        for i in cond_or_uncond:
            h = self.ella(
                self.model_sampling.timestep(timestep_[i]),
                **self.embeds[i],
            )
            time_aware_encoder_hidden_states.append(h)
        self.ella.to(self.offload_device)

        c["c_crossattn"] = torch.cat(time_aware_encoder_hidden_states, dim=0)

        return apply_model(input_x, timestep_, **c)
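
One possible reading of the traceback, sketched with plain Python lists standing in for tensors: the loop uses each value in cond_or_uncond (0 or 1) as an index into timestep_, so a call that carries a single entry flagged 1 would index past the end of a size-1 timestep_. This is an assumption drawn from the error message, not something confirmed in the thread. The helper names below are hypothetical, and the position-based variant is only an illustration, not the plugin's actual fix.

```python
def timesteps_by_flag_value(timestep_, cond_or_uncond):
    """Mirror the loop above: the 0/1 flag itself is used as the index."""
    return [timestep_[i] for i in cond_or_uncond]


def timesteps_by_position(timestep_, cond_or_uncond):
    """Hypothetical variant: index timestep_ by position within the call."""
    return [timestep_[pos] for pos in range(len(cond_or_uncond))]


# Both entries present: flag values and positions coincide.
both = timesteps_by_flag_value([999.0, 999.0], [0, 1])

# Only the entry flagged 1 is present, so timestep_ has a single element.
try:
    timesteps_by_flag_value([999.0], [1])
    failed = False
except IndexError:
    failed = True  # same class of error as in the traceback

by_position = timesteps_by_position([999.0], [1])
print(both, failed, by_position)
```

The sketch assumes one timestep entry per cond entry, which may not hold for larger batches. Indexing self.embeds by the flag value would still be needed to pick the right prompt embedding.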

This happens while running the latest ComfyUI on macOS with "python main.py --preview-method taesd" as the launch command. It also fails in force-fp16 mode, and it fails whether the encoder is set to fp32 or fp16. Let me know if I can provide any other information. Thanks!

JettHu commented 5 months ago

Does the error also occur without --preview-method taesd? And would you mind sharing your workflow?

jwooldridge234 commented 5 months ago

Yes, just checked: it still occurs without it. I'm using the default text-to-image workflow from the repo; I just loaded Dreamshaper as the checkpoint instead of the aw_painting checkpoint.

JettHu commented 5 months ago

I'll try to reproduce it on my mac.

jwooldridge234 commented 5 months ago

@JettHu Wait, I just remembered (forgot because I've had to do this so many times in different ComfyUI plugins)- I modified the plugin code in model.py (line 28):

def forward(self, x: torch.Tensor, timestep_embedding: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        emb = self.linear(self.silu(timestep_embedding))
        shift, scale = emb.view(len(x), 1, -1).chunk(2, dim=-1)
        return self.norm(x.to(torch.float32)).to(torch.float16) * (1 + scale) + shift

If I don't cast x to full precision (fp32) and then back to half precision (fp16), I get a fatal error in the Apply_Ella stage:

Traceback (most recent call last):
  File "/Users/jackwooldridge/ComfyUI/execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "/Users/jackwooldridge/ComfyUI/execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "/Users/jackwooldridge/ComfyUI/execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "/Users/jackwooldridge/ComfyUI/custom_nodes/ComfyUI-ELLA/ella.py", line 119, in apply
    _cond, _uncond = ella_proxy.prepare_conds()
  File "/Users/jackwooldridge/ComfyUI/custom_nodes/ComfyUI-ELLA/ella.py", line 52, in prepare_conds
    cond = self.ella(torch.Tensor([999]).to(torch.int64), **self.embeds[0])
  File "/Users/jackwooldridge/ComfyUI/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/jackwooldridge/ComfyUI/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/jackwooldridge/ComfyUI/custom_nodes/ComfyUI-ELLA/model.py", line 296, in forward
    return self.connector(t5_embeds, timestep_embedding=time_embedding)
  File "/Users/jackwooldridge/ComfyUI/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/jackwooldridge/ComfyUI/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/jackwooldridge/ComfyUI/custom_nodes/ComfyUI-ELLA/model.py", line 106, in forward
    latents = p_block(x, latents, timestep_embedding=timestep_embedding)
  File "/Users/jackwooldridge/ComfyUI/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/jackwooldridge/ComfyUI/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/jackwooldridge/ComfyUI/custom_nodes/ComfyUI-ELLA/model.py", line 65, in forward
    normed_latents = self.ln_1(latents, timestep_embedding)
  File "/Users/jackwooldridge/ComfyUI/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/jackwooldridge/ComfyUI/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/jackwooldridge/ComfyUI/custom_nodes/ComfyUI-ELLA/model.py", line 28, in forward
    return self.norm(x) * (1 + scale) + shift
  File "/Users/jackwooldridge/ComfyUI/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/jackwooldridge/ComfyUI/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/jackwooldridge/ComfyUI/venv/lib/python3.9/site-packages/torch/nn/modules/normalization.py", line 201, in forward
    return F.layer_norm(
  File "/Users/jackwooldridge/ComfyUI/venv/lib/python3.9/site-packages/torch/nn/functional.py", line 2546, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

Sorry I didn't mention that before, since it might be relevant. Other plugins have worked fine with that change.

jwooldridge234 commented 5 months ago

Also, here's my full workflow in case it helps:

{
  "last_node_id": 21,
  "last_link_id": 26,
  "nodes": [
    {
      "id": 10,
      "type": "ELLALoader",
      "pos": [
        -3,
        626
      ],
      "size": {
        "0": 341.86419677734375,
        "1": 58
      },
      "flags": {},
      "order": 0,
      "mode": 0,
      "outputs": [
        {
          "name": "ELLA",
          "type": "ELLA",
          "links": [
            10
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "ELLALoader"
      },
      "widgets_values": [
        "ella-sd1.5-tsc-t5xl.safetensors"
      ]
    },
    {
      "id": 17,
      "type": "T5TextEncode #ELLA",
      "pos": [
        117,
        742
      ],
      "size": {
        "0": 210,
        "1": 90.64571380615234
      },
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [
        {
          "name": "text_encoder",
          "type": "T5_TEXT_ENCODER",
          "link": 21
        }
      ],
      "outputs": [
        {
          "name": "ELLA_EMBEDS",
          "type": "ELLA_EMBEDS",
          "links": [
            23
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "T5TextEncode #ELLA"
      },
      "widgets_values": [
        "a large, textured green crocodile lying comfortably on a patch of grass with a cute, knitted orange sweater enveloping its scaly body. Around its neck, the sweater features a whimsical pattern of blue and yellow stripes. In the background, a smooth, grey rock partially obscures the view of a small pond with lily pads floating on the surface."
      ]
    },
    {
      "id": 12,
      "type": "EllaApply",
      "pos": [
        427,
        478
      ],
      "size": {
        "0": 210,
        "1": 86
      },
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 11
        },
        {
          "name": "ella",
          "type": "ELLA",
          "link": 10
        },
        {
          "name": "positive",
          "type": "ELLA_EMBEDS",
          "link": 23
        },
        {
          "name": "negative",
          "type": "ELLA_EMBEDS",
          "link": 24
        }
      ],
      "outputs": [
        {
          "name": "model",
          "type": "MODEL",
          "links": [
            20
          ],
          "shape": 3,
          "slot_index": 0
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "links": [
            25
          ],
          "shape": 3,
          "slot_index": 1
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "links": [
            26
          ],
          "shape": 3,
          "slot_index": 2
        }
      ],
      "properties": {
        "Node name for S&R": "EllaApply"
      }
    },
    {
      "id": 5,
      "type": "EmptyLatentImage",
      "pos": [
        420,
        643
      ],
      "size": {
        "0": 210,
        "1": 106
      },
      "flags": {},
      "order": 1,
      "mode": 0,
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            2
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "EmptyLatentImage"
      },
      "widgets_values": [
        512,
        512,
        1
      ]
    },
    {
      "id": 8,
      "type": "VAEDecode",
      "pos": [
        986,
        484
      ],
      "size": {
        "0": 210,
        "1": 46
      },
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 7
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 8
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            19
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAEDecode"
      }
    },
    {
      "id": 20,
      "type": "PreviewImage",
      "pos": [
        989,
        573
      ],
      "size": {
        "0": 210,
        "1": 246
      },
      "flags": {},
      "order": 9,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 19
        }
      ],
      "properties": {
        "Node name for S&R": "PreviewImage"
      }
    },
    {
      "id": 21,
      "type": "T5TextEncode #ELLA",
      "pos": [
        118,
        878
      ],
      "size": {
        "0": 210,
        "1": 90.64571380615234
      },
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "text_encoder",
          "type": "T5_TEXT_ENCODER",
          "link": 22
        }
      ],
      "outputs": [
        {
          "name": "ELLA_EMBEDS",
          "type": "ELLA_EMBEDS",
          "links": [
            24
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "T5TextEncode #ELLA"
      },
      "widgets_values": [
        "nsfw, text, low quaility "
      ]
    },
    {
      "id": 4,
      "type": "CheckpointLoaderSimple",
      "pos": [
        26,
        474
      ],
      "size": {
        "0": 315,
        "1": 98
      },
      "flags": {},
      "order": 2,
      "mode": 0,
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            11
          ],
          "slot_index": 0
        },
        {
          "name": "CLIP",
          "type": "CLIP",
          "links": [],
          "slot_index": 1
        },
        {
          "name": "VAE",
          "type": "VAE",
          "links": [
            8
          ],
          "slot_index": 2
        }
      ],
      "properties": {
        "Node name for S&R": "CheckpointLoaderSimple"
      },
      "widgets_values": [
        "dreamshaper_331BakedVae.safetensors"
      ]
    },
    {
      "id": 3,
      "type": "KSampler",
      "pos": [
        657,
        477
      ],
      "size": {
        "0": 315,
        "1": 262
      },
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 20
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 25
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 26
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 2
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            7
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "KSampler"
      },
      "widgets_values": [
        16785396861587,
        "randomize",
        20,
        8,
        "euler",
        "normal",
        1
      ]
    },
    {
      "id": 18,
      "type": "T5TextEncoderLoader #ELLA",
      "pos": [
        -246,
        785
      ],
      "size": {
        "0": 339.4064025878906,
        "1": 106
      },
      "flags": {},
      "order": 3,
      "mode": 0,
      "outputs": [
        {
          "name": "T5_TEXT_ENCODER",
          "type": "T5_TEXT_ENCODER",
          "links": [
            21,
            22
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "T5TextEncoderLoader #ELLA"
      },
      "widgets_values": [
        "models--google--flan-t5-xl--text_encoder",
        0,
        "auto"
      ]
    }
  ],
  "links": [
    [
      2,
      5,
      0,
      3,
      3,
      "LATENT"
    ],
    [
      7,
      3,
      0,
      8,
      0,
      "LATENT"
    ],
    [
      8,
      4,
      2,
      8,
      1,
      "VAE"
    ],
    [
      10,
      10,
      0,
      12,
      1,
      "ELLA"
    ],
    [
      11,
      4,
      0,
      12,
      0,
      "MODEL"
    ],
    [
      19,
      8,
      0,
      20,
      0,
      "IMAGE"
    ],
    [
      20,
      12,
      0,
      3,
      0,
      "MODEL"
    ],
    [
      21,
      18,
      0,
      17,
      0,
      "T5_TEXT_ENCODER"
    ],
    [
      22,
      18,
      0,
      21,
      0,
      "T5_TEXT_ENCODER"
    ],
    [
      23,
      17,
      0,
      12,
      2,
      "ELLA_EMBEDS"
    ],
    [
      24,
      21,
      0,
      12,
      3,
      "ELLA_EMBEDS"
    ],
    [
      25,
      12,
      1,
      3,
      1,
      "CONDITIONING"
    ],
    [
      26,
      12,
      2,
      3,
      2,
      "CONDITIONING"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {},
  "version": 0.4
}
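
To check where the KSampler inputs come from, the "links" array above can be read programmatically. This is a minimal sketch that assumes each link entry has the layout [link id, source node id, source slot, target node id, target slot, type], which is consistent with the per-node "links" and "link" fields in this workflow. Only a few nodes and links are copied here, and the helper name is hypothetical.

```python
# Node ids and types copied from the workflow above (subset).
NODE_TYPES = {3: "KSampler", 5: "EmptyLatentImage", 12: "EllaApply"}

# Link entries copied from the workflow above (subset).
LINKS = [
    [2, 5, 0, 3, 3, "LATENT"],
    [20, 12, 0, 3, 0, "MODEL"],
    [25, 12, 1, 3, 1, "CONDITIONING"],
    [26, 12, 2, 3, 2, "CONDITIONING"],
]


def sources_feeding(target_id, links, node_types):
    """Return (link type, target slot, source node type) for each incoming link."""
    incoming = []
    for _link_id, src_id, _src_slot, dst_id, dst_slot, link_type in links:
        if dst_id == target_id:
            incoming.append((link_type, dst_slot, node_types[src_id]))
    # Sort by target slot so the output follows the node's input order.
    return sorted(incoming, key=lambda item: item[1])


for link_type, slot, source in sources_feeding(3, LINKS, NODE_TYPES):
    print(f"KSampler input slot {slot} ({link_type}) <- {source}")
```

With this subset, the MODEL input and both CONDITIONING inputs of the KSampler trace back to the EllaApply node.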

MythicalChu commented 5 months ago

I am having the same problem on DirectML. Also, the RuntimeError: "LayerNormKernelImpl" not implemented for 'Half' happens on DirectML too.

JettHu commented 5 months ago

> I am having the same problem on DirectML. Also, the RuntimeError: "LayerNormKernelImpl" not implemented for 'Half' happens on DirectML too.

This may be solved by adding --fp32-text-enc.

JettHu commented 5 months ago

@jwooldridge234 Thank you for sharing the environment and workflow. I wasn't able to reproduce it yesterday, and I only have time to look into it during working hours. Also, have you tried --fp32-text-enc?

JettHu commented 5 months ago

> @JettHu Wait, I just remembered (forgot because I've had to do this so many times in different ComfyUI plugins)- I modified the plugin code in model.py (line 28):
>
> def forward(self, x: torch.Tensor, timestep_embedding: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
>         emb = self.linear(self.silu(timestep_embedding))
>         shift, scale = emb.view(len(x), 1, -1).chunk(2, dim=-1)
>         return self.norm(x.to(torch.float32)).to(torch.float16) * (1 + scale) + shift

@jwooldridge234 On my Mac (Apple M1 Pro, 32GB RAM), the error "LayerNormKernelImpl" not implemented for 'Half' also appears in the default mode. However, it runs correctly when I add the --fp32-text-enc parameter.

Additionally, I found that --fp32-text-enc performs better than changing the self.norm precision to fp32 in the model.py code.

The picture below shows:

1.15 it/s with --fp32-text-enc

5.31 s/it (0.188 it/s) without --fp32-text-enc, with the self.norm precision changed to fp32 in the model.py code.
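
The two figures use different units (iterations per second versus seconds per iteration), so a quick conversion helps compare them. A small sketch of the arithmetic, using only the numbers quoted above; the variable names are illustrative:

```python
with_flag_it_per_s = 1.15      # reported with --fp32-text-enc
without_flag_s_per_it = 5.31   # reported with the model.py fp32 cast instead

# Iterations per second is the reciprocal of seconds per iteration.
without_flag_it_per_s = 1 / without_flag_s_per_it
speedup = with_flag_it_per_s / without_flag_it_per_s

print(f"{without_flag_it_per_s:.3f} it/s, about {speedup:.1f}x faster with the flag")
# prints "0.188 it/s, about 6.1x faster with the flag"
```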

Unfortunately, I could not reproduce the issue you originally reported.

jwooldridge234 commented 5 months ago

@JettHu That solves the issue with the text encoder, and makes it a ton faster, thank you. I'm still getting that error related to the ksampler ("index 1 is out of bounds for dimension 0 with size 1"), unfortunately. It happens with all models and samplers I've tested.

jwooldridge234 commented 5 months ago

@MythicalChu Are you still seeing this issue when you add --fp32-text-enc?

JettHu commented 5 months ago

> @JettHu That solves the issue with the text encoder, and makes it a ton faster, thank you. I'm still getting that error related to the ksampler ("index 1 is out of bounds for dimension 0 with size 1"), unfortunately. It happens with all models and samplers I've tested.

I think I found the problem you may be running into. Do you have a long prompt, and is the conditioning input to the KSampler coming from the CLIP Text Encode node?

jwooldridge234 commented 5 months ago

Sadly, no- I'm using the T5 Text Encode into the Apply Ella node, and then taking the conditioning from there into the KSampler. Shortening the prompt (or deleting it altogether) makes no difference.

JettHu commented 5 months ago

@jwooldridge234 I have released a new version. Please try it and see whether you still get this error.

jwooldridge234 commented 5 months ago

Perfect, solved my issue! Thanks!

MythicalChu commented 5 months ago

> @MythicalChu Are you still seeing this issue when you add --fp32-text-enc?

Sorry for the late answer. No problem anymore, using --fp32-text-enc. Haven't tried the latest updates though, but I bet it's great :)

ranfengqaq commented 5 months ago

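[Screenshots: 微信图片_20240429103738, 微信图片_20240429103742] I'd like to ask about an error shown when running T5TextEncoderLoader: Error occurred when executing T5TextEncoderLoader #ELLA:

'added_tokens'

When I asked GPT about this, the answer was that the code tries to access a key from the tokenizer file handle, but that key appears to be missing, which causes the error. How can this error be resolved?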

TencentQQGYLab / ComfyUI-ELLA

Error in sampling stage- possibly related to MPS backend #12