comfyanonymous / ComfyUI

The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
https://www.comfy.org/
GNU General Public License v3.0

FLUX Issue | MPS framework doesn't support float64 #4165

Open alexgenovese opened 1 month ago

alexgenovese commented 1 month ago

Expected Behavior

Run the inference

Actual Behavior

After 273.31 seconds, it throws an exception

Steps to Reproduce

Load the example workflow for the DEV version from https://comfyanonymous.github.io/ComfyUI_examples/flux/

Debug Logs

!!! Exception during processing!!! Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
Traceback (most recent call last):
  File "/Users/alexgenovese/Desktop/2_comfy/execution.py", line 152, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/execution.py", line 82, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/execution.py", line 75, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy_extras/nodes_custom_sampler.py", line 612, in sample
    samples = guider.sample(noise.generate_noise(latent), latent_image, sampler, sigmas, denoise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=noise.seed)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/samplers.py", line 716, in sample
    output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/samplers.py", line 695, in inner_sample
    samples = sampler.sample(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/samplers.py", line 600, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/k_diffusion/sampling.py", line 143, in sample_euler
    denoised = model(x, sigma_hat * s_in, **extra_args)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/samplers.py", line 299, in __call__
    out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/samplers.py", line 682, in __call__
    return self.predict_noise(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/samplers.py", line 685, in predict_noise
    return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/samplers.py", line 279, in sampling_function
    out = calc_cond_batch(model, conds, x, timestep, model_options)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/custom_nodes/ComfyUI-TiledDiffusion/.patches.py", line 4, in calc_cond_batch
    return calc_cond_batch_original_tiled_diffusion_91e66834(model, conds, x_in, timestep, model_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/samplers.py", line 228, in calc_cond_batch
    output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/custom_nodes/ComfyUI-Advanced-ControlNet/adv_control/utils.py", line 64, in apply_model_uncond_cleanup_wrapper
    return orig_apply_model(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/model_base.py", line 121, in apply_model
    model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/ldm/flux/model.py", line 135, in forward
    out = self.forward_orig(img, img_ids, context, txt_ids, timestep, y, guidance)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/ldm/flux/model.py", line 112, in forward_orig
    pe = self.pe_embedder(ids)
         ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/ldm/flux/layers.py", line 21, in forward
    [rope(ids[..., i], self.axes_dim[i], self.theta) for i in range(n_axes)],
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexgenovese/Desktop/2_comfy/comfy/ldm/flux/math.py", line 16, in rope
    scale = torch.arange(0, dim, 2, dtype=torch.float64, device=pos.device) / dim
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.

Other

No response

comfyanonymous commented 1 month ago

https://github.com/comfyanonymous/ComfyUI/commit/48eb1399c02bdae7e14b2208c448b69b382d0090

Can you check if this fixes it?
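
For reference, the failing call is the rope() helper in comfy/ldm/flux/math.py shown at the bottom of the traceback above. Below is a minimal sketch of the kind of workaround involved, assuming the fix simply avoids allocating float64 on MPS (see the linked commit for the actual change):

import torch
from torch import Tensor

def rope(pos: Tensor, dim: int, theta: int) -> Tensor:
    assert dim % 2 == 0, "The dimension must be even."
    # MPS cannot allocate float64 tensors, so fall back to float32 there.
    dtype = torch.float64 if pos.device.type != "mps" else torch.float32
    scale = torch.arange(0, dim, 2, dtype=dtype, device=pos.device) / dim
    omega = 1.0 / (theta ** scale)
    out = torch.einsum("...n,d->...nd", pos.to(dtype), omega)
    # Stack cos/sin terms into the per-position (2, 2) rotation matrices.
    out = torch.stack([torch.cos(out), -torch.sin(out), torch.sin(out), torch.cos(out)], dim=-1)
    return out.reshape(*out.shape[:-1], 2, 2).float()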

mhale1 commented 1 month ago

On Mac it seems to run with default settings, but it just produces a black image output. If I change it to fp8 as mentioned above, then the Mac says MPS doesn't support that.

tombearx commented 1 month ago

On Mac it seems to run with default settings, but it just produces a black image output. If I change it to fp8 as mentioned above, then the Mac says MPS doesn't support that.

How much RAM do you have? For some reason both the original and the fp8 models take around 40+ GB. Is it the same for you?

mhale1 commented 1 month ago

@tombearx I have a 64 GB M1 Mac and a 16 GB 3080 in my Windows machine. I use the Mac more at work, so I was trying there first.

ghogan42 commented 1 month ago

This probably won't help fix it, but when I enable preview I can see that, as the image generates, new stripes are added to the top of the image, and the actual image may be shifting down by a corresponding amount.

[screenshot: flux_on_mac_m3_max]

I am also running on an M3 Max with 128 GB RAM. Flux won't run at 8-bit at all; Comfy gives an error. The T5 model runs at 8 or 16, but that doesn't help with this issue. I updated PyTorch to the current daily build of 2.5.0, which also did not help.

brkirch commented 1 month ago

If you're trying to run this model on an Apple Silicon Mac and having issues with broken image outputs, try downgrading torch with pip install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1, as it seems that the latest stable version of torch has some bugs that break image generation. Here is what I get with the unmodified example workflow on a 64 GB M1 Max with torch 2.3.1, using the latest ComfyUI commit as of this post and the Flux Dev model (with the fp16 T5 text encoder, t5xxl_fp16.safetensors): [image: ComfyUI_00104_]

twalderman commented 1 month ago

This workflow is working on my M3/128: https://civitai.com/models/617060/comfyui-workflow-for-flux-simple

QueryType commented 1 month ago

OK guys, I pruned the weights; they're now 11 GB with no quality loss. It loads up faster and takes way less space in VRAM... Not sure why they were not released pruned this way. They are still loaded in 8-bit though; I believe it should be 16. Can fp16 be enabled in the loader as well? Because when I tried to add fp16 on my own, I think it loaded as default and generation was very slow compared to 8.

class UNETLoader:
    @classmethod
    def INPUT_TYPES(s):
        return {"required": { "unet_name": (folder_paths.get_filename_list("unet"), ),
                              "weight_dtype": (["default", "fp16", "fp8_e4m3fn", "fp8_e5m2"],) }}
    RETURN_TYPES = ("MODEL",)
    FUNCTION = "load_unet"

    CATEGORY = "advanced/loaders"

    def load_unet(self, unet_name, weight_dtype):
        weight_dtype = {"default": None,
                        "fp16": torch.float16,
                        "fp8_e4m3fn": torch.float8_e4m3fn,
                        "fp8_e5m2": torch.float8_e5m2}[weight_dtype]
        unet_path = folder_paths.get_full_path("unet", unet_name)
        model = comfy.sd.load_unet(unet_path, dtype=weight_dtype)
        return (model,)

Can you explain how to prune it? Where do I add this? Sorry if it's a noob question.

Edit: Sorry, I git pulled and checked the code. All clear. Thanks!

QueryType commented 1 month ago

Well, the first shot did not work. I am on torch 2.3.1, Mac M2, 24 GB. I loaded the schnell model, fp8_e4m3fn. As can be seen, it does not use MPS and it triggered a 5 GB swap. I think I will wait for fixes to flow in.

Requested to load Flux
Loading 1 new model
python(4803) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
  0%| | 0/4 [00:00<?, ?it/s]huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either:

Prompt executed in 218.28 seconds

ghogan42 commented 1 month ago

If you're trying to run this model on an Apple Silicon Mac and having issues with broken image outputs, try downgrading torch with pip install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1, as it seems that the latest stable version of torch has some bugs that break image generation.

Yep. This is the way. Downgrading to these versions fixes generation for me on my M3 Max based MacBook.

mhale1 commented 1 month ago

Still no luck yet on my M1 Max even after the torch downgrades. Edit: I take that back. I just pulled the latest from this morning (just the clip_l encoder change?), and that combined with the earlier torch downgrade did fix it.

twalderman commented 1 month ago

the latest MPS nightly is working for me.

QueryType commented 1 month ago

Unless PyTorch supports the Float8_e4m3fn dtype on the MPS backend, people with less than 32 GB of unified memory can forget about running these locally on Apple Silicon.

RefractAI commented 1 month ago

the latest MPS nightly is working for me.

Nightly is still broken for me. The 2.3.1 downgrade works.

Adreitz commented 1 month ago

I tried the latest nightly. It "works" when using the normal CFGGuider node, but the output is extremely blurry. Using BasicGuider + FluxGuidance nodes leads to noise. [images: ComfyUI_00002_, ComfyUI_00004_, ComfyUI_00005_]

[Edit] Confirmed that the downgraded torch does work, though you need BasicGuider + FluxGuidance; the CFGGuider node still produces blurry output. [images: ComfyUI_00013_, ComfyUI_00014_, ComfyUI_00015_, ComfyUI_00016_] (Image pairs differ in sampler between euler and bosh3, a custom ODE sampler.)

twalderman commented 1 month ago

Has anyone seen value in the new guider for Flux? If so, I will downgrade to try it. With the nightly I'm getting nice output with a guidance of 1.

tombearx commented 1 month ago

Unless PyTorch supports the Float8_e4m3fn dtype on the MPS backend, people with less than 32 GB of unified memory can forget about running these locally on Apple Silicon.

I can't manage to run it even on a 32 GB M1 Max. Has anyone succeeded?

Adreitz commented 1 month ago

@twalderman I just tested, and there might be something wrong with the guidance. I'm not seeing any difference between scale 1.0 and scale 4.5: literally zero when I subtract one image from the other. Never mind, Comfy messed up somehow. How exactly did you get things working with the torch nightlies?

twalderman commented 1 month ago

I didn't do anything unusual. I tested with the nightly and had no issues, so I didn't revert back. I have been generating images all day.

Adreitz commented 1 month ago

@twalderman Weird. What OS version are you using?

Here is an example of the differences you can expect from changing the guidance scale (1.0 to 4.0 in steps of 0.5; 4.5 is above; all using the bosh3 sampler).

[images: ComfyUI_00021_, ComfyUI_00023_, ComfyUI_00025_, ComfyUI_00027_, ComfyUI_00029_, ComfyUI_00031_, ComfyUI_00033_]

RainbowBull commented 1 month ago

Can you share your workflow? On my M1 Max it runs for 10 minutes and the picture is noisy.

tombearx commented 1 month ago

Can you share your workflow? On my M1 Max it runs for 10 minutes and the picture is noisy.

I used the workflow from the previous picture. I get around 90-100 s/it, probably because bf16 is not supported directly and the model uses much more RAM (and swap) than it should.

QueryType commented 1 month ago

Unless PyTorch supports the Float8_e4m3fn dtype on the MPS backend, people with less than 32 GB of unified memory can forget about running these locally on Apple Silicon.

I can't manage to run it even on a 32 GB M1 Max. Has anyone succeeded?

It is a bit of a bad situation for us. I am at 24 GB; I cannot even dream of it.

twalderman commented 1 month ago

@Adreitz i am using the latest sequoia beta.

tombearx commented 1 month ago

Unless PyTorch supports the Float8_e4m3fn dtype on the MPS backend, people with less than 32 GB of unified memory can forget about running these locally on Apple Silicon.

I can't manage to run it even on a 32 GB M1 Max. Has anyone succeeded?

It is a bit of a bad situation for us. I am at 24 GB; I cannot even dream of it.

Looks like the RAM issue arises because the text encoders aren't unloaded from RAM on MPS. I opened an issue: https://github.com/comfyanonymous/ComfyUI/issues/4201

dreamrec commented 1 month ago

If you're trying to run this model on an Apple Silicon Mac and having issues with broken image outputs, try downgrading torch with pip install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1 [...]

Perfect solution!

RainbowBull commented 1 month ago

How long does it take to generate one image? Mine takes 10 minutes.

dreamrec commented 1 month ago

How long does it take to generate one image? Mine takes 10 minutes.

An M3 Max 64 GB takes 210 s (1024x1024, 30 steps).

timothyallan commented 1 month ago

How long does it take to generate one image? Mine takes 10 minutes.

Just under 5 minutes on an M2 Max 64 GB at 1024, 20 steps.

RainbowBull commented 1 month ago

I have an M1 Max 32 GB, but it still takes 10 minutes.

@dreamrec can you share the workflow again? It's private, so I can't see it.

bauerwer commented 1 month ago

On an M3 Max with 128 GB RAM, I had to downgrade torch (pip install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1) as well as use an fp32 VAE (start ComfyUI with --fp32-vae). With newer torch I get noisy images. With an fp16 or bf16 VAE, I get a black image after VAE decode.

joshuachung commented 1 month ago

Unless PyTorch supports the Float8_e4m3fn dtype on the MPS backend, people with less than 32 GB of unified memory can forget about running these locally on Apple Silicon.

I am using an M3 Max w/ 36 GB RAM. The result is still negative.

[START] Security scan
[DONE] Security scan

ComfyUI-Manager: installing dependencies done.

ComfyUI startup time: 2024-08-06 11:48:38.407807
Platform: Darwin
Python version: 3.11.7 (main, Dec 15 2023, 12:09:56) [Clang 14.0.6 ]
Python executable: /opt/anaconda3/bin/python3
ComfyUI Path: /Users/joshua/ComfyUI
Log path: /Users/joshua/comfyui.log

Prestartup times for custom nodes:
   0.9 seconds: /Users/joshua/ComfyUI/custom_nodes/ComfyUI-Manager

Total VRAM 36864 MB, total RAM 36864 MB
pytorch version: 2.3.1
Set vram state to: SHARED
Device: mps
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --use-split-cross-attention
[Prompt Server] web root: /Users/joshua/ComfyUI/web

Loading: ComfyUI-Manager (V2.48.5)

ComfyUI Revision: 2473 [1abc9c87] | Released on '2024-08-05'

Import times for custom nodes:
   0.0 seconds: /Users/joshua/ComfyUI/custom_nodes/websocket_image_save.py
   0.1 seconds: /Users/joshua/ComfyUI/custom_nodes/ComfyUI-Manager

Starting server

To see the GUI go to: http://127.0.0.1:8188
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/alter-list.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/model-list.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/github-stats.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/extension-node-map.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/custom-node-list.json
FETCH DATA from: /Users/joshua/ComfyUI/custom_nodes/ComfyUI-Manager/extension-node-map.json [DONE]
got prompt
model weight dtype torch.float8_e4m3fn, manual cast: torch.bfloat16
model_type FLOW
clip missing: ['text_projection.weight']
Requested to load FluxClipModel
Loading 1 new model
Requested to load Flux
Loading 1 new model
  0%| | 0/4 [00:00<?, ?it/s]huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either:

Prompt executed in 60.57 seconds

AaronWard commented 1 month ago


I've tried a number of different approaches on an M3 MacBook Pro, including the _flux_rope hack with bfloat16, but the model only returns grainy results. I've tried bumping up the number of inference steps, which increases inference duration, but there's no noticeable improvement.

import torch
from diffusers import FluxPipeline
import diffusers
# from accelerate import Accelerator  # optional; unused below

# Patch the rope function so it never runs in float64 on the MPS device
_flux_rope = diffusers.models.transformers.transformer_flux.rope

def new_flux_rope(pos: torch.Tensor, dim: int, theta: int) -> torch.Tensor:
    assert dim % 2 == 0, "The dimension must be even."
    if pos.device.type == "mps":
        # Compute the rotary embedding on CPU, then move it back to MPS
        return _flux_rope(pos.to("cpu"), dim, theta).to(device=pos.device)
    else:
        return _flux_rope(pos, dim, theta)

diffusers.models.transformers.transformer_flux.rope = new_flux_rope

model_path = "./saved_flux_model"
print(f"Model path: {model_path}")

# Load and cache the model locally
print("Loading model...")
pipe = FluxPipeline.from_pretrained(
    # "black-forest-labs/FLUX.1-schnell",
    "black-forest-labs/FLUX.1-dev",
    # cache_dir=model_path,
    # revision='refs/pr/1',
    # low_cpu_mem_usage=True,
    torch_dtype=torch.bfloat16,
).to("mps")
print("Model loaded.")

# accelerator = Accelerator()
# pipe = accelerator.prepare(pipe)

prompt = "Anime cat girl holding a sign that says hello world"
print(f"Prompt: {prompt}")

print("Generating image...")
try:
    image = pipe(
        prompt=prompt,
        guidance_scale=3.5,
        height=1024,
        width=1024,
        num_inference_steps=6,
        max_sequence_length=256,
    ).images[0]
except Exception as e:
    print(f"An error occurred: {e}")
    raise  # don't fall through to image.save() without an image
finally:
    del pipe  # release the pipeline's memory either way

print("Image generated.")

output_path = "_output/flux_image.png"
image.save(output_path)
print(f"Image saved to {output_path}.")

QueryType commented 1 month ago

Unless PyTorch supports the Float8_e4m3fn dtype on the MPS backend, people with less than 32 GB of unified memory can forget about running these locally on Apple Silicon.

I am using an M3 Max w/ 36 GB RAM. The result is still negative.

[startup log as quoted in the previous comment]

got prompt
model weight dtype torch.float8_e4m3fn, manual cast: torch.bfloat16
model_type FLOW
clip missing: ['text_projection.weight']
Requested to load FluxClipModel
Loading 1 new model
Requested to load Flux
Loading 1 new model
  0%| | 0/4 [00:00<?, ?it/s]huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
  0%| | 0/4 [00:00<?, ?it/s]
!!! Exception during processing!!! Trying to convert Float8_e4m3fn to the MPS backend but it does not have support for that dtype.
Traceback (most recent call last):
  File "/Users/joshua/ComfyUI/execution.py", line 152, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "/Users/joshua/ComfyUI/execution.py", line 82, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "/Users/joshua/ComfyUI/execution.py", line 75, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "/Users/joshua/ComfyUI/comfy_extras/nodes_custom_sampler.py", line 612, in sample
    samples = guider.sample(noise.generate_noise(latent), latent_image, sampler, sigmas, denoise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=noise.seed)
  File "/Users/joshua/ComfyUI/comfy/samplers.py", line 716, in sample
    output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
  File "/Users/joshua/ComfyUI/comfy/samplers.py", line 695, in inner_sample
    samples = sampler.sample(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
  File "/Users/joshua/ComfyUI/comfy/samplers.py", line 600, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
  File "/opt/anaconda3/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/joshua/ComfyUI/comfy/k_diffusion/sampling.py", line 143, in sample_euler
    denoised = model(x, sigma_hat * s_in, **extra_args)
  File "/Users/joshua/ComfyUI/comfy/samplers.py", line 299, in __call__
    out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
  File "/Users/joshua/ComfyUI/comfy/samplers.py", line 682, in __call__
    return self.predict_noise(*args, **kwargs)
  File "/Users/joshua/ComfyUI/comfy/samplers.py", line 685, in predict_noise
    return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
  File "/Users/joshua/ComfyUI/comfy/samplers.py", line 279, in sampling_function
    out = calc_cond_batch(model, conds, x, timestep, model_options)
  File "/Users/joshua/ComfyUI/comfy/samplers.py", line 228, in calc_cond_batch
    output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
  File "/Users/joshua/ComfyUI/comfy/model_base.py", line 123, in apply_model
    model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
  File "/opt/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/joshua/ComfyUI/comfy/ldm/flux/model.py", line 141, in forward
    out = self.forward_orig(img, img_ids, context, txt_ids, timestep, y, guidance)
  File "/Users/joshua/ComfyUI/comfy/ldm/flux/model.py", line 102, in forward_orig
    img = self.img_in(img)
  File "/opt/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/joshua/ComfyUI/comfy/ops.py", line 63, in forward
    return self.forward_comfy_cast_weights(*args, **kwargs)
  File "/Users/joshua/ComfyUI/comfy/ops.py", line 58, in forward_comfy_cast_weights
    weight, bias = cast_bias_weight(self, input)
  File "/Users/joshua/ComfyUI/comfy/ops.py", line 39, in cast_bias_weight
    bias = cast_to(s.bias, dtype, device, non_blocking=non_blocking)
  File "/Users/joshua/ComfyUI/comfy/ops.py", line 24, in cast_to
    return weight.to(device=device, dtype=dtype, non_blocking=non_blocking)
TypeError: Trying to convert Float8_e4m3fn to the MPS backend but it does not have support for that dtype.

Prompt executed in 60.57 seconds

Below is the problem:

!!! Exception during processing!!! Trying to convert Float8_e4m3fn to the MPS backend but it does not have support for that dtype.
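
The dtype limitation itself is easy to reproduce outside ComfyUI. A minimal sketch (any Apple Silicon machine with a recent torch):

import torch

# float8 tensors can be created on the CPU, but moving one to MPS raises
# the same TypeError that appears in the traceback above.
x = torch.zeros(4, dtype=torch.float8_e4m3fn)
try:
    x.to("mps")
except TypeError as e:
    print(e)  # Trying to convert Float8_e4m3fn to the MPS backend ...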

bghira commented 1 month ago

Apple MPS requires int8 and not fp8 (e4m3fn), though if possible you should use e5m2 instead, as Flux benefits from the increased range. Maybe not once the activations are clamped to the fp16 range...
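
For reference, the range difference between the two fp8 formats is easy to check with torch.finfo; a quick sketch:

import torch

# e4m3fn trades range for precision; e5m2 keeps a much wider range.
for dt in (torch.float8_e4m3fn, torch.float8_e5m2):
    fi = torch.finfo(dt)
    print(dt, "max:", fi.max, "smallest normal:", fi.tiny)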

bghira commented 1 month ago

Gah! PyTorch nightly regressed majorly in terms of MPS support. Suddenly SDPA is no longer supported on MPS. Can't use that either anymore.

ajfisher commented 1 month ago

I have an MBP M2 Max 96 GB and had to downgrade torch (pip install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1) to get rid of the noise issues. I had previously been running 2.4.1.

While I'm now getting actual images rather than noise, I've noticed there seems to be a speed difference between torch 2.3 and 2.4. With the "working" version, one 1024x1024 @ 20 steps image takes about 340 s to generate, but with the "not working" version the same scenario averaged more like 220 s. So there might be multiple things going on with this version change.

For now, 5 minutes to generate an image feels a bit untenable time-wise (it's like using the original SD on an Intel Mac!), so I might have to go back to my SDXL workflows for a bit instead. That said, the images generated without the need for refiners etc. are pretty good.

bghira commented 1 month ago

macOS 15 beta 3 with tonight's torch nightly. I think someone here was mistaken or confused; it still isn't working at all.

bauerwer commented 1 month ago

Side note: Flux with any torch >= 2.4 does not work for me; the latest 2.5 dev does not work either. What works: torch 2.3.1 and forcing the VAE to fp32. Obviously, torch 2.3 breaks a few newer things that I now can't use (they need 2.4 or even the latest dev).

vishnoub commented 1 month ago

The Mac application "Draw Things" has found a solution. I don't know how they did it, but Flux works on my 16 GB MBP.

azrahello commented 1 month ago

The Mac application "Draw Things" has found a solution. I don't know how they did it, but Flux works on my 16 GB MBP.

They use the MLX framework to run it; it is the only app optimized for Apple Silicon.

acalococci commented 1 month ago

How do you downgrade torch? I'm a complete newb. I know I'm supposed to type pip install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1, but whenever I type that into the terminal I get the following:

error: externally-managed-environment

× This environment is externally managed
╰─> To install Python packages system-wide, try brew install
    xyz, where xyz is the package you are trying to
    install.

    If you wish to install a Python library that isn't in Homebrew,
    use a virtual environment:

    python3 -m venv path/to/venv
    source path/to/venv/bin/activate
    python3 -m pip install xyz

    If you wish to install a Python application that isn't in Homebrew,
    it may be easiest to use 'pipx install xyz', which will manage a
    virtual environment for you. You can install pipx with

    brew install pipx

    You may restore the old behavior of pip by passing
    the '--break-system-packages' flag to pip, or by adding
    'break-system-packages = true' to your pip.conf file. The latter
    will permanently disable this error.

    If you disable this error, we STRONGLY recommend that you additionally
    pass the '--user' flag to pip, or set 'user = true' in your pip.conf
    file. Failure to do this can result in a broken Homebrew installation.

    Read more about this behavior here: <https://peps.python.org/pep-0668/>

note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
hint: See PEP 668 for the detailed specification.

bauerwer commented 1 month ago

It depends on where your Python environment is. With ComfyUI (or even Stability Matrix), there is usually a venv folder directly inside the ComfyUI folder. You need to activate that Python environment and then call pip inside it.

For Linux/Mac, the commands look like the sketch below.
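
A minimal sketch, assuming the venv sits at ComfyUI/venv (adjust the path to your install):

cd ComfyUI
source venv/bin/activate
pip install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1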


ltdrdata commented 1 month ago

How do you downgrade torch? I'm a complete newb. I know I'm supposed to type pip install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1 but whenever I type that into the terminal I get the following: error: externally-managed-environment [full error message quoted above]

You are trying to install into the system Python. This is prohibited, especially in Linux environments, to prevent OS malfunctions.

You should set up an isolated Python environment, such as a venv, and only install packages and run ComfyUI within that environment.

joneavila commented 1 month ago

I've tried downgrading PyTorch and doing a clean install, but my outputs are still noisy. The example FLUX dev workflow takes well over 5 minutes to complete, compared to ~2.5 minutes using Draw Things.

conda create --name comfyui python=3.11
conda activate comfyui

conda install pytorch::pytorch torchvision torchaudio -c pytorch

git clone https://github.com/comfyanonymous/ComfyUI.git comfyui
cd comfyui

pip install -r requirements.txt

python main.py --preview-method auto

ajfisher commented 1 month ago

@joneavila your torch install line with conda is going to pull the most recent version of pytorch etc. You need to specify the versions in that line to be explicit about which version to use, e.g. pytorch==2.3.1.

If you do a pip freeze on that environment, you'll probably see you have version 2.4 or higher.
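
For example, a pinned variant of the conda line above (package versions assumed from the downgrade recommended earlier in this thread):

conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 -c pytorch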

mPromp2sHub commented 1 month ago

If you're trying to run this model on an Apple Silicon Mac and having issues with broken image outputs, try downgrading torch with pip install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1 [...]

This post saved my @ss. Thank you.

bghira commented 1 month ago

@rhvaara has made strides in fixing the issue and found it in pytorch.

zwqjoy commented 1 month ago

downgrade torch as temp fix: pip install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1

That did not work for me: macOS Sonoma 14.6.1 on a MacBook Pro M1 Max, 64 GB.

Still a noisy image.

Adreitz commented 1 month ago

@rhvaara has made strides in fixing the issue and found it in pytorch.

Do you have a link to a bug or PR? I couldn't find anything.