aszc-dev / ComfyUI-CoreMLSuite

A set of custom nodes for ComfyUI that allow you to use Core ML models in your ComfyUI workflows.
GNU General Public License v3.0
92 stars · 8 forks

all input tensors must be on the same device. Received mps:0 and cpu #3

Closed · bvndls closed this issue 8 months ago

bvndls commented 8 months ago

How to reproduce:

Device - Base model M1 Air (7 Core GPU, 8GB RAM)

  1. Install ComfyUI
  2. Install CoreMLSuite
  3. Start ComfyUI with FP16:
    python main.py --force-fp16
  4. Load the workflow
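The steps above can be sketched as shell commands; the repository URLs are the upstream ComfyUI and CoreMLSuite repos, and the requirements file paths are assumed defaults:

```shell
# Sketch of the reproduction steps (paths and requirements files assumed)
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt

git clone https://github.com/aszc-dev/ComfyUI-CoreMLSuite.git custom_nodes/ComfyUI-CoreMLSuite
pip install -r custom_nodes/ComfyUI-CoreMLSuite/requirements.txt

# Start with FP16 forced, then load the workflow in the browser UI
python main.py --force-fp16
```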

Error log:

➜  ComfyUI git:(master) python3 main.py --force-fp16
Total VRAM 8192 MB, total RAM 8192 MB
Forcing FP16.
/Users/bvndls/Library/Python/3.9/lib/python/site-packages/urllib3/__init__.py:34: NotOpenSSLWarning: urllib3 v2.0 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(
Set vram state to: SHARED
Device: mps
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --use-split-cross-attention
scikit-learn version 1.3.2 is not supported. Minimum required version: 0.17. Maximum required version: 1.1.2. Disabling scikit-learn conversion API.
Torch version 2.2.0.dev20231027 has not been tested with coremltools. You may run into unexpected errors. Torch 2.0.0 is the most recent version that has been tested.

Import times for custom nodes:
   0.4 seconds: /Users/bvndls/ComfyUI/custom_nodes/ComfyUI-CoreMLSuite

Starting server

To see the GUI go to: http://127.0.0.1:8188
got prompt
INFO:coreml_suite.logger:Loading Unet.mlmodelc to CPU_AND_NE
INFO:python_coreml_stable_diffusion.coreml_model:Loading /Users/bvndls/ComfyUI/models/unet/Unet.mlmodelc
INFO:python_coreml_stable_diffusion.coreml_model:Done. Took 3.5 seconds.
Requested to load SD1ClipModel
Loading 1 new model
model_type EPS
adm 0
WARNING:coreml_suite.logger:No latent image provided, using empty tensor.
WARNING:coreml_suite.logger:Batch size is different from expected input size. Chunking and/or padding will be applied.
Requested to load CoreMLModelWrapper
Loading 1 new model
  0%|                                                                                                                                                                                                           | 0/20 [00:00<?, ?it/s]
ERROR:root:!!! Exception during processing !!!
ERROR:root:Traceback (most recent call last):
  File "/Users/bvndls/ComfyUI/execution.py", line 153, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "/Users/bvndls/ComfyUI/execution.py", line 83, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "/Users/bvndls/ComfyUI/execution.py", line 76, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "/Users/bvndls/ComfyUI/custom_nodes/ComfyUI-CoreMLSuite/coreml_suite/nodes.py", line 58, in sample
    return super().sample(
  File "/Users/bvndls/ComfyUI/nodes.py", line 1237, in sample
    return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
  File "/Users/bvndls/ComfyUI/nodes.py", line 1207, in common_ksampler
    samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
  File "/Users/bvndls/ComfyUI/comfy/sample.py", line 100, in sample
    samples = sampler.sample(noise, positive_copy, negative_copy, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "/Users/bvndls/ComfyUI/comfy/samplers.py", line 728, in sample
    return sample(self.model, noise, positive, negative, cfg, self.device, sampler(), sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "/Users/bvndls/ComfyUI/comfy/samplers.py", line 633, in sample
    samples = sampler.sample(model_wrap, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
  File "/Users/bvndls/ComfyUI/comfy/samplers.py", line 589, in sample
    samples = getattr(k_diffusion_sampling, "sample_{}".format(sampler_name))(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **extra_options)
  File "/Users/bvndls/Library/Python/3.9/lib/python/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/bvndls/ComfyUI/comfy/k_diffusion/sampling.py", line 580, in sample_dpmpp_2m
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "/Users/bvndls/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py", line 1519, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/bvndls/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py", line 1528, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/bvndls/ComfyUI/comfy/samplers.py", line 287, in forward
    out = self.inner_model(x, sigma, cond=cond, uncond=uncond, cond_scale=cond_scale, model_options=model_options, seed=seed)
  File "/Users/bvndls/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py", line 1519, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/bvndls/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py", line 1528, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/bvndls/ComfyUI/comfy/k_diffusion/external.py", line 129, in forward
    eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
  File "/Users/bvndls/ComfyUI/comfy/k_diffusion/external.py", line 155, in get_eps
    return self.inner_model.apply_model(*args, **kwargs)
  File "/Users/bvndls/ComfyUI/comfy/samplers.py", line 275, in apply_model
    out = sampling_function(self.inner_model.apply_model, x, timestep, uncond, cond, cond_scale, model_options=model_options, seed=seed)
  File "/Users/bvndls/ComfyUI/comfy/samplers.py", line 253, in sampling_function
    cond, uncond = calc_cond_uncond_batch(model_function, cond, uncond, x, timestep, max_total_area, model_options)
  File "/Users/bvndls/ComfyUI/comfy/samplers.py", line 229, in calc_cond_uncond_batch
    output = model_function(input_x, timestep_, **c).chunk(batch_chunks)
  File "/Users/bvndls/ComfyUI/custom_nodes/ComfyUI-CoreMLSuite/coreml_suite/models.py", line 45, in apply_model
    chunked_x = chunk_batch(x, sample_shape)
  File "/Users/bvndls/ComfyUI/custom_nodes/ComfyUI-CoreMLSuite/coreml_suite/latents.py", line 14, in chunk_batch
    return [torch.cat((latent_image, padding), dim=0)]
RuntimeError: torch.cat(): all input tensors must be on the same device. Received mps:0 and cpu

Prompt executed in 5.47 seconds

Selecting a different compute_unit, or running ComfyUI without --force-fp16, or with either --gpu-only or --disable-smart-memory, didn't help.

bvndls commented 8 months ago

Pip packages:

➜  ComfyUI git:(master) pip list | grep torch
torch                          2.2.0.dev20231027
torchaudio                     2.2.0.dev20231027
torchsde                       0.2.6
torchvision                    0.17.0.dev20231027
➜  ComfyUI git:(master)

UPD: same with torch 2.1.0

Python version:

➜  ComfyUI git:(master) python3 -V
Python 3.9.6
aszc-dev commented 8 months ago

Thank you for your thorough report. I really appreciate your feedback. This issue was due to a minor mistake on my side, but it helped me identify a more serious bug that I think I managed to get rid of. I hope that the new version fixes your issue. If not, let me know, so that we can work this out.
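For anyone hitting the same RuntimeError in their own nodes, the general pattern behind this class of bug is creating a padding tensor on the default (CPU) device and concatenating it with an MPS tensor. A minimal sketch of the device-alignment fix; the function name matches the traceback, but the shapes and body here are illustrative, not the suite's actual code:

```python
import torch

def chunk_batch(latent_image: torch.Tensor, sample_shape: torch.Size) -> list[torch.Tensor]:
    """Pad a latent batch up to the model's expected batch size.

    Creating the padding on the same device (and dtype) as the input avoids
    "torch.cat(): all input tensors must be on the same device".
    """
    pad_rows = sample_shape[0] - latent_image.shape[0]
    padding = torch.zeros(
        (pad_rows, *latent_image.shape[1:]),
        device=latent_image.device,  # key point: match the input's device
        dtype=latent_image.dtype,
    )
    return [torch.cat((latent_image, padding), dim=0)]

# CPU demo: a batch of 1 padded to an expected batch of 2
x = torch.randn(1, 4, 64, 64)
(chunked,) = chunk_batch(x, torch.Size((2, 4, 64, 64)))
print(chunked.shape)
```

On an M-series Mac the same code works unchanged when `x` lives on `mps`, since the padding inherits the input's device.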

rovo79 commented 8 months ago

@aszc-dev thank you for all your effort with this project.

Your latest update resolved this for me: RuntimeError: torch.cat(): all input tensors must be on the same device. Received mps:0 and cpu

sw_vers

ProductName:        macOS
ProductVersion:     14.0
BuildVersion:       23A344
coremltools 7.0
torch 2.1.0
python 3.11.5

M1 16GB

workflow

bvndls commented 8 months ago

> Thank you for your thorough report. I really appreciate your feedback. This issue was due to a minor mistake on my side, but it helped me identify a more serious bug that I think I managed to get rid of. I hope that the new version fixes your issue. If not, let me know, so that we can work this out.

Thank you so much for the effort. CoreML Suite effectively doubles the speed.

Tested with sd1.5-pruned-emaonly versus the same model converted using coremltools:

(screenshot: SCR-20231031-mqqv)

The initial load took significantly more time because I was using an .mlpackage; using an .mlmodelc should speed things up even more.

(screenshot: SCR-20231031-mthz)
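For the .mlpackage vs .mlmodelc point above: on macOS, an .mlpackage can be precompiled to the faster-loading .mlmodelc format with Xcode's coremlcompiler. The file names and output directory here are illustrative:

```shell
# Precompile an .mlpackage into an .mlmodelc (requires Xcode command line tools).
# This writes Unet.mlmodelc into the given output directory, so the model
# doesn't have to be compiled on first load.
xcrun coremlcompiler compile Unet.mlpackage models/unet/
```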