Error first try SD3 directml RX580

KillyTheNetTerminal commented 2 weeks ago

Error occurred when executing KSampler:

Expected all tensors to be on the same device, but found at least two devices, privateuseone:0 and cpu!

  File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(slice_dict(input_data_all, i)))
  File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\nodes.py", line 1355, in sample
    return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
  File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\nodes.py", line 1325, in common_ksampler
    samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
  File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comfy\sample.py", line 43, in sample
    samples = sampler.sample(noise, positive, negative, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comfy\samplers.py", line 794, in sample
    return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comfy\samplers.py", line 696, in sample
    return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
  File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comfy\samplers.py", line 683, in sample
    output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
  File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comfy\samplers.py", line 662, in inner_sample
    samples = sampler.sample(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
  File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comfy\samplers.py", line 567, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, self.extra_options)
  File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comf\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
    return func(*args, kwargs)
  File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comfy\k_diffusion\sampling.py", line 137, in sample_euler
    denoised = model(x, sigma_hat * s_in, *extra_args)
  File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comfy\samplers.py", line 291, in call
    out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
  File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comfy\samplers.py", line 649, in call
    return self.predict_noise(args, kwargs)
  File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comfy\samplers.py", line 652, in predict_noise
    return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
  File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comfy\samplers.py", line 277, in sampling_function
    out = calc_cond_batch(model, conds, x, timestep, model_options)
  File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comfy\samplers.py", line 226, in calc_cond_batch
    output = model.apply_model(inputx, timestep, c).chunk(batch_chunks)
  File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comfy\model_base.py", line 103, in apply_model
    model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, extra_conds).float()
  File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comf\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, *kwargs)
  File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comf\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(args, **kwargs)
  File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comfy\ldm\modules\diffusionmodules\mmdit.py", line 961, in forward
    return super().forward(x, timesteps, context=context, y=y)
  File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comfy\ldm\modules\diffusionmodules\mmdit.py", line 937, in forward
    x = self.x_embedder(x) + self.cropped_pos_embed(hw, device=x.device).to(dtype=x.dtype)
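The error message says that one operation received tensors living on two different devices (here privateuseone:0, the DirectML device, and cpu). A minimal sketch of how one might list which tensors sit on which device is shown here. It is not ComfyUI code: the helper names and the stand-in tensor names are hypothetical. It only assumes objects with a `.device` attribute, so with PyTorch one could pass `module.named_parameters()` or `module.named_buffers()`, which yield (name, tensor) pairs.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_device(named_tensors):
    """Group tensor names by the string form of their `.device` attribute."""
    groups = defaultdict(list)
    for name, tensor in named_tensors:
        groups[str(tensor.device)].append(name)
    return dict(groups)


def report_device_mismatch(named_tensors):
    """Return a warning string if more than one device is in use, else None."""
    groups = group_by_device(named_tensors)
    if len(groups) <= 1:
        return None
    parts = [f"{device}: {', '.join(names)}" for device, names in sorted(groups.items())]
    return "tensors on multiple devices -> " + "; ".join(parts)


# Stand-in objects so the sketch runs without PyTorch; the names are made up.
fake_tensors = [
    ("x_embedder.weight", SimpleNamespace(device="privateuseone:0")),
    ("pos_embed", SimpleNamespace(device="cpu")),
]
print(report_device_mismatch(fake_tensors))
```

A report like this only points at the tensor that was left behind on the CPU; the usual remedy is to move it to the same device as the other operand before the operation.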

timesqueezer commented 2 weeks ago

Can confirm the same error on an RTX 3050 / Intel Core i7-11800H notebook. The only difference is this line: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

KillyTheNetTerminal commented 2 weeks ago

Exactly, because you have Nvidia and CUDA.

KillyTheNetTerminal commented 2 weeks ago

It works on CPU, but it is slow as hell. i3-9100F

Cremesis commented 2 weeks ago

Same problem for me

Exception during processing!!! Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
Traceback (most recent call last):
  File "C:\Users\<myuser>\Downloads\comfyui\ComfyUI\execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)

jtyszkiew commented 2 weeks ago

Seems to work for me on:

│Total VRAM 11980 MB, total RAM 64140 MB
│pytorch version: 2.3.0+cu121
│Set vram state to: NORMAL_VRAM
│Device: cuda:0 NVIDIA GeForce RTX 4070 : cudaMallocAsync
│VAE dtype: torch.bfloat16
│Using pytorch cross attention

Nvidia + CUDA

kuldp18 commented 2 weeks ago

> Can confirm the same error on an RTX 3050 / Intel Core i7-11800H notebook. The only difference is this line: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

I have the same error too.

15Litrov commented 2 weeks ago

A fresh manual install with nightly PyTorch (other versions not tested) helped me overcome this problem. 1050 Ti 4 GB + 32 GB RAM

kuldp18 commented 2 weeks ago

> A fresh manual install with nightly PyTorch (other versions not tested) helped me overcome this problem. 1050 Ti 4 GB + 32 GB RAM

Can we just update PyTorch in the current install? And how is 4 GB of VRAM handling SD3, by the way?

15Litrov commented 2 weeks ago

> > A fresh manual install with nightly PyTorch (other versions not tested) helped me overcome this problem. 1050 Ti 4 GB + 32 GB RAM
>
> Can we just update PyTorch in the current install? And how is 4 GB of VRAM handling SD3, by the way?

Maybe? I did not test. About performance: 30 s/it for 1024x1024 with dualCLIP.
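Before deciding between an in-place update and a fresh install, it can help to see which PyTorch build the current environment actually has. A minimal sketch using only the standard library follows; the helper name is hypothetical, and it has to be run with the same Python interpreter that ComfyUI uses (for example the one inside its virtual environment), otherwise it reports a different installation.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# torch-directml is only expected on DirectML setups; it may be absent elsewhere.
for name in ("torch", "torch-directml"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```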

kuldp18 commented 2 weeks ago

Guys, the issue is fixed. Please update!

AlexBenjarmin commented 2 weeks ago

> Guys, the issue is fixed. Please update!

Update ComfyUI?

KillyTheNetTerminal commented 2 weeks ago

Yes, it works now after updating ComfyUI (I use the Manager), but it is very slow per iteration. Is there a way to speed this up?

KillyTheNetTerminal commented 2 weeks ago

ltdrdata commented 2 weeks ago

You should not use dpmpp_2m, karras. Just use euler, sgm_uniform.

karras is bad for SD3.

kuldp18 commented 2 weeks ago

> You should not use dpmpp_2m, karras. Just use euler, sgm_uniform.
>
> karras is bad for SD3.

The official recommendation is DPM, though. Isn't euler too random for the SD3 architecture?

ltdrdata commented 2 weeks ago

> > You should not use dpmpp_2m, karras. Just use euler, sgm_uniform. karras is bad for SD3.
>
> The official recommendation is DPM, though. Isn't euler too random for the SD3 architecture?

https://comfyanonymous.github.io/ComfyUI_examples/sd3/

The official example suggests euler, sgm_uniform. In my tests, the dpmpp_2m sampler is OK, but the scheduler must be one of normal, simple, sgm_uniform, ddim_uniform.
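The scheduler advice above can be written down as a small check. This is only a sketch of what was reported in this thread, not an official compatibility list, and the function and constant names are hypothetical.

```python
# Schedulers reported in this thread as working with SD3 (not an official list).
SD3_OK_SCHEDULERS = {"normal", "simple", "sgm_uniform", "ddim_uniform"}


def check_sd3_settings(sampler_name, scheduler):
    """Return a list of warnings for a sampler/scheduler pair (empty if none)."""
    warnings = []
    if scheduler not in SD3_OK_SCHEDULERS:
        allowed = ", ".join(sorted(SD3_OK_SCHEDULERS))
        warnings.append(
            f"scheduler '{scheduler}' with sampler '{sampler_name}' was reported "
            f"as bad for SD3; try one of: {allowed}"
        )
    return warnings


for pair in [("dpmpp_2m", "karras"), ("euler", "sgm_uniform"), ("dpmpp_2m", "simple")]:
    problems = check_sd3_settings(*pair)
    print(pair, "->", problems if problems else "ok")
```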

KillyTheNetTerminal commented 2 weeks ago

Same result: the image is still noisy.

KillyTheNetTerminal commented 2 weeks ago

ltdrdata commented 2 weeks ago

Try CPU mode.

Wallboy commented 2 weeks ago

Same issue here: I only get noisy generated images. 7900 XTX, also running with DirectML.

Perhaps SD3 is not working with AMD GPUs/DirectML yet.

kopaser6463 commented 2 weeks ago

Same issue. I narrowed it down a little to the variable named out in sampling_function in samplers.py, which differs between CPU and DirectML. Here is the convoluted path to it: nodes.py -> samplers.py -> KSampler.sample -> sample (a different one) -> CFGGuider.sample -> CFGGuider.inner_sample (sampler.sample(self, sigmas...)) -> sampler = sampler_object(self.sampler), where self.sampler is just a name -> sampler_object -> ksampler -> KSAMPLER.sample(self, model_wrap, sigmas, extra_args...) -> model_k = KSamplerX0Inpaint(model_wrap, sigmas) -> model_wrap is self in sampler.sample, so calling CFGGuider() returns self.predict_noise() -> sampling_function(model) -> cfg_function(model) -> out. It is different on CPU/DirectML. Why? I don't know.
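One way to put a number on "different on CPU/DirectML" is to dump the same intermediate value from both runs (same seed, same inputs) and compare them element by element. The sketch below works on plain lists of floats so it runs without PyTorch; with tensors one could first move each to the CPU and call `.flatten().tolist()`. The function names and the tolerance are assumptions chosen for illustration.

```python
import math


def max_abs_diff(a, b):
    """Largest element-wise absolute difference; NaN anywhere counts as infinite."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    worst = 0.0
    for x, y in zip(a, b):
        d = abs(x - y)
        if math.isnan(d):
            return math.inf
        worst = max(worst, d)
    return worst


def outputs_match(a, b, tol=1e-3):
    """True if the two outputs agree within `tol` (an arbitrary example tolerance)."""
    return max_abs_diff(a, b) <= tol


cpu_out = [0.10, -0.25, 0.80]        # made-up values standing in for the CPU run
other_out = [0.10, -0.25, 3.70]      # made-up values standing in for the other device
print(max_abs_diff(cpu_out, other_out), outputs_match(cpu_out, other_out))
```

Running the comparison at several points along the call path would show the first place where the two devices start to disagree.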

Wallboy commented 2 weeks ago

If anyone wants to get a working SD3 with AMD GPUs in the meantime, look up the ComfyUI Zluda fork and use that instead. It works great.

Just be warned that the first generation takes a little while because a bunch of databases are being processed. If you've ever used A1111 with Zluda, you had that same wait time for your first generation after installing it.

KillyTheNetTerminal commented 2 weeks ago

I have never tried setting up Zluda for ComfyUI. Does it speed up generation compared to DirectML? How can I test this?

comfyanonymous / ComfyUI

Error first try SD3 directml RX580 #3689