AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: Expected all tensors to be on the same device #8882

Open FearInNyu opened 1 year ago

FearInNyu commented 1 year ago

Is there an existing issue for this?

What happened?

When generating an image for the first time, I get a CUDA out-of-memory error. After that, I try to downgrade my image (lower the settings), but I get another error message instead:

Traceback (most recent call last):
  [...]
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper__index_select)

(The full traceback is identical to the second traceback in the Console logs section below.)

I already tried to look up this issue, since it has been around for quite a long time, but since I'm not really someone who knows how coding works, I don't really know what to do.

My webui-user.bat does include --medvram, which can cause this problem, but I can't use --lowvram since I want to try getting better images, and I can't remove --medvram either since my hardware isn't enough to run without it.

Steps to reproduce the problem

  1. Go to your webui-user.bat and edit it.
  2. In COMMANDLINE_ARGS, place "--xformers --precision full --no-half --medvram --always-batch-cond-uncond --opt-split-attention --opt-sub-quad-attention" inside of it (see the sketch after this list).
  3. Run the .bat.
  4. Trigger a CUDA out-of-memory error.
  5. Generate another image and you should get the error.
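For reference, a minimal sketch of what the edited webui-user.bat might look like, assuming the stock layout; the PYTORCH_CUDA_ALLOC_CONF line is optional and only reflects the max_split_size_mb hint from the out-of-memory message in the logs below, it is not part of the original steps:

@echo off

set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=--xformers --precision full --no-half --medvram --always-batch-cond-uncond --opt-split-attention --opt-sub-quad-attention
rem optional allocator hint suggested by the CUDA OOM message:
set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

call webui.bat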

What should have happened?

It should have simply generated the image, or told me it was another out-of-memory error. Instead, I got the other error.

Commit where the problem happens

I don't know where the problem happens.

What platforms do you use to access the UI ?

Windows

What browsers do you use to access the UI ?

Google Chrome

Command Line Arguments

COMMANDLINE_ARGS = --xformers --precision full --no-half --medvram --always-batch-cond-uncond --opt-split-attention --opt-sub-quad-attention

List of extensions

LDSR, Lora, ScuNET, SwinIR, prompt-bracket-checker

Console logs

Traceback (most recent call last):
  File "D:\Stable Diffusion 2\stable-diffusion-webui\modules\call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "D:\Stable Diffusion 2\stable-diffusion-webui\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\modules\txt2img.py", line 56, in txt2img
    processed = process_images(p)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\modules\processing.py", line 486, in process_images
    res = process_images_inner(p)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\modules\processing.py", line 636, in process_images_inner
    samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\modules\processing.py", line 836, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
  File "D:\Stable Diffusion 2\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 351, in sample
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "D:\Stable Diffusion 2\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 227, in launch_sampling
    return func()
  File "D:\Stable Diffusion 2\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 351, in <lambda>
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "D:\Stable Diffusion 2\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\sampling.py", line 594, in sample_dpmpp_2m
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 138, in forward
    x_out[a:b] = self.inner_model(x_in[a:b], sigma_in[a:b], cond={"c_crossattn": c_crossattn, "c_concat": [image_cond_in[a:b]]})
  File "D:\Stable Diffusion 2\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 112, in forward
    eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 138, in get_eps
    return self.inner_model.apply_model(*args, **kwargs)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\modules\sd_hijack_utils.py", line 17, in <lambda>
    setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
  File "D:\Stable Diffusion 2\stable-diffusion-webui\modules\sd_hijack_utils.py", line 28, in __call__
    return self.__orig_func(*args, **kwargs)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 858, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1201, in _call_impl
    result = hook(self, input)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\modules\lowvram.py", line 35, in send_me_to_gpu
    module.to(devices.device)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\venv\lib\site-packages\pytorch_lightning\core\mixins\device_dtype_mixin.py", line 113, in to
    return super().to(*args, **kwargs)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 989, in to    return self._apply(convert)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 641, in _apply
    module._apply(fn)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 641, in _apply
    module._apply(fn)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 641, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "D:\Stable Diffusion 2\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 664, in _apply
    param_applied = fn(param)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 987, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 14.00 MiB (GPU 0; 2.00 GiB total capacity; 1.66 GiB already allocated; 0 bytes free; 1.71 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
------------------------------------------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\Stable Diffusion 2\stable-diffusion-webui\modules\call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "D:\Stable Diffusion 2\stable-diffusion-webui\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\modules\txt2img.py", line 56, in txt2img
    processed = process_images(p)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\modules\processing.py", line 486, in process_images
    res = process_images_inner(p)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\modules\processing.py", line 625, in process_images_inner
    uc = get_conds_with_caching(prompt_parser.get_learned_conditioning, negative_prompts, p.steps, cached_uc)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\modules\processing.py", line 570, in get_conds_with_caching
    cache[1] = function(shared.sd_model, required_prompts, steps)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\modules\prompt_parser.py", line 140, in get_learned_conditioning
    conds = model.get_learned_conditioning(texts)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 669, in get_learned_conditioning
    c = self.cond_stage_model(c)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\modules\sd_hijack_clip.py", line 229, in forward
    z = self.process_tokens(tokens, multipliers)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\modules\sd_hijack_clip.py", line 254, in process_tokens
    z = self.encode_with_transformers(tokens)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\modules\sd_hijack_clip.py", line 302, in encode_with_transformers
    outputs = self.wrapped.transformer(input_ids=tokens, output_hidden_states=-opts.CLIP_stop_at_last_layers)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1212, in _call_impl
    result = forward_call(*input, **kwargs)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 811, in forward
    return self.text_model(
  File "D:\Stable Diffusion 2\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 708, in forward
    hidden_states = self.embeddings(input_ids=input_ids, position_ids=position_ids)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 223, in forward
    inputs_embeds = self.token_embedding(input_ids)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\modules\sd_hijack.py", line 234, in forward
    inputs_embeds = self.wrapped(input_ids)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Stable Diffusion 2\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\sparse.py", line 160, in forward
    return F.embedding(
  File "D:\Stable Diffusion 2\stable-diffusion-webui\venv\lib\site-packages\torch\nn\functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper__index_select)

Additional information

It is possible, I think, that it loses track of the device I'm using after the second generation because of the first error; I'm not really sure about that.
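For reference, the message itself just means that one tensor ended up on the CPU while another was on the GPU; a minimal stand-alone PyTorch snippet, unrelated to the webui code, reproduces the same error:

import torch
import torch.nn as nn

# minimal illustration (not webui code): an embedding whose weights stay on the
# CPU while the index tensor lives on cuda:0 raises the same RuntimeError
emb = nn.Embedding(10, 4)                       # weights on cpu
ids = torch.tensor([1, 2, 3], device="cuda:0")  # indices on cuda:0
out = emb(ids)  # RuntimeError: Expected all tensors to be on the same device ...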

U-DI-Page commented 1 year ago

Try setting CUDA_VISIBLE_DEVICES=0, not --device-id.
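On Windows, a minimal sketch of where that could go is webui-user.bat, as an environment variable set before the launch line:

set CUDA_VISIBLE_DEVICES=0

call webui.bat

On Linux it can be prefixed to the launch command instead, e.g. CUDA_VISIBLE_DEVICES=0 ./webui.sh.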

lance2016 commented 1 year ago

CUDA_VISIBLE_DEVICES=0

Where should I set CUDA_VISIBLE_DEVICES=0?

zac66789 commented 1 year ago

I had this error immediately after downloading the openpose editor extension and fixing my controlnet config issues.

Once I ran the webui with the command-line arg "--lowvram", I was able to run ControlNet with OpenPose with "Low VRAM" checked just fine (albeit extremely slowly), i.e. after running both ControlNet and A1111 in lowvram mode. In my case this was a GPU memory issue.

AI-Robot-Morris commented 1 year ago

Same issue here. I have two GPUs running two instances, but once it happens, I get the same error messages.

NGC38 commented 1 year ago

I'm not sure, but it seems to happen when a LoRA model made for SDXL is mixed with non-SDXL models, or when diffusers made for SDXL are used with non-SDXL ones, and vice versa.

dustin2551 commented 1 year ago

I have this issue too. I've tried three communities and nobody is really providing any help to fix it.

Trahloc commented 1 year ago

I have this same problem. In the vladmandic fork they solved it by adding a "--cuda" flag, so I used their project for a while, but I came back to automatic1111 to see how things were going and this problem persisted. So I chatted with GPT, and the solution that worked for me was adding this to webui.py and also to stable-diffusion-webui/modules/call_queue.py:

import torch
# pin everything to the second GPU (cuda:1) when CUDA is available, otherwise fall back to the CPU
device = torch.device('cuda:1' if torch.cuda.is_available() else 'cpu')

My problem is that I want to use my 2nd GPU for dedicated SDXL, so I run:

CUDA_VISIBLE_DEVICES=1 ./webui.sh --no-half --no-half-vae

But that throws the error about detecting two tensor devices, because my CPU has a GPU built in (I think that's the issue, at least). Anyway, this got the program loaded and working for me, since the CUDA_VISIBLE_DEVICES environment variable by itself wasn't solving the problem, as it ignores the existence of the CPU / the CPU's integrated GPU.

edit: I've seen a few other random errors kicking back the same tensor-device error, so I just throw those two lines into whichever .py file is involved and it seems to work. I'm sure there is a more universal place to put this, but my ignorance is infinite.
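A slightly more flexible sketch of those same two lines, in case hard-coding cuda:1 is undesirable (SD_GPU_INDEX is a made-up variable name for illustration, not something the webui reads):

import os
import torch

# hypothetical variant: read the GPU index from an environment variable
# instead of hard-coding cuda:1, and fall back to the CPU when CUDA is absent
gpu_index = os.environ.get("SD_GPU_INDEX", "0")
device = torch.device(f"cuda:{gpu_index}" if torch.cuda.is_available() else "cpu")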

sun11 commented 9 months ago

Try this:

change modules/initialize.py line 154:

from:

Thread(target=load_model).start()

to:

m_thread = Thread(target=load_model)
m_thread.start()
m_thread.join()
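
If this helps, it is presumably because join() makes startup wait for load_model to finish before anything else runs, so the first generation can no longer start while the model is still being moved between devices. A generic illustration of the difference (plain Python, not webui code):

from threading import Thread
import time

def load_model():
    time.sleep(2)  # stand-in for the real checkpoint load

t = Thread(target=load_model)
t.start()
# without join(), execution continues here while load_model is still running;
# join() blocks until it has finished, so later code cannot race the load
t.join()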