NVIDIA / Stable-Diffusion-WebUI-TensorRT

TensorRT Extension for Stable Diffusion Web UI
MIT License

SDXL: RuntimeError: Expected all tensors to be on the same device #97

Closed boehmi1988 closed 10 months ago

boehmi1988 commented 1 year ago

Hi,

I successfully installed and configured this extension according to the installation instructions.

"Generate Default Engines" went well and created the unet.

But when I select it with the base SDXL model, an error occurs when generating an image:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)

Generating without the TensorRT unet still works fine.
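One way to narrow down an error like this is to list which devices a model's tensors actually live on. The sketch below is a generic diagnostic, not part of the extension or the WebUI. The helper names `group_by_device` and `has_mixed_devices` are made up for this example, and the code only assumes that each item has a `.device` attribute, as PyTorch tensors do.

```python
from collections import defaultdict


def group_by_device(named_tensors):
    """Group tensor names by the string form of their .device attribute.

    named_tensors: iterable of (name, tensor-like) pairs, for example the
    result of model.named_parameters() on a torch.nn.Module.
    """
    groups = defaultdict(list)
    for name, tensor in named_tensors:
        groups[str(tensor.device)].append(name)
    return dict(groups)


def has_mixed_devices(named_tensors):
    """Return True when the tensors are spread over more than one device."""
    return len(group_by_device(named_tensors)) > 1
```

With PyTorch installed, calling `group_by_device(model.named_parameters())` on the loaded model would show whether some weights are still on `cpu` while others are on `cuda:0`.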

System: Windows 10, RTX 4090, NVIDIA driver version 545.84. WebUI version: v1.6.0 (running in a Docker container)  •  python: 3.10.9  •  torch: 2.0.1+cu118  •  xformers: 0.0.21.dev544  •  gradio: 3.41.2  •  checkpoint: e6bb9ea85b

Any idea what could cause this?

Complete Log:
2023-10-20 13:51:39 Activating unet: [TRT] base_sd_sd_xl_base_1.0_VAEFix
2023-10-20 13:51:39 Loading TensorRT engine: /stable-diffusion-webui/models/Unet-trt/base_sd_sd_xl_base_1.0_VAEFix_be9edd61_cc89_sample=1x4x96x96+2x4x128x128+8x4x128x128-timesteps=1+2+8-encoder_hidden_states=1x77x2048+2x77x2048+8x154x2048-y=1x2816+2x2816+8x2816.trt
2023-10-20 13:51:39 [I] Loading bytes from /stable-diffusion-webui/models/Unet-trt/base_sd_sd_xl_base_1.0_VAEFix_be9edd61_cc89_sample=1x4x96x96+2x4x128x128+8x4x128x128-timesteps=1+2+8-encoder_hidden_states=1x77x2048+2x77x2048+8x154x2048-y=1x2816+2x2816+8x2816.trt
2023-10-20 13:51:51 Profile 0:
2023-10-20 13:51:51 sample = [(1, 4, 96, 96), (2, 4, 128, 128), (8, 4, 128, 128)]
2023-10-20 13:51:51 timesteps = [(1,), (2,), (8,)]
2023-10-20 13:51:51 encoder_hidden_states = [(1, 77, 2048), (2, 77, 2048), (8, 154, 2048)]
2023-10-20 13:51:51 y = [(1, 2816), (2, 2816), (8, 2816)]
2023-10-20 13:51:51 latent = [(115), (115), (0)]
2023-10-20 13:51:51 0% 0/30 [00:00<?, ?it/s]
2023-10-20 13:51:52 *** Error completing request
2023-10-20 13:51:52 *** Arguments: ('task(5w0agwfsqv0h9gs)', 'a cat in a park', '', ['SDXL: Photographic'], 30, 'DPM++ 2M Karras', 1, 1, 7, 1024, 1024, False, 0.7, 2, '4x-UltraSharp', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', '', '', [], <gradio.routes.Request object at 0x7fada8bfcf70>, 0, False, '', 0.8, -1, False, -1, 0, 0, 0, 0, False, False, {'ad_model': 'face_yolov8n.pt', 'ad_prompt': '', 'ad_negative_prompt': '', 'ad_confidence': 0.3, 'ad_mask_k_largest': 0, 'ad_mask_min_ratio': 0, 'ad_mask_max_ratio': 1, 'ad_x_offset': 0, 'ad_y_offset': 0, 'ad_dilate_erode': 4, 'ad_mask_merge_invert': 'None', 'ad_mask_blur': 4, 'ad_denoising_strength': 0.4, 'ad_inpaint_only_masked': True, 'ad_inpaint_only_masked_padding': 32, 'ad_use_inpaint_width_height': False, 'ad_inpaint_width': 512, 'ad_inpaint_height': 512, 'ad_use_steps': False, 'ad_steps': 28, 'ad_use_cfg_scale': False, 'ad_cfg_scale': 7, 'ad_use_checkpoint': False, 'ad_checkpoint': 'Use same checkpoint', 'ad_use_vae': False, 'ad_vae': 'Use same VAE', 'ad_use_sampler': False, 'ad_sampler': 'Euler a', 'ad_use_noise_multiplier': False, 'ad_noise_multiplier': 1, 'ad_use_clip_skip': False, 'ad_clip_skip': 1, 'ad_restore_face': False, 'ad_controlnet_model': 'None', 'ad_controlnet_module': 'inpaint_global_harmonious', 'ad_controlnet_weight': 1, 'ad_controlnet_guidance_start': 0, 'ad_controlnet_guidance_end': 1, 'is_api': ()}, {'ad_model': 'None', 'ad_prompt': '', 'ad_negative_prompt': '', 'ad_confidence': 0.3, 'ad_mask_k_largest': 0, 'ad_mask_min_ratio': 0, 'ad_mask_max_ratio': 1, 'ad_x_offset': 0, 'ad_y_offset': 0, 'ad_dilate_erode': 4, 'ad_mask_merge_invert': 'None', 'ad_mask_blur': 4, 'ad_denoising_strength': 0.4, 'ad_inpaint_only_masked': True, 'ad_inpaint_only_masked_padding': 32, 'ad_use_inpaint_width_height': False, 'ad_inpaint_width': 512, 'ad_inpaint_height': 512, 'ad_use_steps': False, 'ad_steps': 28, 'ad_use_cfg_scale': False, 'ad_cfg_scale': 7, 'ad_use_checkpoint': False, 'ad_checkpoint': 'Use same checkpoint', 'ad_use_vae': False, 'ad_vae': 'Use same VAE', 'ad_use_sampler': False, 'ad_sampler': 'Euler a', 'ad_use_noise_multiplier': False, 'ad_noise_multiplier': 1, 'ad_use_clip_skip': False, 'ad_clip_skip': 1, 'ad_restore_face': False, 'ad_controlnet_model': 'None', 'ad_controlnet_module': 'inpaint_global_harmonious', 'ad_controlnet_weight': 1, 'ad_controlnet_guidance_start': 0, 'ad_controlnet_guidance_end': 1, 'is_api': ()}, False, 'MultiDiffusion', False, True, 1024, 1024, 96, 96, 48, 4, 'None', 2, False, 10, 1, 1, 64, False, False, False, False, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 3072, 192, True, True, True, False, <scripts.controlnet_ui.controlnet_ui_group.UiControlNetUnit object at 0x7fada8b870a0>, <scripts.controlnet_ui.controlnet_ui_group.UiControlNetUnit object at 0x7fada25f4d90>, <scripts.controlnet_ui.controlnet_ui_group.UiControlNetUnit object at 0x7fada9341180>, False, 1, 0.15, False, 'OUT', ['OUT'], 5, 0, 'Bilinear', False, 'Bilinear', False, 'Lerp', '', '', False, False, None, True, 'from modules.processing import process_images\n\np.width = 768\np.height = 768\np.batch_size = 2\np.steps = 10\n\nreturn process_images(p)', 2, False, False, 'positive', 'comma', 0, False, False, '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, 0, False, True, True, False, '#000000', False, 'Not set', True, True, '', '', '', '', '', 1.3, 'Not set', 'Not set', 1.3, 'Not set', 1.3, 'Not set', 1.3, 1.3, 'Not set', 1.3, 'Not set', 1.3, 'Not set', 1.3, 'Not set', 1.3, 'Not set', 1.3, 'Not set', False, 'None', None, None, False, None, None, False, None, None, False, 50, False, 4.0, '', 10.0, 'Linear', 3, False, 30.0, True, False, False, 0, 0.0, 'Lanczos', 1, True, 0, 0, 0.001, 75, 0.0, False, True, 'Illustration', 'svg', True, True, False, 0.5, False, 16, True, 16) {}
2023-10-20 13:51:52 Traceback (most recent call last):
2023-10-20 13:51:52   File "/stable-diffusion-webui/modules/call_queue.py", line 57, in f
2023-10-20 13:51:52     res = list(func(*args, **kwargs))
2023-10-20 13:51:52   File "/stable-diffusion-webui/modules/call_queue.py", line 36, in f
2023-10-20 13:51:52     res = func(*args, **kwargs)
2023-10-20 13:51:52   File "/stable-diffusion-webui/modules/txt2img.py", line 55, in txt2img
2023-10-20 13:51:52     processed = processing.process_images(p)
2023-10-20 13:51:52   File "/stable-diffusion-webui/modules/processing.py", line 732, in process_images
2023-10-20 13:51:52     res = process_images_inner(p)
2023-10-20 13:51:52   File "/stable-diffusion-webui/extensions/sd-webui-controlnet/scripts/batch_hijack.py", line 42, in processing_process_images_hijack
2023-10-20 13:51:52     return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
2023-10-20 13:51:52   File "/stable-diffusion-webui/modules/processing.py", line 867, in process_images_inner
2023-10-20 13:51:52     samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
2023-10-20 13:51:52   File "/stable-diffusion-webui/modules/processing.py", line 1140, in sample
2023-10-20 13:51:52     samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
2023-10-20 13:51:52   File "/stable-diffusion-webui/modules/sd_samplers_kdiffusion.py", line 235, in sample
2023-10-20 13:51:52     samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
2023-10-20 13:51:52   File "/stable-diffusion-webui/modules/sd_samplers_common.py", line 261, in launch_sampling
2023-10-20 13:51:52     return func()
2023-10-20 13:51:52   File "/stable-diffusion-webui/modules/sd_samplers_kdiffusion.py", line 235, in <lambda>
2023-10-20 13:51:52     samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
2023-10-20 13:51:52   File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2023-10-20 13:51:52     return func(*args, **kwargs)
2023-10-20 13:51:52   File "/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/sampling.py", line 594, in sample_dpmpp_2m
2023-10-20 13:51:52     denoised = model(x, sigmas[i] * s_in, **extra_args)
2023-10-20 13:51:52   File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
2023-10-20 13:51:52     return forward_call(*args, **kwargs)
2023-10-20 13:51:52   File "/stable-diffusion-webui/modules/sd_samplers_cfg_denoiser.py", line 169, in forward
2023-10-20 13:51:52     x_out = self.inner_model(x_in, sigma_in, cond=make_condition_dict(cond_in, image_cond_in))
2023-10-20 13:51:52   File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
2023-10-20 13:51:52     return forward_call(*args, **kwargs)
2023-10-20 13:51:52   File "/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/external.py", line 112, in forward
2023-10-20 13:51:52     eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
2023-10-20 13:51:52   File "/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/external.py", line 138, in get_eps
2023-10-20 13:51:52     return self.inner_model.apply_model(*args, **kwargs)
2023-10-20 13:51:52   File "/stable-diffusion-webui/modules/sd_models_xl.py", line 37, in apply_model
2023-10-20 13:51:52     return self.model(x, t, cond)
2023-10-20 13:51:52   File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
2023-10-20 13:51:52     return forward_call(*args, **kwargs)
2023-10-20 13:51:52   File "/stable-diffusion-webui/modules/sd_hijack_utils.py", line 17, in <lambda>
2023-10-20 13:51:52     setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
2023-10-20 13:51:52   File "/stable-diffusion-webui/modules/sd_hijack_utils.py", line 28, in __call__
2023-10-20 13:51:52     return self.__orig_func(*args, **kwargs)
2023-10-20 13:51:52   File "/stable-diffusion-webui/repositories/generative-models/sgm/modules/diffusionmodules/wrappers.py", line 28, in forward
2023-10-20 13:51:52     return self.diffusion_model(
2023-10-20 13:51:52   File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
2023-10-20 13:51:52     return forward_call(*args, **kwargs)
2023-10-20 13:51:52   File "/stable-diffusion-webui/repositories/generative-models/sgm/modules/diffusionmodules/openaimodel.py", line 984, in forward
2023-10-20 13:51:52     emb = self.time_embed(t_emb)
2023-10-20 13:51:52   File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
2023-10-20 13:51:52     return forward_call(*args, **kwargs)
2023-10-20 13:51:52   File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
2023-10-20 13:51:52     input = module(input)
2023-10-20 13:51:52   File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
2023-10-20 13:51:52     return forward_call(*args, **kwargs)
2023-10-20 13:51:52   File "/stable-diffusion-webui/extensions-builtin/Lora/networks.py", line 429, in network_Linear_forward
2023-10-20 13:51:52     return originals.Linear_forward(self, input)
2023-10-20 13:51:52   File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
2023-10-20 13:51:52     return F.linear(input, self.weight, self.bias)
2023-10-20 13:51:52 RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)
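A long traceback like the one above is easier to read when reduced to its frames. The following sketch pulls the file, line number and function name out of a log, whether or not its line breaks survived. The helper name `extract_frames` is made up for illustration, and the sample string is copied from the last two frames of the log.

```python
import re

# Matches the standard CPython traceback frame header:
#   File "<path>", line <n>, in <function>
FRAME_RE = re.compile(r'File "([^"]+)", line (\d+), in (\S+)')


def extract_frames(log_text):
    """Return (path, line_number, function) tuples in call order."""
    return [(path, int(line), func) for path, line, func in FRAME_RE.findall(log_text)]


# The last two frames of the log above, still in their flattened form.
sample = (
    '2023-10-20 13:51:52 File "/stable-diffusion-webui/extensions-builtin/Lora/networks.py", '
    'line 429, in network_Linear_forward 2023-10-20 13:51:52 return originals.Linear_forward(self, input) '
    '2023-10-20 13:51:52 File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/linear.py", '
    'line 114, in forward'
)
frames = extract_frames(sample)
for path, line, func in frames:
    print(f"{func} ({path}:{line})")
```

Run over the whole log, this shows the call passing through `sgm/modules/diffusionmodules/openaimodel.py` and ending in torch's `linear.py`, which is where the device mismatch is raised.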

MorkTheOrk commented 1 year ago

Hi! Do you have --medvram or --lowvram as CLI args for automatic1111? Can you try running it without these and check whether it still crashes?
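To check the launch flags programmatically rather than by eye, something like the following could be used. It is only a sketch: `find_vram_flags` is a hypothetical helper, and the flag list contains just the VRAM-related options named in this thread.

```python
import shlex

# VRAM-saving launch options mentioned in this thread.
VRAM_FLAGS = ("--medvram", "--medvram-sdxl", "--lowvram")


def find_vram_flags(commandline_args):
    """Return the VRAM-related flags present in a COMMANDLINE_ARGS string."""
    tokens = shlex.split(commandline_args)
    # Compare whole tokens so that --medvram does not match --medvram-sdxl.
    return [flag for flag in VRAM_FLAGS if flag in tokens]


print(find_vram_flags("--opt-sdp-no-mem-attention --medvram-sdxl"))  # ['--medvram-sdxl']
print(find_vram_flags("--no-half-vae"))                              # []
```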

JM1216 commented 1 year ago

I have the same error, so instead of filing a separate bug report I'll just add that in my case it only happens when I try to use any ControlNet (I have v1.1.411). It's apparently not supported yet.

arch1v1st commented 1 year ago

@boehmi1988 - I was initially battling with that same error of "RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0!" too. Just to be sure, are you using the latest DEV branch of Automatic1111's SD? That seems to be required ATM.

mykeehu commented 1 year ago

I'm running without --medvram and --lowvram now, because with the --medvram-sdxl option TensorRT didn't generate the .trt file. So the error also occurs without these switches.

evisanzay commented 1 year ago

I have the same error. I installed the dev repo and use neither --medvram nor --lowvram. TensorRT with SD1.5 runs flawlessly; SDXL is not working.

andzejsp commented 1 year ago

So I'm not alone... The SDXL checkpoint is not working, with no CLI arguments. Windows 11, 3090.

 RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)

version: v1.6.0  •  python: 3.10.6  •  torch: 2.0.1+cu118  •  xformers: N/A  •  gradio: 3.41.2  •  checkpoint: 31e35c80fc

mashb1t commented 1 year ago

It works with the dev branch of A1111; see https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT/issues/97#issuecomment-1773571713 and https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT/issues/18#issuecomment-1767680926. As of commit https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT/commit/37c15c1684dfb7c931d4b7114e31e2f6149256fd, the README of this project says:

Supports Stable Diffusion 1.5 and 2.1. Native SDXL support coming in a future release. Please use the dev branch if you would like to use it today. Note that the Dev branch is not intended for production work and may break other things that you are currently using.

andzejsp commented 1 year ago

I haven't tested this on the dev branch, but according to this comment, https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT/issues/97#issuecomment-1775451223, it seems it's not working there either with an SDXL checkpoint?

chazzhou commented 1 year ago

I encountered the same error about tensors not being on the same device using the main branch for Automatic1111. It worked for me after I switched to the dev branch for Automatic1111 with SDXL.

My environment: Automatic1111 version: v1.6.0-263-g464fbcd9  •  python: 3.10.11  •  torch: 2.1.0+cu121  •  xformers: N/A  •  gradio: 3.41.2  •  Ubuntu 22.04.3 LTS. Launch arguments: --no-half-vae. No extension besides this one is enabled.
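The version string above has the shape that `git describe` produces: a tag, the number of commits since that tag, and an abbreviated commit hash prefixed with `g`. Read that way, this build is 263 commits past the v1.6.0 tag, which is consistent with running the dev branch rather than the plain release. A small sketch for splitting such a string (the function name `parse_describe` is an assumption made for this example):

```python
import re

# <tag>-<commits since tag>-g<abbreviated hash>, as printed by `git describe`.
DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<ahead>\d+)-g(?P<commit>[0-9a-f]+)$")


def parse_describe(version):
    """Split a version string into (tag, commits_ahead, commit_or_None)."""
    match = DESCRIBE_RE.match(version)
    if match is None:
        # A bare tag such as "v1.6.0" means the checkout sits exactly on it.
        return version, 0, None
    return match["tag"], int(match["ahead"]), match["commit"]


print(parse_describe("v1.6.0-263-g464fbcd9"))  # ('v1.6.0', 263, '464fbcd9')
print(parse_describe("v1.6.0"))                # ('v1.6.0', 0, None)
```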

andzejsp commented 1 year ago

Does anyone know if a 1080 Ti can run this? I know it says only RTX cards... but man, 1080 Ti...

jz2010927 commented 1 year ago

Same

Windows 10, i7-4790 + RTX 3060. version: v1.6.0  •  python: 3.10.6  •  torch: 2.0.1+cu118  •  xformers: N/A  •  gradio: 3.41.2  •  checkpoint: d25fb39f3f. Launch setting: set COMMANDLINE_ARGS=--opt-sdp-no-mem-attention

Disabled all extensions except tensorRT

SD1.5 works fine

When I use an SDXL model to generate an image: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)

andzejsp commented 1 year ago

Use the dev branch of automatic1111. Delete the venv folder and switch to the dev branch. Profit.

jz2010927 commented 1 year ago

I tried dev, but exporting the TensorRT model failed due to insufficient VRAM (3060, 12 GB), and somehow the dev version cannot find the TensorRT model from the original Unet-trt folder after I copied it to the current Unet-trt folder. Anyway, thanks for the reply.

andzejsp commented 1 year ago

What I did: first I used the main branch to install the extension and generate the engine, and rendering an image with SDXL and the TensorRT unet failed. Then I switched to the dev branch, deleted the venv folder, and rendered an image using the TensorRT unet and SDXL. I never copied anything, just used the web UI; the only thing I deleted was the venv folder, and everything else stayed the same.

Deepdreamtime commented 1 year ago

How do I switch to the dev branch then, please?

andzejsp commented 1 year ago

I use VS Code, but you can open a terminal in the stable-diffusion-webui folder, then type: git fetch origin then: git switch dev

arch1v1st commented 1 year ago

@andzejsp, @jz2010927 - personally, I didn't need to do any branch switching; I used the DEV branch exclusively while getting SDXL set up with TensorRT. The key, as far as I recall, was to completely restart the server (not just the UI) after each major step in the process while following these instructions:

https://nvidia.custhelp.com/app/answers/detail/a_id/5487/~/tensorrt-extension-for-stable-diffusion-web-ui

@Deepdreamtime - if you're using git directly, this should do the trick:

git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui
git switch dev
git status
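After switching, it can be worth confirming which branch the checkout is really on before restarting the WebUI. Below is a minimal sketch using Python's `subprocess` module and the standard `git rev-parse --abbrev-ref HEAD` command. The helper name `current_branch` and its injectable `run` parameter are choices made for this example, not anything the WebUI provides.

```python
import subprocess


def current_branch(repo_dir, run=subprocess.run):
    """Return the name of the branch checked out in repo_dir."""
    result = run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if git reports a failure
    )
    return result.stdout.strip()
```

Assuming the clone lives in a folder named stable-diffusion-webui, `current_branch("stable-diffusion-webui")` should return `dev` once the switch has worked.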

All - while I'm up and running on Ubuntu with a 3080 TI with good success generating images much faster using SDXL base (no luck with the refiner model yet), there are still quite a few quirks which throw common errors from time to time when generating images and switching between SD checkpoints. Generally, completely restarting the SD WebUI server and trying again works around them. Hope these details help!

sdbds commented 1 year ago

same error

Deepdreamtime commented 1 year ago

It works. It finally worked. I don't even know how or what made the difference. Thanks bro. Take care. Peace. 

kalle07 commented 11 months ago

With the "dev branch" it also worked for me (I don't know how I got it working in the end) ;)

RTX 4060 Ti 16 GB, 1024x1024: 26 sec without TensorRT, 14 sec with it (OK, if you change the resolution it takes a few seconds more, but only the first time).
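Put as a ratio, those two timings work out as follows. This is plain arithmetic on the numbers reported above, for a single 1024x1024 image on that one card, not a general benchmark.

```python
without_trt = 26.0  # seconds per image, as reported above
with_trt = 14.0     # seconds per image, as reported above

speedup = without_trt / with_trt
time_saved = 1 - with_trt / without_trt

print(f"speedup: {speedup:.2f}x")       # speedup: 1.86x
print(f"time saved: {time_saved:.0%}")  # time saved: 46%
```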

pan00run commented 10 months ago

Adding --lowvram to ARGS works for me; --medvram failed.