AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: CUDA Out of memory #14053

Closed — viik420 closed this issue 11 months ago

viik420 commented 1 year ago

Is there an existing issue for this?

Update

I found out the cause of this: it is the System Memory Fallback for Stable Diffusion on NVIDIA cards. On Windows, Stable Diffusion falls back to system RAM when it runs out of VRAM, but this feature is not available on Linux, or maybe I just can't find it. https://nvidia.custhelp.com/app/answers/detail/a_id/5490/~/system-memory-fallback-for-stable-diffusion

What happened?

I have installed stable-diffusion-webui on Arch Linux and I run it with --medvram --xformers --no-half --precision full. When trying to generate images above 512x512, or when using Hires. fix at 2x upscale, I get a CUDA out of memory error. But I can generate images up to 1024x1024 on the same device in Windows 11 with the same settings, so it should work on Linux too. I have already tried export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 and lowering it down to 64, but I still got the error.
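The allocator setting mentioned above has to be exported in the shell before webui.sh starts Python; for reference, a minimal sketch of what was tried:

```shell
# Allocator tuning attempted above: max_split_size_mb was tried from 512
# down to 64, but the OOM still occurred. The variable must be set before
# the webui process starts, e.g.:
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:64
# then launch as usual:
# bash webui.sh --medvram --xformers --no-half --precision full
```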

Screenshot_20231120_200209

Steps to reproduce the problem

  1. Go to txt2img
  2. Set dimensions above 512x512, or use Hires. fix at 2x
  3. Press Generate

What should have happened?

It should be able to generate images above 512x512 resolution, just as it does on Windows on the same device with the same sd webui settings. As you can see in the screenshot below, I have successfully generated 1024x1024 images with the same parameters without getting the CUDA out of memory error. image_3

Sysinfo

sysinfo-2023-11-21-15-49.txt

What browsers do you use to access the UI ?

No response

Console logs

bash webui.sh                                                                                       (sd)

################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye)
################################################################

################################################################
Running on scrocle user
################################################################

################################################################
Repo already cloned, using it as install directory
################################################################

################################################################
Create and activate python venv
################################################################

################################################################
Launching launch.py...
################################################################
Using TCMalloc: libtcmalloc_minimal.so.4
Python 3.10.6 (main, Oct 24 2022, 16:07:47) [GCC 11.2.0]
Version: v1.6.0-2-g4afaaf8a
Commit hash: 4afaaf8a020c1df457bcf7250cb1c7f609699fa7
Launching Web UI with arguments: --medvram --precision full --api --xformers --no-half --no-half-vae
[-] ADetailer initialized. version: 23.11.0, num models: 9
Loading weights [ec41bd2a82] from /home/scrocle/stable-diffusion-webui/models/Stable-diffusion/photon_v1.safetensors
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Creating model from config: /home/scrocle/stable-diffusion-webui/configs/v1-inference.yaml
Startup time: 22.0s (prepare environment: 2.2s, import torch: 3.5s, import gradio: 0.8s, setup paths: 1.3s, initialize shared: 0.2s, other imports: 0.6s, setup codeformer: 0.3s, load scripts: 4.6s, create ui: 0.5s, gradio launch: 7.7s, add APIs: 0.3s).
Applying attention optimization: xformers... done.
Model loaded in 15.1s (load weights from disk: 1.1s, create model: 0.6s, apply weights to model: 5.2s, apply float(): 2.1s, load VAE: 0.1s, calculate empty prompt: 5.9s).
0%|                                                                                  | 0/30 [00:00<?, ?it/s]QObject::moveToThread: Current thread (0x1a74005f8600) is not the object's thread (0x1a74005f91e0).
Cannot move to target thread (0x1a74005f8600)

qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "/home/scrocle/stable-diffusion-webui/venv/lib/python3.10/site-packages/cv2/qt/plugins" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

Available platform plugins are: xcb, eglfs, linuxfb, minimal, minimalegl, offscreen, vnc, wayland-egl, wayland, wayland-xcomposite-egl, wayland-xcomposite-glx.

[1121/212041.710199:ERROR:elf_dynamic_array_reader.h(64)] tag not found
[1121/212041.710686:ERROR:elf_dynamic_array_reader.h(64)] tag not found
[1121/212041.711388:ERROR:elf_dynamic_array_reader.h(64)] tag not found
0%|                                                                                  | 0/30 [00:14<?, ?it/s]
*** Error completing request
*** Arguments: ('task(lk5pumcg4ltf42a)', 'a black hole, interstellar, Christopher Nolan cinematography', 'low quality', [], 30, 'Euler a', 1, 1, 7, 768, 768, False, 0.7, 2, 'Latent', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', '', '', [], <gradio.routes.Request object at 0x7ff10fc05300>, 0, False, '', 0.8, 3871832817, False, -1, 0, 0, 0, False, False, {'ad_model': 'face_yolov8n.pt', 'ad_prompt': '', 'ad_negative_prompt': '', 'ad_confidence': 0.3, 'ad_mask_k_largest': 0, 'ad_mask_min_ratio': 0, 'ad_mask_max_ratio': 1, 'ad_x_offset': 0, 'ad_y_offset': 0, 'ad_dilate_erode': 4, 'ad_mask_merge_invert': 'None', 'ad_mask_blur': 4, 'ad_denoising_strength': 0.4, 'ad_inpaint_only_masked': True, 'ad_inpaint_only_masked_padding': 32, 'ad_use_inpaint_width_height':False, 'ad_inpaint_width': 512, 'ad_inpaint_height': 512, 'ad_use_steps': False, 'ad_steps': 28, 'ad_use_cfg_scale': False, 'ad_cfg_scale': 7, 'ad_use_checkpoint': False, 'ad_checkpoint': 'Use same checkpoint', 'ad_use_vae': False, 'ad_vae': 'Use same VAE', 'ad_use_sampler': False, 'ad_sampler': 'DPM++ 2M Karras', 'ad_use_noise_multiplier': False, 'ad_noise_multiplier': 1, 'ad_use_clip_skip': False, 'ad_clip_skip': 1, 'ad_restore_face': False, 'ad_controlnet_model': 'None', 'ad_controlnet_module': 'None', 'ad_controlnet_weight': 1, 'ad_controlnet_guidance_start': 0, 'ad_controlnet_guidance_end': 1, 'is_api': ()}, {'ad_model': 'None', 'ad_prompt': '', 'ad_negative_prompt': '', 'ad_confidence': 0.3, 'ad_mask_k_largest': 0, 'ad_mask_min_ratio': 0, 'ad_mask_max_ratio': 1, 'ad_x_offset': 0, 'ad_y_offset': 0, 'ad_dilate_erode': 4, 'ad_mask_merge_invert': 'None', 'ad_mask_blur': 4, 'ad_denoising_strength': 0.4, 'ad_inpaint_only_masked': True, 'ad_inpaint_only_masked_padding': 32, 'ad_use_inpaint_width_height': False, 'ad_inpaint_width': 512, 'ad_inpaint_height': 512, 'ad_use_steps': False,'ad_steps': 28, 'ad_use_cfg_scale': False, 'ad_cfg_scale': 7, 'ad_use_checkpoint': False, 'ad_checkpoint': 'Use same checkpoint', 
'ad_use_vae': False, 'ad_vae': 'Use same VAE', 'ad_use_sampler': False, 'ad_sampler': 'DPM++ 2M Karras', 'ad_use_noise_multiplier': False, 'ad_noise_multiplier': 1, 'ad_use_clip_skip': False, 'ad_clip_skip': 1, 'ad_restore_face': False, 'ad_controlnet_model': 'None', 'ad_controlnet_module': 'None', 'ad_controlnet_weight': 1, 'ad_controlnet_guidance_start': 0, 'ad_controlnet_guidance_end': 1, 'is_api': ()}, False, False, 'positive', 'comma', 0, False, False, '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, 0, False) {}
Traceback (most recent call last):
File "/home/scrocle/stable-diffusion-webui/modules/call_queue.py", line 57, in f
res = list(func(*args, **kwargs))
File "/home/scrocle/stable-diffusion-webui/modules/call_queue.py", line 36, in f
res = func(*args, **kwargs)
File "/home/scrocle/stable-diffusion-webui/modules/txt2img.py", line 55, in txt2img
processed = processing.process_images(p)
File "/home/scrocle/stable-diffusion-webui/modules/processing.py", line 732, in process_images
res = process_images_inner(p)
File "/home/scrocle/stable-diffusion-webui/modules/processing.py", line 867, in process_images_inner
samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
File "/home/scrocle/stable-diffusion-webui/modules/processing.py", line 1140, in sample
samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
File "/home/scrocle/stable-diffusion-webui/modules/sd_samplers_kdiffusion.py", line 235, in sample
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
File "/home/scrocle/stable-diffusion-webui/modules/sd_samplers_common.py", line 261, in launch_sampling
return func()
File "/home/scrocle/stable-diffusion-webui/modules/sd_samplers_kdiffusion.py", line 235, in <lambda>
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
File "/home/scrocle/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/scrocle/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/sampling.py", line 145, in sample_euler_ancestral
denoised = model(x, sigmas[i] * s_in, **extra_args)
File "/home/scrocle/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/scrocle/stable-diffusion-webui/modules/sd_samplers_cfg_denoiser.py", line 169, in forward
x_out = self.inner_model(x_in, sigma_in, cond=make_condition_dict(cond_in, image_cond_in))
File "/home/scrocle/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/scrocle/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/external.py", line 112, in forward
eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
File "/home/scrocle/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/external.py", line 138, in get_eps
return self.inner_model.apply_model(*args, **kwargs)
File "/home/scrocle/stable-diffusion-webui/modules/sd_hijack_utils.py", line 17, in <lambda>
setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
File "/home/scrocle/stable-diffusion-webui/modules/sd_hijack_utils.py", line 28, in __call__
return self.__orig_func(*args, **kwargs)
File "/home/scrocle/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 858, in apply_model
x_recon = self.model(x_noisy, t, **cond)
File "/home/scrocle/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
File "/home/scrocle/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 1335, in forward
out = self.diffusion_model(x, t, context=cc)
File "/home/scrocle/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/scrocle/stable-diffusion-webui/modules/sd_unet.py", line 91, in UNetModel_forward
return ldm.modules.diffusionmodules.openaimodel.copy_of_UNetModel_forward_for_webui(self, x, timesteps, context, *args, **kwargs)
File "/home/scrocle/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 797, in forward
h = module(h, emb, context)
File "/home/scrocle/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/scrocle/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 84, in forward
x = layer(x, context)
File "/home/scrocle/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/scrocle/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/attention.py", line 334, in forward
x = block(x, context=context[i])
File "/home/scrocle/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/scrocle/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/attention.py", line 269, in forward
return checkpoint(self._forward, (x, context), self.parameters(), self.checkpoint)
File "/home/scrocle/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/util.py", line 121, in checkpoint
return CheckpointFunction.apply(func, len(inputs), *args)
File "/home/scrocle/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs)  # type: ignore[misc]
File "/home/scrocle/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/util.py", line 136, in forward
output_tensors = ctx.run_function(*ctx.input_tensors)
File "/home/scrocle/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/attention.py", line 274, in _forward
x = self.ff(self.norm3(x)) + x
File "/home/scrocle/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/scrocle/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/attention.py", line 76, in forward
return self.net(x)
File "/home/scrocle/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/scrocle/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
input = module(input)
File "/home/scrocle/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/scrocle/stable-diffusion-webui/modules/sd_hijack_utils.py", line 17, in <lambda>
setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
File "/home/scrocle/stable-diffusion-webui/modules/sd_hijack_utils.py", line 28, in __call__
return self.__orig_func(*args, **kwargs)
File "/home/scrocle/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/attention.py", line 56, in forward
return x * F.gelu(gate)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 90.00 MiB (GPU 0; 3.81 GiB total capacity; 3.60 GiB already allocated; 65.19 MiB free; 3.66 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

---
^CInterrupted with signal 2 in <frame at 0x16c66490, file '/home/scrocle/anaconda3/envs/sd/lib/python3.10/threading.py', line 324, code wait>

# Thread: Thread-5(140673398326976)
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/threading.py", line 973, in _bootstrap
self._bootstrap_inner()
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File: "/home/scrocle/stable-diffusion-webui/venv/lib/python3.10/site-packages/tqdm/_monitor.py", line 60, in run
self.was_killed.wait(self.sleep_interval)
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/threading.py", line 607, in wait
signaled = self._cond.wait(timeout)
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/threading.py", line 324, in wait
gotit = waiter.acquire(True, timeout)

# Thread: AnyIO worker thread(140672485025472)
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/threading.py", line 973, in _bootstrap
self._bootstrap_inner()
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File: "/home/scrocle/stable-diffusion-webui/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 797, in run
item = self.queue.get()
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/queue.py", line 171, in get
self.not_empty.wait()
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/threading.py", line 320, in wait
waiter.acquire()

# Thread: AnyIO worker thread(140673315460800)
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/threading.py", line 973, in _bootstrap
self._bootstrap_inner()
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File: "/home/scrocle/stable-diffusion-webui/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 797, in run
item = self.queue.get()
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/queue.py", line 171, in get
self.not_empty.wait()
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/threading.py", line 320, in wait
waiter.acquire()

# Thread: AnyIO worker thread(140673323853504)
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/threading.py", line 973, in _bootstrap
self._bootstrap_inner()
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File: "/home/scrocle/stable-diffusion-webui/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 797, in run
item = self.queue.get()
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/queue.py", line 171, in get
self.not_empty.wait()
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/threading.py", line 320, in wait
waiter.acquire()

# Thread: Thread-4 (run)(140673411962560)
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/threading.py", line 973, in _bootstrap
self._bootstrap_inner()
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File: "/home/scrocle/stable-diffusion-webui/venv/lib/python3.10/site-packages/uvicorn/server.py", line 61, in run
return asyncio.run(self.serve(sockets=sockets))
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/asyncio/base_events.py", line 633, in run_until_complete
self.run_forever()
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/asyncio/base_events.py", line 600, in run_forever
self._run_once()
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/asyncio/base_events.py", line 1860, in _run_once
event_list = self._selector.select(timeout)
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/selectors.py", line 469, in select
fd_event_list = self._selector.poll(timeout, max_ev)

# Thread: fsspecIO(140673421403840)
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/threading.py", line 973, in _bootstrap
self._bootstrap_inner()
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/asyncio/base_events.py", line 600, in run_forever
self._run_once()
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/asyncio/base_events.py", line 1860, in _run_once
event_list = self._selector.select(timeout)
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/selectors.py", line 469, in select
fd_event_list = self._selector.poll(timeout, max_ev)

# Thread: MemMon(140674028529344)
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/threading.py", line 973, in _bootstrap
self._bootstrap_inner()
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File: "/home/scrocle/stable-diffusion-webui/modules/memmon.py", line 41, in run
self.run_flag.wait()
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/threading.py", line 607, in wait
signaled = self._cond.wait(timeout)
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/threading.py", line 320, in wait
waiter.acquire()

# Thread: MainThread(140678301931456)
File: "/home/scrocle/stable-diffusion-webui/launch.py", line 48, in <module>
main()
File: "/home/scrocle/stable-diffusion-webui/launch.py", line 44, in main
start()
File: "/home/scrocle/stable-diffusion-webui/modules/launch_utils.py", line 436, in start
webui.webui()
File: "/home/scrocle/stable-diffusion-webui/webui.py", line 126, in webui
server_command = shared.state.wait_for_server_command(timeout=5)
File: "/home/scrocle/stable-diffusion-webui/modules/shared_state.py", line 62, in wait_for_server_command
if self._server_command_signal.wait(timeout):
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/threading.py", line 607, in wait
signaled = self._cond.wait(timeout)
File: "/home/scrocle/anaconda3/envs/sd/lib/python3.10/threading.py", line 324, in wait
gotit = waiter.acquire(True, timeout)
File: "/home/scrocle/stable-diffusion-webui/modules/initialize_util.py", line 156, in sigint_handler
dumpstacks()
File: "/home/scrocle/stable-diffusion-webui/modules/initialize_util.py", line 143, in dumpstacks
for filename, lineno, name, line in traceback.extract_stack(stack):

Additional information

No response

GitHubMJW commented 1 year ago

--medvram --xformers --no-half --precision full

You have an NVIDIA GPU since you get CUDA errors and use xformers, yet you set --no-half and --precision full. Why? Get rid of those arguments and see if you don't get better performance and use less VRAM.

viik420 commented 1 year ago

--medvram --xformers --no-half --precision full

You have an NVIDIA GPU since you get CUDA errors and use xformers, yet you set --no-half and --precision full. Why? Get rid of those arguments and see if you don't get better performance and use less VRAM.

I use --no-half because the webui asked me to: my GPU doesn't support half precision. If I run without --no-half I get this error:

NansException: A tensor with all NaNs was produced in Unet. This could be either because there's not enough precision to represent the picture, or because your video card does not support half type. Try setting the "Upcast cross attention layer to float32" option in Settings > Stable Diffusion or using the --no-half commandline argument to fix this. Use --disable-nan-check commandline argument to disable this check.
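The error message's suggested alternative can also be tried from the command line. As a sketch (the --upcast-sampling flag exists in this webui version, but whether it avoids the NaNs depends on the card):

```shell
# Possible alternative to --no-half / --precision full: keep fp16 weights
# but upcast sampling, which typically uses less VRAM than running the
# whole model in fp32.
bash webui.sh --medvram --xformers --upcast-sampling
```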

GitHubMJW commented 1 year ago

What type of GPU do you have? (That's something you probably should have mentioned in the original post. If not then, at least in your reply to me.)

viik420 commented 1 year ago

What type of GPU do you have? (That's something you probably should have mentioned in the original post. If not then, at least in your reply to me.)

I have an NVIDIA GeForce GTX 1650 with 4GB VRAM.

I have already found the cause: it is the System Memory Fallback for Stable Diffusion on NVIDIA cards. On Windows, Stable Diffusion falls back to system RAM when it runs out of VRAM, but this feature is not available on Linux, or maybe I just can't find it. https://nvidia.custhelp.com/app/answers/detail/a_id/5490/~/system-memory-fallback-for-stable-diffusion

IllarionovDimitri commented 1 year ago

I have the same problem running Stable Diffusion on Ubuntu. Neither the --medvram nor the --lowvram flag helps. I would highly appreciate a solution for this one.

guyingi commented 12 months ago

What type of GPU do you have? (That's something you probably should have mentioned in the original post. If not then, at least in your reply to me.)

Hi, can you help me with an analysis? I also have a problem with half precision not being supported. My GPU is an RTX 4070 Ti 12GB; according to NVIDIA's official website this GPU supports fp16. In VS Code, using stable-diffusion-webui's own environment (interpreter path D:\ProgramFiles\sd.webui\system\python\python.exe, actually Python 3.10), I tested the code `tensor = torch.randn(3, 3); device = torch.device("cuda"); tensor = tensor.to(device).half()` and it runs without error. I installed the latest version of stable-diffusion-webui last week, yet when I test img2img in the UI it tells me: "NansException: A tensor with all NaNs was produced in Unet. This could be either because there's not enough precision to represent the picture, or because your video card does not support half type."
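Note that a `.half()` cast succeeding does not prove fp16 is safe under real workloads: the NansException often comes from fp16 overflow during computation, not from the cast itself. A minimal illustration with numpy's float16 (not the webui code):

```python
import numpy as np

# float16 can only represent values up to ~65504; larger magnitudes
# overflow to inf, and arithmetic on infs then produces NaN. Activations
# that grow past this range inside the UNet can fail the same way even
# on a card whose fp16 hardware support is fine.
a = np.float16(70000.0)  # overflows to inf in float16
print(a)                 # inf
print(a - a)             # inf - inf -> nan
```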

Fahad-Aslam commented 11 months ago

I encountered a comparable issue (as far as I recall) on a Windows system and resolved it by increasing the virtual RAM (which designates a portion of the hard drive for use as additional RAM) to 30GB. Maybe give it a try.

IllarionovDimitri commented 11 months ago

If someone runs SD workloads on AWS, check the g5 instance family; those instances have 24 GB of VRAM. It works better now.

ThrosturX commented 9 months ago

I'm having a hard time understanding why this issue was closed. The solution is to "just use windows"?

viik420 commented 9 months ago

I'm having a hard time understanding why this issue was closed. The solution is to "just use windows"?

There is no System Memory Fallback option in the Linux version of the NVIDIA control panel. So no one knows what to do until NVIDIA provides the option.