AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI

[Bug]: Running out of VRAM AMD 7900 xt #16461

Open target-bravo opened 1 week ago

target-bravo commented 1 week ago

What happened?

Trying to run txt2img on a 7900 XT at a resolution of 540x960 with a 2x hires. fix, and I keep getting "RuntimeError: Could not allocate tensor with 18144080 bytes. There is not enough GPU video memory available!"

Below are my current command-line args:

COMMANDLINE_ARGS= --use-directml --port 80 --listen --enable-insecure-extension-access --no-half-vae

Any ideas on how to get this running smoothly?
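
For reference, a lower-VRAM variant of that line would look something like this, assuming the stock A1111 memory flags (--medvram, --opt-sub-quad-attention) are also supported by the DirectML fork:

COMMANDLINE_ARGS= --use-directml --medvram --opt-sub-quad-attention --port 80 --listen --enable-insecure-extension-access --no-half-vae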

Steps to reproduce the problem

Run any image generation at a high-ish resolution.

What should have happened?

The image should generate without using more than the total VRAM.

What browsers do you use to access the UI?

Google Chrome

Sysinfo

{ "Platform": "Windows-10-10.0.22631-SP0", "Python": "3.10.6", "Version": "v1.10.1-amd-5-gd8b7380b", "Commit": "d8b7380b18d044d2ee38695c58bae3a786689cf3", "Git status": "On branch master\nYour branch is up to date with 'origin/master'.\n\nChanges not staged for commit:\n (use \"git add ...\" to update what will be committed)\n (use \"git restore ...\" to discard changes in working directory)\n\tmodified: webui-user.bat\n\nUntracked files:\n (use \"git add ...\" to include in what will be committed)\n\tvenv.old/\n\nno changes added to commit (use \"git add\" and/or \"git commit -a\")", "Script path": "E:\SD\stable-diffusion-webui-directml", "Data path": "E:\SD\stable-diffusion-webui-directml", "Extensions dir": "E:\SD\stable-diffusion-webui-directml\extensions", "Checksum": "73533d0a0366e6ef83e2deeef5c879a5771e36bd91c85e0abe94fe10ca333a99", "Commandline": [ "launch.py", "--use-directml", "--port", "80", "--listen", "--enable-insecure-extension-access", "--no-half-vae" ], "Torch env info": { "torch_version": "2.3.1+cpu", "is_debug_build": "False", "cuda_compiled_version": null, "gcc_version": null, "clang_version": null, "cmake_version": null, "os": "Microsoft Windows 11 Pro", "libc_version": "N/A", "python_version": "3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)] (64-bit runtime)", "python_platform": "Windows-10-10.0.22631-SP0", "is_cuda_available": "False", "cuda_runtime_version": null, "cuda_module_loading": "N/A", "nvidia_driver_version": null, "nvidia_gpu_models": null, "cudnn_version": null, "pip_version": "pip3", "pip_packages": [ "numpy==1.26.2", "onnx==1.16.2", "onnxruntime==1.19.0", "onnxruntime-directml==1.19.0", "open-clip-torch==2.20.0", "pytorch-lightning==1.9.4", "torch==2.3.1", "torch-directml==0.2.4.dev240815", "torchdiffeq==0.2.3", "torchmetrics==1.4.1", "torchsde==0.2.6", "torchvision==0.18.1" ], "conda_packages": null, "hip_compiled_version": "N/A", "hip_runtime_version": "N/A", "miopen_runtime_version": "N/A", "caching_allocator_config": "", "is_xnnpack_available": "True", "cpu_info": [ "Architecture=9", "CurrentClockSpeed=3394", "DeviceID=CPU0", "Family=107", "L2CacheSize=4096", "L2CacheSpeed=", "Manufacturer=AuthenticAMD", "MaxClockSpeed=3394", "Name=AMD Ryzen 7 5800X3D 8-Core Processor ", "ProcessorType=3", "Revision=8450" ] }, "Exceptions": [ { "exception": "Could not allocate tensor with 18144080 bytes. 
There is not enough GPU video memory available!", "traceback": [ [ "E:\SD\stable-diffusion-webui-directml\modules\call_queue.py, line 74, f", "res = list(func(*args, kwargs))" ], [ "E:\SD\stable-diffusion-webui-directml\modules\call_queue.py, line 53, f", "res = func(*args, *kwargs)" ], [ "E:\SD\stable-diffusion-webui-directml\modules\call_queue.py, line 37, f", "res = func(args, kwargs)" ], [ "E:\SD\stable-diffusion-webui-directml\modules\txt2img.py, line 109, txt2img", "processed = processing.process_images(p)" ], [ "E:\SD\stable-diffusion-webui-directml\modules\processing.py, line 849, process_images", "res = process_images_inner(p)" ], [ "E:\SD\stable-diffusion-webui-directml\modules\processing.py, line 1083, process_images_inner", "samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)" ], [ "E:\SD\stable-diffusion-webui-directml\modules\processing.py, line 1457, sample", "return self.sample_hr_pass(samples, decoded_samples, seeds, subseeds, subseed_strength, prompts)" ], [ "E:\SD\stable-diffusion-webui-directml\modules\processing.py, line 1549, sample_hr_pass", "samples = self.sampler.sample_img2img(self, samples, noise, self.hr_c, self.hr_uc, steps=self.hr_second_pass_steps or self.steps, image_conditioning=image_conditioning)" ], [ "E:\SD\stable-diffusion-webui-directml\modules\sd_samplers_kdiffusion.py, line 187, sample_img2img", "samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, extra_params_kwargs))" ], [ "E:\SD\stable-diffusion-webui-directml\modules\sd_samplers_common.py, line 272, launch_sampling", "return func()" ], [ "E:\SD\stable-diffusion-webui-directml\modules\sd_samplers_kdiffusion.py, line 187, ", "samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, extra_params_kwargs))" ], [ "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\utils\_contextlib.py, line 115, decorate_context", "return func(*args, kwargs)" ], [ "E:\SD\stable-diffusion-webui-directml\repositories\k-diffusion\k_diffusion\sampling.py, line 594, sample_dpmpp_2m", "denoised = model(x, sigmas[i] * s_in, *extra_args)" ], [ "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py, line 1532, _wrapped_call_impl", "return self._call_impl(args, kwargs)" ], [ "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py, line 1541, _call_impl", "return forward_call(*args, kwargs)" ], [ "E:\SD\stable-diffusion-webui-directml\modules\sd_samplers_cfg_denoiser.py, line 268, forward", "x_out[a:b] = self.inner_model(x_in[a:b], sigma_in[a:b], cond=make_condition_dict(c_crossattn, image_cond_in[a:b]))" ], [ "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py, line 1532, _wrapped_call_impl", "return self._call_impl(*args, *kwargs)" ], [ "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py, line 1541, _call_impl", "return forward_call(args, kwargs)" ], [ "E:\SD\stable-diffusion-webui-directml\repositories\k-diffusion\k_diffusion\external.py, line 112, forward", "eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), kwargs)" ], [ "E:\SD\stable-diffusion-webui-directml\repositories\k-diffusion\k_diffusion\external.py, line 138, get_eps", 
"return self.inner_model.apply_model(*args, *kwargs)" ], [ "E:\SD\stable-diffusion-webui-directml\modules\sd_hijack_utils.py, line 22, ", "setattr(resolved_obj, func_path[-1], lambda args, kwargs: self(*args, kwargs))" ], [ "E:\SD\stable-diffusion-webui-directml\modules\sd_hijack_utils.py, line 34, call", "return self.__sub_func(self.__orig_func, args, kwargs)" ], [ "E:\SD\stable-diffusion-webui-directml\modules\sd_hijack_unet.py, line 50, apply_model", "result = orig_func(self, x_noisy.to(devices.dtype_unet), t.to(devices.dtype_unet), cond, kwargs)" ], [ "E:\SD\stable-diffusion-webui-directml\modules\sd_hijack_utils.py, line 22, ", "setattr(resolved_obj, func_path[-1], lambda args, kwargs: self(*args, kwargs))" ], [ "E:\SD\stable-diffusion-webui-directml\modules\sd_hijack_utils.py, line 36, call", "return self.__orig_func(args, kwargs)" ], [ "E:\SD\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py, line 858, apply_model", "x_recon = self.model(x_noisy, t, cond)" ], [ "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py, line 1532, _wrapped_call_impl", "return self._call_impl(args, kwargs)" ], [ "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py, line 1541, _call_impl", "return forward_call(*args, kwargs)" ], [ "E:\SD\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py, line 1335, forward", "out = self.diffusion_model(x, t, context=cc)" ], [ "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py, line 1532, _wrapped_call_impl", "return self._call_impl(*args, *kwargs)" ], [ "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py, line 1541, _call_impl", "return forward_call(args, kwargs)" ], [ "E:\SD\stable-diffusion-webui-directml\modules\sd_unet.py, line 91, UNetModel_forward", "return original_forward(self, x, timesteps, context, *args, kwargs)" ], [ "E:\SD\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py, line 802, forward", "h = module(h, emb, context)" ], [ "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py, line 1532, _wrapped_call_impl", "return self._call_impl(*args, *kwargs)" ], [ "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py, line 1541, _call_impl", "return forward_call(args, kwargs)" ], [ "E:\SD\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py, line 84, forward", "x = layer(x, context)" ], [ "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py, line 1532, _wrapped_call_impl", "return self._call_impl(*args, kwargs)" ], [ "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py, line 1541, _call_impl", "return forward_call(*args, *kwargs)" ], [ "E:\SD\stable-diffusion-webui-directml\modules\sd_hijack_utils.py, line 22, ", "setattr(resolved_obj, func_path[-1], lambda args, kwargs: self(*args, kwargs))" ], [ "E:\SD\stable-diffusion-webui-directml\modules\sd_hijack_utils.py, line 34, call", "return self.__sub_func(self.__orig_func, *args, *kwargs)" ], [ "E:\SD\stable-diffusion-webui-directml\modules\sd_hijack_unet.py, line 96, spatial_transformer_forward", "x = block(x, context=context[i])" ], [ 
"E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py, line 1532, _wrapped_call_impl", "return self._call_impl(args, kwargs)" ], [ "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py, line 1541, _call_impl", "return forward_call(*args, kwargs)" ], [ "E:\SD\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py, line 269, forward", "return checkpoint(self._forward, (x, context), self.parameters(), self.checkpoint)" ], [ "E:\SD\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\util.py, line 123, checkpoint", "return func(inputs)" ], [ "E:\SD\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py, line 272, _forward", "x = self.attn1(self.norm1(x), context=context if self.disable_self_attn else None) + x" ], [ "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py, line 1532, _wrapped_call_impl", "return self._call_impl(args, kwargs)" ], [ "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py, line 1541, _call_impl", "return forward_call(*args, kwargs)" ], [ "E:\SD\stable-diffusion-webui-directml\modules\sd_hijack_optimizations.py, line 393, split_cross_attention_forward_invokeAI", "r = einsum_op(q, k, v)" ], [ "E:\SD\stable-diffusion-webui-directml\modules\sd_hijack_optimizations.py, line 367, einsum_op", "return einsum_op_dml(q, k, v)" ], [ "E:\SD\stable-diffusion-webui-directml\modules\sd_hijack_optimizations.py, line 354, einsum_op_dml", "return einsum_op_tensor_mem(q, k, v, (mem_reserved - mem_active) if mem_reserved > mem_active else 1)" ], [ "E:\SD\stable-diffusion-webui-directml\modules\sd_hijack_optimizations.py, line 336, einsum_op_tensor_mem", "return einsum_op_slice_1(q, k, v, max(q.shape[1] // div, 1))" ], [ "E:\SD\stable-diffusion-webui-directml\modules\sd_hijack_optimizations.py, line 308, einsum_op_slice_1", "r[:, i:end] = einsum_op_compvis(q[:, i:end], k, v)" ] ] }, { "exception": "None is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'\nIf this is a private repository, make sure to pass a token having permission to this repo either by logging in with huggingface-cli login or by passing token=<your_token>", "traceback": [ [ "E:\SD\stable-diffusion-webui-directml\modules\sd_models.py, line 831, load_model", "sd_model = instantiate_from_config(sd_config.model, state_dict)" ], [ "E:\SD\stable-diffusion-webui-directml\modules\sd_models.py, line 775, instantiate_from_config", "return constructor(params)" ], [ "E:\SD\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py, line 563, init", "self.instantiate_cond_stage(cond_stage_config)" ], [ "E:\SD\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py, line 630, instantiate_cond_stage", "model = instantiate_from_config(config)" ], [ "E:\SD\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\util.py, line 89, instantiate_from_config", "return get_obj_from_str(config[\"target\"])(*config.get(\"params\", dict()))" ], [ "E:\SD\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\modules\encoders\modules.py, line 104, init", "self.transformer = CLIPTextModel.from_pretrained(version)" ], [ 
"E:\SD\stable-diffusion-webui-directml\modules\sd_disable_initialization.py, line 68, CLIPTextModel_from_pretrained", "res = self.CLIPTextModel_from_pretrained(None, model_args, config=pretrained_model_name_or_path, state_dict={}, **kwargs)" ], [ "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\transformers\modeling_utils.py, line 3213, from_pretrained", "resolved_config_file = cached_file(" ], [ "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\transformers\utils\hub.py, line 425, cached_file", "raise EnvironmentError(" ] ] } ], "CPU": { "model": "AMD64 Family 25 Model 33 Stepping 2, AuthenticAMD", "count logical": 16, "count physical": 8 }, "RAM": { "total": "16GB", "used": "11GB", "free": "5GB" }, "Extensions": [ { "name": "multidiffusion-upscaler-for-automatic1111", "path": "E:\SD\stable-diffusion-webui-directml\extensions\multidiffusion-upscaler-for-automatic1111", "commit": "22798f6822bc9c8a905b51da8954ee313b973331", "branch": "main", "remote": "https://github.com/pkuliyi2015/multidiffusion-upscaler-for-automatic1111.git" } ], "Inactive extensions": [], "Environment": { "COMMANDLINE_ARGS": " --use-directml --port 80 --listen --enable-insecure-extension-access --no-half-vae", "GRADIO_ANALYTICS_ENABLED": "False" }, "Config": { "ldsr_steps": 100, "ldsr_cached": false, "SCUNET_tile": 256, "SCUNET_tile_overlap": 8, "SWIN_tile": 192, "SWIN_tile_overlap": 8, "SWIN_torch_compile": false, "hypertile_enable_unet": false, "hypertile_enable_unet_secondpass": false, "hypertile_max_depth_unet": 3, "hypertile_max_tile_unet": 256, "hypertile_swap_size_unet": 3, "hypertile_enable_vae": false, "hypertile_max_depth_vae": 3, "hypertile_max_tile_vae": 128, "hypertile_swap_size_vae": 3, "sd_model_checkpoint": "chilloutmix_NiPrunedFp32Fix.safetensors [fc2511737a]", "sd_checkpoint_hash": "fc2511737a54c5e80b89ab03e0ab4b98d051ab187f92860f3cd664dc9d08b271" }, "Startup": { "total": 46.024452209472656, "records": { "initial startup": 0.12500858306884766, "prepare environment/checks": 0.0, "prepare environment/git version info": 0.6563024520874023, "prepare environment/clone repositores": 0.28126049041748047, "prepare environment/run extensions installers/multidiffusion-upscaler-for-automatic1111": 0.015625715255737305, "prepare environment/run extensions installers": 0.015625715255737305, "prepare environment": 75.71254301071167, "launcher": 0.0020012855529785156, "import torch": 0.0, "import gradio": 0.0, "setup paths": 0.0010001659393310547, "import ldm": 0.0030002593994140625, "import sgm": 0.0, "initialize shared": 2.3184762001037598, "other imports": 0.03450608253479004, "opts onchange": 0.0, "setup SD model": 0.0004999637603759766, "setup codeformer": 0.0010004043579101562, "setup gfpgan": 0.01700282096862793, "set samplers": 0.0, "list extensions": 0.0015003681182861328, "restore config state file": 0.0, "list SD models": 0.040509700775146484, "list localizations": 0.0005002021789550781, "load scripts/custom_code.py": 0.0055010318756103516, "load scripts/img2imgalt.py": 0.0010004043579101562, "load scripts/loopback.py": 0.0004999637603759766, "load scripts/outpainting_mk_2.py": 0.0004999637603759766, "load scripts/poor_mans_outpainting.py": 0.0005002021789550781, "load scripts/postprocessing_codeformer.py": 0.0004999637603759766, "load scripts/postprocessing_gfpgan.py": 0.0005002021789550781, "load scripts/postprocessing_upscale.py": 0.0004999637603759766, "load scripts/prompt_matrix.py": 0.0010001659393310547, "load scripts/prompts_from_file.py": 0.0005002021789550781, "load 
scripts/sd_upscale.py": 0.0004999637603759766, "load scripts/xyz_grid.py": 0.0020003318786621094, "load scripts/ldsr_model.py": 0.3000602722167969, "load scripts/lora_script.py": 0.11002206802368164, "load scripts/scunet_model.py": 0.02150416374206543, "load scripts/swinir_model.py": 0.020003318786621094, "load scripts/hotkey_config.py": 0.0, "load scripts/extra_options_section.py": 0.0010001659393310547, "load scripts/hypertile_script.py": 0.035008907318115234, "load scripts/postprocessing_autosized_crop.py": 0.0010001659393310547, "load scripts/postprocessing_caption.py": 0.0004999637603759766, "load scripts/postprocessing_create_flipped_copies.py": 0.0005002021789550781, "load scripts/postprocessing_focal_crop.py": 0.0020003318786621094, "load scripts/postprocessing_split_oversized.py": 0.0005002021789550781, "load scripts/soft_inpainting.py": 0.0010001659393310547, "load scripts/tilediffusion.py": 0.044507503509521484, "load scripts/tileglobal.py": 0.016003131866455078, "load scripts/tilevae.py": 0.014503002166748047, "load scripts/comments.py": 0.020005464553833008, "load scripts/refiner.py": 0.0010006427764892578, "load scripts/sampler.py": 0.0004999637603759766, "load scripts/seed.py": 0.0004999637603759766, "load scripts": 0.6036219596862793, "load upscalers": 0.003500699996948242, "refresh VAE": 0.0009999275207519531, "refresh textual inversion templates": 0.0005002021789550781, "scripts list_optimizers": 0.0010001659393310547, "scripts list_unets": 0.0, "reload hypernetworks": 0.0005002021789550781, "initialize extra networks": 0.054009437561035156, "scripts before_ui_callback": 0.003500699996948242, "create ui": 0.4175848960876465, "gradio launch": 4.347925186157227, "add APIs": 0.008502006530761719, "app_started_callback/lora_script.py": 0.0004999637603759766, "app_started_callback": 0.0004999637603759766 } }, "Packages": [ "accelerate==0.21.0", "aenum==3.1.15", "aiofiles==23.2.1", "aiohappyeyeballs==2.4.0", "aiohttp==3.10.5", "aiosignal==1.3.1", "alembic==1.13.2", "altair==5.4.1", "antlr4-python3-runtime==4.9.3", "anyio==3.7.1", "async-timeout==4.0.3", "attrs==24.2.0", "blendmodes==2022", "certifi==2024.8.30", "charset-normalizer==3.3.2", "clean-fid==0.1.35", "click==8.1.7", "clip @ https://github.com/openai/CLIP/archive/d50d76daa670286dd6cacf3bcd80b5e4823fc8e1.zip#sha256=b5842c25da441d6c581b53a5c60e0c2127ebafe0f746f8e15561a006c6c3be6a", "colorama==0.4.6", "coloredlogs==15.0.1", "colorlog==6.8.2", "contourpy==1.3.0", "cycler==0.12.1", "datasets==2.21.0", "deprecation==2.1.0", "diffusers==0.30.2", "dill==0.3.8", "diskcache==5.6.3", "einops==0.4.1", "exceptiongroup==1.2.2", "facexlib==0.3.0", "fastapi==0.94.0", "ffmpy==0.4.0", "filelock==3.15.4", "filterpy==1.4.5", "flatbuffers==24.3.25", "fonttools==4.53.1", "frozenlist==1.4.1", "fsspec==2024.6.1", "ftfy==6.2.3", "gitdb==4.0.11", "GitPython==3.1.32", "gradio==3.41.2", "gradio_client==0.5.0", "greenlet==3.0.3", "h11==0.12.0", "httpcore==0.15.0", "httpx==0.24.1", "huggingface-hub==0.24.6", "humanfriendly==10.0", "idna==3.8", "imageio==2.35.1", "importlib_metadata==8.4.0", "importlib_resources==6.4.4", "inflection==0.5.1", "intel-openmp==2021.4.0", "Jinja2==3.1.4", "jsonmerge==1.8.0", "jsonschema==4.23.0", "jsonschema-specifications==2023.12.1", "kiwisolver==1.4.6", "kornia==0.6.7", "lark==1.1.2", "lazy_loader==0.4", "lightning-utilities==0.11.7", "llvmlite==0.43.0", "Mako==1.3.5", "MarkupSafe==2.1.5", "matplotlib==3.9.2", "mkl==2021.4.0", "mpmath==1.3.0", "multidict==6.0.5", "multiprocess==0.70.16", "narwhals==1.6.2", 
"networkx==3.3", "numba==0.60.0", "numpy==1.26.2", "olive-ai==0.6.2", "omegaconf==2.2.3", "onnx==1.16.2", "onnxruntime==1.19.0", "onnxruntime-directml==1.19.0", "open-clip-torch==2.20.0", "opencv-python==4.10.0.84", "optimum==1.21.4", "optuna==4.0.0", "orjson==3.10.7", "packaging==24.1", "pandas==2.2.2", "piexif==1.1.3", "Pillow==9.5.0", "pillow-avif-plugin==1.4.3", "pip==24.2", "protobuf==3.20.2", "psutil==5.9.5", "pyarrow==17.0.0", "pydantic==1.10.18", "pydub==0.25.1", "pyparsing==3.1.4", "pyreadline3==3.4.1", "python-dateutil==2.9.0.post0", "python-multipart==0.0.9", "pytorch-lightning==1.9.4", "pytz==2024.1", "PyWavelets==1.7.0", "PyYAML==6.0.2", "referencing==0.35.1", "regex==2024.7.24", "requests==2.32.3", "resize-right==0.0.2", "rpds-py==0.20.0", "safetensors==0.4.2", "scikit-image==0.21.0", "scipy==1.14.1", "semantic-version==2.10.0", "sentencepiece==0.2.0", "setuptools==69.5.1", "six==1.16.0", "smmap==5.0.1", "sniffio==1.3.1", "spandrel==0.3.4", "spandrel_extra_arches==0.1.1", "SQLAlchemy==2.0.33", "starlette==0.26.1", "sympy==1.13.2", "tbb==2021.13.1", "tifffile==2024.8.30", "timm==1.0.9", "tokenizers==0.19.1", "tomesd==0.1.3", "torch==2.3.1", "torch-directml==0.2.4.dev240815", "torchdiffeq==0.2.3", "torchmetrics==1.4.1", "torchsde==0.2.6", "torchvision==0.18.1", "tqdm==4.66.5", "trampoline==0.1.2", "transformers==4.43.4", "typing_extensions==4.12.2", "tzdata==2024.1", "urllib3==2.2.2", "uvicorn==0.30.6", "wcwidth==0.2.13", "websockets==11.0.3", "xxhash==3.5.0", "yarl==1.9.8", "zipp==3.20.1" ] }

Console logs

File "E:\SD\stable-diffusion-webui-directml\modules\call_queue.py", line 74, in f
        res = list(func(*args, **kwargs))
      File "E:\SD\stable-diffusion-webui-directml\modules\call_queue.py", line 53, in f
        res = func(*args, **kwargs)
      File "E:\SD\stable-diffusion-webui-directml\modules\call_queue.py", line 37, in f
        res = func(*args, **kwargs)
      File "E:\SD\stable-diffusion-webui-directml\modules\txt2img.py", line 109, in txt2img
        processed = processing.process_images(p)
      File "E:\SD\stable-diffusion-webui-directml\modules\processing.py", line 849, in process_images
        res = process_images_inner(p)
      File "E:\SD\stable-diffusion-webui-directml\modules\processing.py", line 1083, in process_images_inner
        samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
      File "E:\SD\stable-diffusion-webui-directml\modules\processing.py", line 1457, in sample
        return self.sample_hr_pass(samples, decoded_samples, seeds, subseeds, subseed_strength, prompts)
      File "E:\SD\stable-diffusion-webui-directml\modules\processing.py", line 1549, in sample_hr_pass
        samples = self.sampler.sample_img2img(self, samples, noise, self.hr_c, self.hr_uc, steps=self.hr_second_pass_steps or self.steps, image_conditioning=image_conditioning)
      File "E:\SD\stable-diffusion-webui-directml\modules\sd_samplers_kdiffusion.py", line 187, in sample_img2img
        samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "E:\SD\stable-diffusion-webui-directml\modules\sd_samplers_common.py", line 272, in launch_sampling
        return func()
      File "E:\SD\stable-diffusion-webui-directml\modules\sd_samplers_kdiffusion.py", line 187, in <lambda>
        samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "E:\SD\stable-diffusion-webui-directml\repositories\k-diffusion\k_diffusion\sampling.py", line 594, in sample_dpmpp_2m
        denoised = model(x, sigmas[i] * s_in, **extra_args)
      File "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
        return forward_call(*args, **kwargs)
      File "E:\SD\stable-diffusion-webui-directml\modules\sd_samplers_cfg_denoiser.py", line 268, in forward
        x_out[a:b] = self.inner_model(x_in[a:b], sigma_in[a:b], cond=make_condition_dict(c_crossattn, image_cond_in[a:b]))
      File "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
        return forward_call(*args, **kwargs)
      File "E:\SD\stable-diffusion-webui-directml\repositories\k-diffusion\k_diffusion\external.py", line 112, in forward
        eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
      File "E:\SD\stable-diffusion-webui-directml\repositories\k-diffusion\k_diffusion\external.py", line 138, in get_eps
        return self.inner_model.apply_model(*args, **kwargs)
      File "E:\SD\stable-diffusion-webui-directml\modules\sd_hijack_utils.py", line 22, in <lambda>
        setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
      File "E:\SD\stable-diffusion-webui-directml\modules\sd_hijack_utils.py", line 34, in __call__
        return self.__sub_func(self.__orig_func, *args, **kwargs)
      File "E:\SD\stable-diffusion-webui-directml\modules\sd_hijack_unet.py", line 50, in apply_model
        result = orig_func(self, x_noisy.to(devices.dtype_unet), t.to(devices.dtype_unet), cond, **kwargs)
      File "E:\SD\stable-diffusion-webui-directml\modules\sd_hijack_utils.py", line 22, in <lambda>
        setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
      File "E:\SD\stable-diffusion-webui-directml\modules\sd_hijack_utils.py", line 36, in __call__
        return self.__orig_func(*args, **kwargs)
      File "E:\SD\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 858, in apply_model
        x_recon = self.model(x_noisy, t, **cond)
      File "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
        return forward_call(*args, **kwargs)
      File "E:\SD\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 1335, in forward
        out = self.diffusion_model(x, t, context=cc)
      File "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
        return forward_call(*args, **kwargs)
      File "E:\SD\stable-diffusion-webui-directml\modules\sd_unet.py", line 91, in UNetModel_forward
        return original_forward(self, x, timesteps, context, *args, **kwargs)
      File "E:\SD\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 802, in forward
        h = module(h, emb, context)
      File "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
        return forward_call(*args, **kwargs)
      File "E:\SD\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 84, in forward
        x = layer(x, context)
      File "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
        return forward_call(*args, **kwargs)
      File "E:\SD\stable-diffusion-webui-directml\modules\sd_hijack_utils.py", line 22, in <lambda>
        setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
      File "E:\SD\stable-diffusion-webui-directml\modules\sd_hijack_utils.py", line 34, in __call__
        return self.__sub_func(self.__orig_func, *args, **kwargs)
      File "E:\SD\stable-diffusion-webui-directml\modules\sd_hijack_unet.py", line 96, in spatial_transformer_forward
        x = block(x, context=context[i])
      File "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
        return forward_call(*args, **kwargs)
      File "E:\SD\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py", line 269, in forward
        return checkpoint(self._forward, (x, context), self.parameters(), self.checkpoint)
      File "E:\SD\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\util.py", line 123, in checkpoint
        return func(*inputs)
      File "E:\SD\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py", line 272, in _forward
        x = self.attn1(self.norm1(x), context=context if self.disable_self_attn else None) + x
      File "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "E:\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
        return forward_call(*args, **kwargs)
      File "E:\SD\stable-diffusion-webui-directml\modules\sd_hijack_optimizations.py", line 393, in split_cross_attention_forward_invokeAI
        r = einsum_op(q, k, v)
      File "E:\SD\stable-diffusion-webui-directml\modules\sd_hijack_optimizations.py", line 367, in einsum_op
        return einsum_op_dml(q, k, v)
      File "E:\SD\stable-diffusion-webui-directml\modules\sd_hijack_optimizations.py", line 354, in einsum_op_dml
        return einsum_op_tensor_mem(q, k, v, (mem_reserved - mem_active) if mem_reserved > mem_active else 1)
      File "E:\SD\stable-diffusion-webui-directml\modules\sd_hijack_optimizations.py", line 336, in einsum_op_tensor_mem
        return einsum_op_slice_1(q, k, v, max(q.shape[1] // div, 1))
      File "E:\SD\stable-diffusion-webui-directml\modules\sd_hijack_optimizations.py", line 308, in einsum_op_slice_1
        r[:, i:end] = einsum_op_compvis(q[:, i:end], k, v)
    RuntimeError: Could not allocate tensor with 18144080 bytes. There is not enough GPU video memory available!
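
The bottom of that stack is worth reading: the DirectML path falls back to InvokeAI-style sliced cross-attention, which slices the query tensor along the token axis so the full (tokens x tokens) attention matrix never has to exist at once, and here even a single slice fails to allocate. A rough sketch of the slicing idea, simplified and not the actual webui code:

    import torch

    def sliced_attention(q, k, v, slice_size):
        # q, k, v: (batch, tokens, dim); computes softmax(q @ k^T * scale) @ v
        # one token-slice at a time, so only one slice of the attention
        # matrix is ever resident in GPU memory
        scale = q.shape[-1] ** -0.5
        out = torch.zeros_like(q)
        for i in range(0, q.shape[1], slice_size):
            end = min(i + slice_size, q.shape[1])
            attn = torch.softmax(q[:, i:end] @ k.transpose(1, 2) * scale, dim=-1)
            out[:, i:end] = attn @ v
        return out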

Additional information

No response

punkbuzter commented 1 week ago

Same issue here; I see more and more VRAM being consumed by each generation when using XL models. Latest Nvidia drivers for my 3060 12GB, with CUDA System Fallback ON. Arguments: --opt-sdp-attention --no-half-vae --autolaunch. Python: 3.10.11, Torch: 2.1.2+cu121, Gradio: 3.41.2, Checkpoint: c745095993

Also, the A1111 version at the bottom of the UI says I have version 1.10.1, but only 1.10.0 has been released according to the website.

CS1o commented 1 week ago

Same issue here; I see more and more VRAM being consumed by each generation when using XL models. Latest Nvidia drivers for my 3060 12GB, with CUDA System Fallback ON. Arguments: --opt-sdp-attention --no-half-vae --autolaunch. Python: 3.10.11, Torch: 2.1.2+cu121, Gradio: 3.41.2, Checkpoint: c745095993

You're on an old Torch version. Also, better to use --xformers instead of --opt-sdp-attention, as xformers uses less VRAM. Also delete your venv folder and then relaunch webui-user.bat to get the latest Torch and xformers versions installed.
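
On a default Windows install, that amounts to something like the following, run from the webui folder (folder names assume the standard layout):

    rem in webui-user.bat: set COMMANDLINE_ARGS=--xformers
    rmdir /s /q venv
    webui-user.bat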

CS1o commented 1 week ago

@target-bravo Hey, DirectML is known for being slow and VRAM-hungry (badly optimised). AMD users should use the Automatic1111 ZLUDA version instead. It's very fast and uses much less VRAM.

You can find all my AMD webui guides here: https://github.com/CS1o/Stable-Diffusion-Info/wiki/Installation-Guides. Follow the AMD Automatic1111 with ZLUDA guide for the best performance.

I'm running a 7900 XTX without any VRAM problems. Even upscaling SDXL images works.

Xephier102 commented 6 days ago

Same issue here; I see more and more VRAM being consumed by each generation when using XL models. Latest Nvidia drivers for my 3060 12GB, with CUDA System Fallback ON. Arguments: --opt-sdp-attention --no-half-vae --autolaunch. Python: 3.10.11, Torch: 2.1.2+cu121, Gradio: 3.41.2, Checkpoint: c745095993

You're on an old Torch version. Also, better to use --xformers instead of --opt-sdp-attention, as xformers uses less VRAM. Also delete your venv folder and then relaunch webui-user.bat to get the latest Torch and xformers versions installed.

Can't use xformers without an old Torch. Even the non-stable/beta/dev version, or whatever it is, will not run with torch 2.4.1. I had a headache trying to get xformers running; I wish I could. If I install the most recent dev version of xformers, it actually uninstalls my new Torch to install 2.4.0, and installs it without CUDA to boot. I even rooted around and found a 2.4.0-with-CUDA build; I installed that, and then A1111 wouldn't start up anymore. So I eventually said screw it, deleted the virtual env, and started over.
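
For anyone hitting the same wall, the usual workaround is to install Torch and xformers in a single pip command from the CUDA wheel index, so the resolver picks a matching pair; the exact pairing below is a guess, not a verified combination:

    pip install torch==2.4.0 xformers --index-url https://download.pytorch.org/whl/cu121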

markg85 commented 5 days ago

@target-bravo Hey, DirectML is known for being slow and VRAM-hungry (badly optimised). AMD users should use the Automatic1111 ZLUDA version instead. It's very fast and uses much less VRAM.

You can find all my AMD webui guides here: https://github.com/CS1o/Stable-Diffusion-Info/wiki/Installation-Guides. Follow the AMD Automatic1111 with ZLUDA guide for the best performance.

I'm running a 7900 XTX without any VRAM problems. Even upscaling SDXL images works.

It's great that you're a ZLUDA fanboy and all, but please stop spreading your BS regarding DirectML. We sadly live in a world where Nvidia is king and ruler in AI, and AMD is playing catch-up. ZLUDA is a band-aid that might be fun for now, but it's not going to be a long-term thing. Now, I'm very much a fan of Linux and open source, and DirectML as a Microsoft thing isn't great either. Or we can just conclude that there are no good, truly cross-platform options at the moment.

If you are on AMD and on Windows, then your best and most performant option would be DirectML (at a reduced feature set).

[Image: chart showing Stable Diffusion inference optimized with Microsoft Olive + ONNX]

CS1o commented 5 days ago

It's great that you're a ZLUDA fanboy and all, but please stop spreading your BS regarding DirectML. We sadly live in a world where Nvidia is king and ruler in AI, and AMD is playing catch-up. ZLUDA is a band-aid that might be fun for now, but it's not going to be a long-term thing. Now, I'm very much a fan of Linux and open source, and DirectML as a Microsoft thing isn't great either. Or we can just conclude that there are no good, truly cross-platform options at the moment.

If you are on AMD and on Windows, then your best and most performant option would be DirectML (at a reduced feature set).

You're saying DirectML is the best while pointing at an image of Microsoft Olive+ONNX...

Let me get the facts right for you: DirectML is not Olive+ONNX. MS Olive and ONNX are tools FOR DirectML, as you can read here on MS: https://learn.microsoft.com/en-us/windows/ai/directml/dml-tools. DirectML can work without Olive and ONNX.
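
For what it's worth, "native DirectML" here means the torch-directml backend that the webui's --use-directml flag relies on; in plain PyTorch terms it boils down to roughly this minimal sketch:

    import torch
    import torch_directml

    dml = torch_directml.device()        # first DirectML-capable GPU adapter
    a = torch.randn(1024, 1024).to(dml)  # tensor allocated in GPU memory via DML
    b = torch.randn(1024, 1024).to(dml)
    c = a @ b                            # matmul dispatched through DirectML
    print(c.device)                      # reported as a "privateuseone" device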

I'm saying DirectML is slow and uses a lot of VRAM, which is true if you set up Automatic1111 for AMD with native DirectML (without Olive+ONNX). It's slow, uses nearly the full amount of VRAM for any image generation, and goes OOM pretty fast with the wrong settings.

Olive and ONNX improve the performance a lot, but at the cost of needing to convert every model you want to use, and of limited to no extension support.
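
And the Olive+ONNX path ends in ONNX Runtime's DirectML execution provider, shipped in the onnxruntime-directml package; a minimal sketch, with the model path and input shape made up for illustration:

    import numpy as np
    import onnxruntime as ort

    # "unet.onnx" stands in for a checkpoint previously converted with Olive
    sess = ort.InferenceSession("unet.onnx", providers=["DmlExecutionProvider"])
    name = sess.get_inputs()[0].name
    out = sess.run(None, {name: np.zeros((1, 4, 64, 64), dtype=np.float16)})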

I used DirectML without Olive or ONNX for over a year on my 7900 XTX, and on a 6700 XT before that, without issues regarding models, LoRAs, or extensions. The issues with DirectML, other than the speed/VRAM and resolution limits, were gray squares when upscaling and the need for --no-half for inpainting, which increases VRAM usage a lot. ZLUDA fixed everything for me.

On my GitHub site I have both DirectML and ZLUDA guides. The DirectML guides don't cover the Olive+ONNX setup because, after talking with many AMD users over the last two years, nobody likes converting their models every time when there is an alternative that doesn't require it, given the extra cost in extension compatibility and a more complex workflow.

ZLUDA is the best compromise between performance and compatibility for AMD users on Windows right now. The setup isn't too difficult for normal, non-techy users, and VRAM usage is significantly lower than with DirectML. Getting 16 it/s on ZLUDA compared to 4-6 it/s on DirectML (SD 1.5 model, 20 steps, 512x512) speaks for itself, all while the VRAM is not even maxed out. I can easily use higher batch sizes and upscale without problems too. Sure, I might be losing 2 it/s compared to an Olive+ONNX setup, but the convenience and ease of use of this workflow make it a no-brainer for me.

Regarding Linux: Stable Diffusion with AMD has the best performance on Linux right now, so dual-booting is an option for techy users, though maybe not for average Windows users. AMD released Adrenalin driver 24.6.1, which enabled beta support for ROCm under WSL2 on Win11 and Win10. So if we only look at performance and compatibility, one might argue that this is the best option for AMD Windows users, but the setup process is a barrier to entry for your average PC user.

That's why I still recommend ZLUDA to normal AMD users over anything else. Normal users won't bother with dual-booting or setting up WSL2 with Docker and so on if they can get things running more easily, and they want the best usability and compatibility from the webui too. Currently only ZLUDA provides this. It might be a temporary solution until AMD provides something better, which they seem to be working on.

Conclusion:

AMD on Windows:

- Automatic1111 with DirectML: bad performance, good compatibility, easy setup
- Automatic1111 with DirectML plus Olive+ONNX: great performance, bad compatibility, easy setup, model conversion required
- Automatic1111 with ZLUDA: great performance, good compatibility, normal setup
- Automatic1111 with WSL2/ROCm: great performance, great compatibility, difficult setup

For Linux:

- Automatic1111 with ROCm: great performance, great compatibility, normal setup

TL;DR: Currently ZLUDA offers the best mix of performance, usability, and compatibility on Windows. And there is WSL2 with ROCm for tech-savvy users who don't want to dual-boot.

markg85 commented 5 days ago

Let me get the facts right for you: DirectML is not Olive+ONNX. MS Olive and ONNX are tools FOR DirectML, as you can read here on MS: https://learn.microsoft.com/en-us/windows/ai/directml/dml-tools. DirectML can work without Olive and ONNX.

Thank you for that correction, you're absolutely right! The abbreviations and the mixture of tech names in the AI world are confusing.

What I'm not wrong about is that DirectML is, and I quote, a low-level hardware abstraction layer. In essence, that's comparable to CUDA and OpenCL.

The image I showed was, in hindsight, a bit misleading, as it indeed shows ONNX plus more (Olive). The important point is that it shows inference can be optimized.

And there is WSL2 with ROCm for tech-savvy users who don't want to dual-boot.

As a Linux user, I consider that kind of "running Linux inside Windows" a big joke. It might work for Windows users who are just following a Linux guide, though.