dbolya / tomesd

Speed up Stable Diffusion with this one simple trick!
MIT License
1.24k stars · 79 forks

directml #13

Open motorist828 opened 1 year ago

motorist828 commented 1 year ago

Hi, I tried running this with https://github.com/lshqqytiger/stable-diffusion-webui-directml and this extension: https://git.mmaker.moe/mmaker/sd-webui-tome. I get errors when I turn on ToMe. Is this related to using torch 1.13.1, or is it a problem with DirectML? AMD video cards are very slow (my Vega 56 is 7 times slower than an RTX 3060), so your work would be very useful for AMD owners. https://github.com/lshqqytiger/stable-diffusion-webui-directml/issues/61

venv "D:\neiro\last\stable-diffusion-webui-directml\venv\Scripts\Python.exe" Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)] Commit hash: ae337fa39b6d4598b377ff312c53b14c15142331 Installing requirements for Web UI Launching Web UI with arguments: --medvram --disable-nan-check --autolaunch --opt-split-attention-invokeai --opt-sub-quad-attention --theme dark --no-half --precision full --no-half-vae --ckpt-dir D:\neiro\AMD\stable-diffusion-webui-directml\ Warning: experimental graphic memory optimization is disabled due to gpu vendor. Currently this optimization is only available for AMDGPUs. Disabled experimental graphic memory optimizations. Interrogations are fallen back to cpu. This doesn't affect on image generation. But if you want to use interrogate (CLIP or DeepBooru), check out this issue: https://github.com/lshqqytiger/stable-diffusion-webui-directml/issues/10 Warning: caught exception 'Torch not compiled with CUDA enabled', memory monitor disabled No module 'xformers'. Proceeding without it. Loading weights [2085909b28] from D:\neiro\AMD\stable-diffusion-webui-directml\models\Stable-diffusion\donkoMix_donkoMix.safetensors Creating model from config: D:\neiro\last\stable-diffusion-webui-directml\configs\v1-inference.yaml LatentDiffusion: Running in eps-prediction mode DiffusionWrapper has 859.52 M params. Loading VAE weights specified in settings: D:\neiro\AMD\stable-diffusion-webui-directml\models\VAE\novelai.vae.pt Applying sub-quadratic cross attention optimization. Textual inversion embeddings loaded(0): Applying ToMe patch... ToMe patch applied Model loaded in 1.9s (load weights from disk: 0.2s, create model: 0.5s, apply weights to model: 0.6s, load VAE: 0.5s). Running on local URL: http://127.0.0.1:7860

```
To create a public link, set share=True in launch().
Startup time: 61.7s (import torch: 1.7s, import gradio: 1.2s, import ldm: 0.5s, other imports: 2.3s, list SD models: 42.8s, load scripts: 1.3s, refresh VAE: 2.0s, load SD checkpoint: 2.0s, create ui: 7.5s, gradio launch: 0.3s).
  0%|          | 0/26 [00:02<?, ?it/s]
Error completing request
Arguments: ('task(grxvqmflcpyiis9)', '1girl', '(worst quality, low quality:1.4), (monochrome), zombie,badv3, badhandv4', [], 26, 15, False, False, 1, 1, 6, 11691188.0, -1.0, 0, 0, 0, False, 896, 896, False, 0.7, 2, 'Latent', 0, 0, 0, [], 0, False, 'MultiDiffusion', False, 10, 1, 1, 64, False, True, 1024, 1024, 96, 96, 48, 1, 'None', 2, False, False, False, False, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, False, True, True, False, 512, 64, False, False, 'positive', 'comma', 0, False, False, '', 1, '', 0, '', 0, '', True, False, False, False, 0) {}
Traceback (most recent call last):
  File "D:\neiro\last\stable-diffusion-webui-directml\modules\call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "D:\neiro\last\stable-diffusion-webui-directml\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "D:\neiro\last\stable-diffusion-webui-directml\modules\txt2img.py", line 56, in txt2img
    processed = process_images(p)
  File "D:\neiro\last\stable-diffusion-webui-directml\modules\processing.py", line 503, in process_images
    res = process_images_inner(p)
  File "D:\neiro\last\stable-diffusion-webui-directml\modules\processing.py", line 653, in process_images_inner
    samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
  File "D:\neiro\last\stable-diffusion-webui-directml\modules\processing.py", line 869, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
  File "D:\neiro\last\stable-diffusion-webui-directml\modules\sd_samplers_kdiffusion.py", line 358, in sample
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "D:\neiro\last\stable-diffusion-webui-directml\modules\sd_samplers_kdiffusion.py", line 234, in launch_sampling
    return func()
  File "D:\neiro\last\stable-diffusion-webui-directml\modules\sd_samplers_kdiffusion.py", line 358, in <lambda>
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "D:\neiro\last\stable-diffusion-webui-directml\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "D:\neiro\last\stable-diffusion-webui-directml\repositories\k-diffusion\k_diffusion\sampling.py", line 599, in sample_dpmpp_2m
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "D:\neiro\last\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\neiro\last\stable-diffusion-webui-directml\modules\sd_samplers_kdiffusion.py", line 132, in forward
    x_out[a:b] = self.inner_model(x_in[a:b], sigma_in[a:b], cond=make_condition_dict([cond_in[a:b]], image_cond_in[a:b]))
  File "D:\neiro\last\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\neiro\last\stable-diffusion-webui-directml\repositories\k-diffusion\k_diffusion\external.py", line 112, in forward
    eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
  File "D:\neiro\last\stable-diffusion-webui-directml\repositories\k-diffusion\k_diffusion\external.py", line 138, in get_eps
    return self.inner_model.apply_model(*args, **kwargs)
  File "D:\neiro\last\stable-diffusion-webui-directml\modules\sd_hijack_utils.py", line 17, in <lambda>
    setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
  File "D:\neiro\last\stable-diffusion-webui-directml\modules\sd_hijack_utils.py", line 28, in __call__
    return self.__orig_func(*args, **kwargs)
  File "D:\neiro\last\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 858, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "D:\neiro\last\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1212, in _call_impl
    result = forward_call(*input, **kwargs)
  File "D:\neiro\last\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 1335, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "D:\neiro\last\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\neiro\last\stable-diffusion-webui-directml\venv\lib\site-packages\tomesd\patch.py", line 172, in forward
    return super().forward(*args, **kwdargs)
  File "D:\neiro\last\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 797, in forward
    h = module(h, emb, context)
  File "D:\neiro\last\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\neiro\last\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 84, in forward
    x = layer(x, context)
  File "D:\neiro\last\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\neiro\last\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py", line 334, in forward
    x = block(x, context=context[i])
  File "D:\neiro\last\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\neiro\last\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py", line 269, in forward
    return checkpoint(self._forward, (x, context), self.parameters(), self.checkpoint)
  File "D:\neiro\last\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\util.py", line 121, in checkpoint
    return CheckpointFunction.apply(func, len(inputs), *args)
  File "D:\neiro\last\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\util.py", line 136, in forward
    output_tensors = ctx.run_function(*ctx.input_tensors)
  File "D:\neiro\last\stable-diffusion-webui-directml\venv\lib\site-packages\tomesd\patch.py", line 48, in _forward
    m_a, m_c, m_m, u_a, u_c, u_m = compute_merge(x, self._tome_info)
  File "D:\neiro\last\stable-diffusion-webui-directml\venv\lib\site-packages\tomesd\patch.py", line 21, in compute_merge
    m, u = merge.bipartite_soft_matching_random2d(x, w, h, args["sx"], args["sy"], r, not args["use_rand"])
  File "D:\neiro\last\stable-diffusion-webui-directml\venv\lib\site-packages\tomesd\merge.py", line 83, in bipartite_soft_matching_random2d
    dst_idx = node_idx[..., None].gather(dim=-2, index=src_idx)
RuntimeError
```

dbolya commented 1 year ago

Seems to be an issue with the requirements of gather, similar to the one on M1 Macs (#4).
Are you able to get any more information than just "RuntimeError"? It would be useful to know whether this is the same issue.
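If it helps, a standalone script along these lines should exercise just the failing gather pattern from tomesd/merge.py and print the full error on a DirectML device. The shapes are illustrative guesses, and it assumes the separate torch-directml package; treat it as a sketch, not the exact tomesd code:

```python
# Minimal sketch of the gather from tomesd/merge.py line 83
# (bipartite_soft_matching_random2d), run on a DirectML device.
# Shapes are illustrative, not taken from the actual failing run.
import torch
import torch_directml  # pip install torch-directml

device = torch_directml.device()

B, N, r = 2, 4096, 1024  # batch, tokens, tokens to merge (made-up values)
node_idx = torch.randint(0, N, (B, N), device=device)    # stand-in for the dst matches
src_idx = torch.randint(0, N, (B, r, 1), device=device)  # stand-in for merged-token indices

# This is the call that raises RuntimeError in the tracebacks above:
dst_idx = node_idx[..., None].gather(dim=-2, index=src_idx)
print(dst_idx.shape)  # (B, r, 1) if the gather succeeds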

motorist828 commented 1 year ago

What information do you need? I will try to provide everything you need, within the limits of my skills.

dbolya commented 1 year ago

Ah, nvm I found it: https://learn.microsoft.com/en-us/windows/win32/api/directml/ns-directml-dml_gather_operator_desc

It is indeed the same issue:

> IndicesTensor
> Type: const DML_TENSOR_DESC
> A tensor containing the indices. The DimensionCount of this tensor must match InputTensor.DimensionCount.

Interesting that multiple libraries place such restrictive stipulations on their gather operations. If I can find a way to reproduce this error, I might try to create a version of the function without these gathers (it might be slower, but better than nothing).

Edit: on second thought, that might just mean the number of dimensions has to be the same. Still seems to be an issue with gather, though. I'll see if I can reproduce it.
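For reference, the fix for MPS in #4 worked by squeezing away the trailing singleton dimension before gathering. A workaround in the same spirit might look like the sketch below; whether it also satisfies DirectML's gather is untested, and this paraphrases the idea rather than copying the exact tomesd code:

```python
import torch

def gather_workaround(x: torch.Tensor, dim: int, index: torch.Tensor) -> torch.Tensor:
    """Gather that avoids a trailing singleton dim, in the spirit of the MPS fix (#4)."""
    if x.shape[-1] != 1:
        return torch.gather(x, dim, index)
    # Normalize negative dims before squeezing so the axis stays correct;
    # assumes we never gather along the trailing singleton dim itself.
    dim = dim % x.dim()
    return torch.gather(x.squeeze(-1), dim, index.squeeze(-1)).unsqueeze(-1)

# The failing call site would then become something like:
# dst_idx = gather_workaround(node_idx[..., None], dim=-2, index=src_idx)
```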

motorist828 commented 1 year ago

OK, we'll wait and believe in you :thumbsup:

YHD233 commented 1 year ago

You'd better use ROCm on Linux; it will be much faster than DirectML, and the memory management is better.

motorist828 commented 1 year ago

I used Linux but didn't see any noticeable speed increase; for me it was only about 20% faster. I stopped using Linux when it became possible to use DirectML, as it's much more convenient.

Aptronymist commented 1 year ago

I've been having the same issue, which is a shame, because as stated, AMD could really use this boost.

```
Traceback (most recent call last):
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\modules\call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\modules\txt2img.py", line 56, in txt2img
    processed = process_images(p)
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\modules\processing.py", line 504, in process_images
    res = process_images_inner(p)
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\modules\processing.py", line 654, in process_images_inner
    samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\modules\processing.py", line 870, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\modules\sd_samplers_compvis.py", line 218, in sample
    samples_ddim = self.launch_sampling(steps, lambda: self.sampler.sample(S=steps, conditioning=conditioning, batch_size=int(x.shape[0]), shape=x[0].shape, verbose=False, unconditional_guidance_scale=p.cfg_scale, unconditional_conditioning=unconditional_conditioning, x_T=x, eta=self.eta)[0])
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\modules\sd_samplers_compvis.py", line 51, in launch_sampling
    return func()
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\modules\sd_samplers_compvis.py", line 218, in <lambda>
    samples_ddim = self.launch_sampling(steps, lambda: self.sampler.sample(S=steps, conditioning=conditioning, batch_size=int(x.shape[0]), shape=x[0].shape, verbose=False, unconditional_guidance_scale=p.cfg_scale, unconditional_conditioning=unconditional_conditioning, x_T=x, eta=self.eta)[0])
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddim.py", line 104, in sample
    samples, intermediates = self.ddim_sampling(conditioning, size,
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddim.py", line 164, in ddim_sampling
    outs = self.p_sample_ddim(img, cond, ts, index=index, use_original_steps=ddim_use_original_steps,
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\modules\sd_samplers_compvis.py", line 58, in p_sample_ddim_hook
    res = self.orig_p_sample_ddim(x_dec, cond, ts, unconditional_conditioning=unconditional_conditioning, *args, **kwargs)
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddim.py", line 212, in p_sample_ddim
    model_uncond, model_t = self.model.apply_model(x_in, t_in, c_in).chunk(2)
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\modules\sd_hijack_utils.py", line 17, in <lambda>
    setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\modules\sd_hijack_utils.py", line 28, in __call__
    return self.__orig_func(*args, **kwargs)
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 858, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 1335, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1212, in _call_impl
    result = forward_call(*input, **kwargs)
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 797, in forward
    h = module(h, emb, context)
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 84, in forward
    x = layer(x, context)
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py", line 334, in forward
    x = block(x, context=context[i])
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\extensions\Hypernetwork-MonkeyPatch-Extension\patches\external_pr\sd_hijack_checkpoint.py", line 5, in BasicTransformerBlock_forward
    return checkpoint(self._forward, x, context)
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\venv\lib\site-packages\torch\utils\checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\venv\lib\site-packages\torch\utils\checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\venv\lib\site-packages\tomesd\patch.py", line 48, in _forward
    m_a, m_c, m_m, u_a, u_c, u_m = compute_merge(x, self._tome_info)
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\venv\lib\site-packages\tomesd\patch.py", line 21, in compute_merge
    m, u = merge.bipartite_soft_matching_random2d(x, w, h, args["sx"], args["sy"], r, not args["use_rand"])
  File "D:\AI.stablediffusion\stable-diffusion-webui-directml\venv\lib\site-packages\tomesd\merge.py", line 83, in bipartite_soft_matching_random2d
    dst_idx = node_idx[..., None].gather(dim=-2, index=src_idx)
RuntimeError: The parameter is incorrect.
```

Oh, and for some odd reason, it even happens if I have ToMe unchecked in the UI. As long as the merging ratio is above 0, I get that error.
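One hedged guess at why: tomesd's public API has both tomesd.apply_patch and tomesd.remove_patch, so this symptom would fit an extension that applies the patch whenever the ratio is above 0 but never removes it when the checkbox is cleared. A sketch of the expected toggle logic (the enabled/ratio/model wiring here is an assumption, not the extension's actual code):

```python
import tomesd

# Hypothetical extension toggle; variable names are assumptions.
if enabled and ratio > 0:
    tomesd.apply_patch(model, ratio=ratio)
else:
    tomesd.remove_patch(model)  # without this, the patched forward keeps running
```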