AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI

[Bug]: RuntimeError: "upsample_nearest2d_channels_last" not implemented for 'Half' #12778

Open · Karsten385 opened this issue 1 year ago

Karsten385 commented 1 year ago

Is there an existing issue for this?

What happened?

Cannot generate images. I'm not sure what changed since a few days ago, when it was running fine, but here is the error: RuntimeError: "upsample_nearest2d_channels_last" not implemented for 'Half'

The image tries to generate for about 2 seconds before failing with this error.
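
For reference, the failing call can be reproduced outside the webui. A minimal sketch, assuming a PyTorch build that lacks fp16 CPU upsampling kernels (the tensor shape is arbitrary):

import torch
import torch.nn.functional as F

# Half-precision tensor on CPU, in the channels-last memory format that
# the kernel named in the error handles.
x = torch.randn(1, 4, 64, 64, dtype=torch.float16)
x = x.to(memory_format=torch.channels_last)

# On such builds this raises:
# RuntimeError: "upsample_nearest2d_channels_last" not implemented for 'Half'
F.interpolate(x, scale_factor=2, mode="nearest")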

Steps to reproduce the problem

  1. Add a prompt
  2. Attempt to generate an image
  3. The image does not generate

What should have happened?

The image should generate.

Version or Commit where the problem happens

v1.5.2, commit hash c9c8485bc1e8720aba70f029d25cba1c4abf2b5c

What Python version are you running on?

Python 3.10.x

What platforms do you use to access the UI?

macOS

What device are you running WebUI on?

CPU

Cross attention optimization

Automatic

What browsers do you use to access the UI?

Google Chrome

Command Line Arguments

No

List of extensions

a1111-sd-webui-lycoris

stable-diffusion-webui-images-browser

Console logs

Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 4.3s (launcher: 0.2s, import torch: 1.3s, import gradio: 0.6s, setup paths: 0.3s, other imports: 0.3s, list SD models: 0.1s, load scripts: 0.4s, create ui: 0.8s).
Applying attention optimization: InvokeAI... done.
Model loaded in 5.1s (load weights from disk: 0.6s, create model: 0.8s, apply weights to model: 1.4s, apply half(): 1.5s, move model to device: 0.6s, calculate empty prompt: 0.1s).
  0%|                                                    | 0/77 [00:01<?, ?it/s]
*** Error completing request
*** Arguments: ('task(cqwpl7x78uin03e)', '(masterpiece:1.2, best quality:1.2, high quality, highres:1.1, high contrast, vibrant colors), (unique character design), extremely detailed, (cast shadows), 4K, perfect eyes, perfect face, ((adult school teacher)), (pencil skirt), (button-down shirt),  ((strained buttons)), (straight blonde hair),\n\n', 'easynegative, (bad character design:1.6),  ((nipples)), bad quality, worst quality, blur, bad anatomy, extra limbs, loli, child, watermark, logo, signature,  ng_deepnegative_v1_75t, ((blurry)),  (no arms),  squashed face, deformed face, bizarre facial features, unnatural facial features, (too many arms), ((multiple people))', [], 77, 15, False, False, 1, 1, 8.5, 2431381970.0, -1.0, 0, 0, 0, False, 728, 528, False, 0.7, 2, 'Latent', 0, 0, 0, 0, '', '', [], <gradio.routes.Request object at 0x2930fead0>, 0, False, False, 'positive', 'comma', 0, False, False, '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, 0) {}
    Traceback (most recent call last):
      File "/Users/karsten/stable-diffusion-webui/modules/call_queue.py", line 58, in f
        res = list(func(*args, **kwargs))
      File "/Users/karsten/stable-diffusion-webui/modules/call_queue.py", line 37, in f
        res = func(*args, **kwargs)
      File "/Users/karsten/stable-diffusion-webui/modules/txt2img.py", line 62, in txt2img
        processed = processing.process_images(p)
      File "/Users/karsten/stable-diffusion-webui/modules/processing.py", line 677, in process_images
        res = process_images_inner(p)
      File "/Users/karsten/stable-diffusion-webui/modules/processing.py", line 794, in process_images_inner
        samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
      File "/Users/karsten/stable-diffusion-webui/modules/processing.py", line 1054, in sample
        samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
      File "/Users/karsten/stable-diffusion-webui/modules/sd_samplers_kdiffusion.py", line 464, in sample
        samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
      File "/Users/karsten/stable-diffusion-webui/modules/sd_samplers_kdiffusion.py", line 303, in launch_sampling
        return func()
      File "/Users/karsten/stable-diffusion-webui/modules/sd_samplers_kdiffusion.py", line 464, in <lambda>
        samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
      File "/Users/karsten/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "/Users/karsten/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/sampling.py", line 594, in sample_dpmpp_2m
        denoised = model(x, sigmas[i] * s_in, **extra_args)
      File "/Users/karsten/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "/Users/karsten/stable-diffusion-webui/modules/sd_samplers_kdiffusion.py", line 202, in forward
        x_out[a:b] = self.inner_model(x_in[a:b], sigma_in[a:b], cond=make_condition_dict(c_crossattn, image_cond_in[a:b]))
      File "/Users/karsten/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "/Users/karsten/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/external.py", line 114, in forward
        eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
      File "/Users/karsten/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/external.py", line 140, in get_eps
        return self.inner_model.apply_model(*args, **kwargs)
      File "/Users/karsten/stable-diffusion-webui/modules/sd_hijack_utils.py", line 17, in <lambda>
        setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
      File "/Users/karsten/stable-diffusion-webui/modules/sd_hijack_utils.py", line 26, in __call__
        return self.__sub_func(self.__orig_func, *args, **kwargs)
      File "/Users/karsten/stable-diffusion-webui/modules/sd_hijack_unet.py", line 48, in apply_model
        return orig_func(self, x_noisy.to(devices.dtype_unet), t.to(devices.dtype_unet), cond, **kwargs).float()
      File "/Users/karsten/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 858, in apply_model
        x_recon = self.model(x_noisy, t, **cond)
      File "/Users/karsten/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "/Users/karsten/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 1335, in forward
        out = self.diffusion_model(x, t, context=cc)
      File "/Users/karsten/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "/Users/karsten/stable-diffusion-webui/modules/sd_unet.py", line 91, in UNetModel_forward
        return ldm.modules.diffusionmodules.openaimodel.copy_of_UNetModel_forward_for_webui(self, x, timesteps, context, *args, **kwargs)
      File "/Users/karsten/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 802, in forward
        h = module(h, emb, context)
      File "/Users/karsten/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "/Users/karsten/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 86, in forward
        x = layer(x)
      File "/Users/karsten/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "/Users/karsten/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 115, in forward
        x = F.interpolate(x, scale_factor=2, mode="nearest")
      File "/Users/karsten/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/functional.py", line 3931, in interpolate
        return torch._C._nn.upsample_nearest2d(input, output_size, scale_factors)
    RuntimeError: "upsample_nearest2d_channels_last" not implemented for 'Half'

Additional information

No response

brsh1 commented 1 year ago

Same happens here, on an Intel Mac.

remixer-dec commented 1 year ago

A workaround is to cast to float32 for the specific unsupported operations:

repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py

--- a/ldm/modules/diffusionmodules/openaimodel.py
+++ b/ldm/modules/diffusionmodules/openaimodel.py
@@ -112,7 +112,7 @@ class Upsample(nn.Module):
                 x, (x.shape[2], x.shape[3] * 2, x.shape[4] * 2), mode="nearest"
             )
         else:
-            x = F.interpolate(x, scale_factor=2, mode="nearest")
+            x = F.interpolate(x.to(th.float32), scale_factor=2, mode="nearest").to(x.dtype)
         if self.use_conv:
             x = self.conv(x)
         return x
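
In isolation, the pattern is simply: upcast for the unsupported op, then cast the result back (a standalone sketch, not webui code; the shape is arbitrary):

import torch
import torch.nn.functional as F

x = torch.randn(1, 4, 64, 64, dtype=torch.float16)
# float32 nearest upsampling is supported on CPU, so run the op there
# and convert the result back to the input dtype.
y = F.interpolate(x.to(torch.float32), scale_factor=2, mode="nearest").to(x.dtype)
assert y.dtype == torch.float16 and y.shape == (1, 4, 128, 128)

The round trip touches only this one tensor for the duration of one op, so the extra memory is negligible.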

After this I got another error, RuntimeError: "compute_indices_weights_nearest" not implemented for 'Half', and fixed it in a similar way:

modules/sd_vae_approx.py

--- a/modules/sd_vae_approx.py
+++ b/modules/sd_vae_approx.py
@@ -21,7 +21,7 @@ class VAEApprox(nn.Module):

     def forward(self, x):
         extra = 11
-        x = nn.functional.interpolate(x, (x.shape[2] * 2, x.shape[3] * 2))
+        x = nn.functional.interpolate(x.to(torch.float32), (x.shape[2] * 2, x.shape[3] * 2)).to(x.dtype)
         x = nn.functional.pad(x, (extra, extra, extra, extra))

         for layer in [self.conv1, self.conv2, self.conv3, self.conv4, self.conv5, self.conv6, self.conv7, self.conv8, ]:

Some people in #8555 suggested using --no-half, but that switches the entire model to float32, consuming twice as much RAM.
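
If you would rather not edit files under repositories/ (they get overwritten on update), the same idea can be applied as a monkey-patch, for example from a small startup script. This is an untested sketch; the wrapper name is hypothetical:

import torch
import torch.nn.functional as F

_orig_interpolate = F.interpolate

def _interpolate_autocast(tensor, *args, **kwargs):
    # Upcast only half-precision CPU tensors; everything else passes through.
    if tensor.dtype == torch.float16 and tensor.device.type == "cpu":
        return _orig_interpolate(tensor.float(), *args, **kwargs).to(tensor.dtype)
    return _orig_interpolate(tensor, *args, **kwargs)

F.interpolate = _interpolate_autocast

Both patched call sites resolve interpolate through the torch.nn.functional module at call time, so reassigning that one module attribute covers openaimodel.py and sd_vae_approx.py without touching either file.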