Stability-AI / stablediffusion

High-Resolution Image Synthesis with Latent Diffusion Models
MIT License

RuntimeError: Input type (c10::Half) and bias type (float) should be the same #155

Open jiho9702 opened 1 year ago

jiho9702 commented 1 year ago

I am using an M1 Pro MacBook and I am trying to run stablediffusion with mps.

I changed the CUDA-specific parts to mps, and in ddim.py I changed float64 to float32 because mps does not support float64.

def register_buffer(self, name, attr):
    if type(attr) == torch.Tensor:
        if attr.device != torch.device("mps"):
            attr = attr.to(torch.float32).to(torch.device("mps"))
    setattr(self, name, attr)

def make_schedule(self, ddim_num_steps, ddim_discretize="uniform", ddim_eta=0., verbose=True):
    self.ddim_timesteps = make_ddim_timesteps(ddim_discr_method=ddim_discretize, num_ddim_timesteps=ddim_num_steps,
                                              num_ddpm_timesteps=self.ddpm_num_timesteps, verbose=verbose)
    alphas_cumprod = self.model.alphas_cumprod
    assert alphas_cumprod.shape[0] == self.ddpm_num_timesteps, 'alphas have to be defined for each timestep'
    to_torch = lambda x: x.clone().detach().to(torch.float32).to(self.model.device)

Since then, this error has been occurring in conv.py:

def _conv_forward(self, input: Tensor, weight: Tensor, bias: Optional[Tensor]):
    if self.padding_mode != 'zeros':
        return F.conv2d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
                        weight, bias, self.stride, _pair(0), self.dilation, self.groups)
    return F.conv2d(input, weight, bias, self.stride, self.padding, self.dilation, self.groups)

def forward(self, input: Tensor) -> Tensor:
    return self._conv_forward(input, self.weight, self.bias)

Help me please.

FalseGenius commented 1 year ago

Hey, how is it going? Did you figure it out?

jiho9702 commented 1 year ago

No, I didn't 😢

FalseGenius commented 1 year ago

Well, that's too bad. I guess we're stuck then.

jiho9702 commented 1 year ago

Are you having the same problem?

FalseGenius commented 1 year ago

I was having that problem when I altered superresolution.py for my use case. You could try running the pipeline provided by diffusers, though. Since this issue got no response, I switched to running txt2img.py, and now I am getting a new error:

(base) root@stablediffusion$ python scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt v2-1_768-nonema-pruned.ckpt --config configs/stable-diffusion/v2-inference-v.yaml --H 768 --W 768

"RuntimeError: expected scalar type BFloat16 but found Float"

FalseGenius commented 1 year ago

Wait, you're getting the problem from something else, not the superresolution.py script.

FalseGenius commented 1 year ago

OK, I was fooling around and got the "RuntimeError: Input type (c10::Half) and bias type (float) should be the same" error again. It doesn't show up when you use "cuda". Why are you trying mps instead?
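For context, this error just means the convolution's input tensor and its weights/bias ended up with different dtypes, typically a float32 model being fed a half-precision tensor (or the reverse). A minimal sketch, not tied to this repo, that reproduces the mismatch and shows the usual fix of casting the input to the layer's dtype (the exact message varies by device and torch version):

import torch
import torch.nn as nn

conv = nn.Conv2d(4, 4, 3)              # layer weights/bias stay in float32
x = torch.randn(1, 4, 8, 8).half()     # input tensor ended up in fp16

try:
    conv(x)                            # dtypes disagree -> RuntimeError similar to the one above
except RuntimeError as e:
    print(e)

out = conv(x.to(conv.weight.dtype))    # cast the input to the layer's dtype and it runs
print(out.dtype)                       # torch.float32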

Tps-F commented 1 year ago

You can use this fork, which supports mps! https://github.com/Tps-F/stablediffusion

jiho9702 commented 1 year ago

It doesn't work 🥺

This error occurred:

Traceback (most recent call last):
  File "/Users/blackcat/study/stablediffusion/scripts/txt2img.py", line 393, in <module>
    main(opt)
  File "/Users/blackcat/study/stablediffusion/scripts/txt2img.py", line 352, in main
    samples, _ = sampler.sample(S=opt.steps,
  File "/opt/anaconda3/envs/cat/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Users/blackcat/study/stablediffusion/scripts/ldm/models/diffusion/ddim.py", line 107, in sample
    samples, intermediates = self.ddim_sampling(conditioning, size,
  File "/opt/anaconda3/envs/cat/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Users/blackcat/study/stablediffusion/scripts/ldm/models/diffusion/ddim.py", line 167, in ddim_sampling
    outs = self.p_sample_ddim(img, cond, ts, index=index, use_original_steps=ddim_use_original_steps,
  File "/opt/anaconda3/envs/cat/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Users/blackcat/study/stablediffusion/scripts/ldm/models/diffusion/ddim.py", line 215, in p_sample_ddim
    model_uncond, model_t = self.model.apply_model(x_in, t_in, c_in).chunk(2)
  File "/Users/blackcat/study/stablediffusion/scripts/ldm/models/diffusion/ddpm.py", line 858, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "/opt/anaconda3/envs/cat/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/blackcat/study/stablediffusion/scripts/ldm/models/diffusion/ddpm.py", line 1335, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "/opt/anaconda3/envs/cat/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/blackcat/study/stablediffusion/scripts/ldm/modules/diffusionmodules/openaimodel.py", line 778, in forward
    h = module(h, emb, context)
  File "/opt/anaconda3/envs/cat/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/blackcat/study/stablediffusion/scripts/ldm/modules/diffusionmodules/openaimodel.py", line 86, in forward
    x = layer(x)
  File "/opt/anaconda3/envs/cat/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/anaconda3/envs/cat/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/opt/anaconda3/envs/cat/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (c10::Half) and bias type (float) should be the same

lakejee-rebel commented 1 year ago

After days of troubleshooting, I was able to resolve this by upgrading tensorflow to 2.11.0 and setting the use_fp16 parameter in the v2-inference.yaml file to False.
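For anyone looking for where that flag lives: it sits under the UNet section of the inference config. A rough, abbreviated excerpt (other keys omitted, and the exact layout may differ between config versions):

model:
  params:
    unet_config:
      params:
        use_fp16: False  # was True; keeps the UNet in float32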

Tps-F commented 1 year ago

Try to use v2-inference-v-mac.yaml

jiho9702 commented 1 year ago

@lakejee-rebel @Tps-F How long does it take to execute? It takes an hour to create an image with Tps-F's stable diffusion model.

Tps-F commented 1 year ago

Here it is!

https://github.com/Stability-AI/stablediffusion/pull/163#issuecomment-1422351441

jiho9702 commented 1 year ago

@Tps-F It's faster because I reduced the batch size. Thank you. Are you interested in object detection models like SSD (Single Shot MultiBox Detector) or YOLO? I want to try SSD on an M1 Mac, but that model uses CUDA. How do I convert CUDA code to MPS?
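Not specific to SSD or YOLO, but the usual pattern for porting CUDA-only code is to stop hard-coding "cuda", pick the device at runtime, and move both the model and every input tensor to it. A rough sketch (the Conv2d stands in for whatever detector you actually load):

import torch
import torch.nn as nn

# Prefer CUDA, fall back to Apple's MPS backend, then CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

model = nn.Conv2d(3, 16, 3).to(device)                 # stand-in for the detector's layers
images = torch.randn(1, 3, 224, 224, device=device)    # inputs must live on the same device as the weights
preds = model(images)
print(preds.device)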

Tps-F commented 1 year ago

Shall I do it?

jiho9702 commented 1 year ago

I'd appreciate it if you did that.

jiho9702 commented 1 year ago

@Tps-F Can I follow you?

Tps-F commented 1 year ago

Sure! By the way, there seems to be more than one version of SSD and YOLO; which one should I support?

Since we shouldn't keep this conversation here, would you like to move to Discord or something?

jiho9702 commented 1 year ago

Okay, good. What is your Discord ID? I will follow you.

Tps-F commented 1 year ago

Thank you- Ftps#3389

yyahav commented 1 year ago

@Tps-F Hi, I get a similar error using your fork:

.../venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (float) and bias type (c10::Half) should be the same

I used your v2-inference-v-mac.yaml as well, and updated tensorflow to 2.11.0 as suggested but it doesn't work...

Can I connect with you on Discord? I already sent a request... :)

Tps-F commented 1 year ago

I would like to see all the logs and what you have run. Can you show me?

> Can I connect with you on Discord? I already sent a request... :)

Sure! But I might as well talk about it here in case anyone encounters a similar error in the future!

tommysnu commented 1 year ago
> return F.conv2d(input, weight, bias, self.stride,
> RuntimeError: Input type (c10::Half) and bias type (float) should be the same

I also got the above problem, @yyahav. I am using Ubuntu, not macOS.

yyahav commented 1 year ago

@tommysnu Can you please share the entire stacktrace? I've made a change in the code which seems to work for me

Tps-F commented 1 year ago

Likewise, please share your logs with us so I can improve.

tommysnu commented 1 year ago

> Likewise, please share your logs with us so I can improve.

Traceback (most recent call last):
  File "/mnt/workspace/stablediffusion/scripts/txt2img.py", line 388, in <module>
    main(opt)
  File "/mnt/workspace/stablediffusion/scripts/txt2img.py", line 347, in main
    samples, _ = sampler.sample(S=opt.steps,
  File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/workspace/stablediffusion/ldm/models/diffusion/ddim.py", line 104, in sample
    samples, intermediates = self.ddim_sampling(conditioning, size,
  File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/workspace/stablediffusion/ldm/models/diffusion/ddim.py", line 164, in ddim_sampling
    outs = self.p_sample_ddim(img, cond, ts, index=index, use_original_steps=ddim_use_original_steps,
  File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/workspace/stablediffusion/ldm/models/diffusion/ddim.py", line 212, in p_sample_ddim
    model_uncond, model_t = self.model.apply_model(x_in, t_in, c_in).chunk(2)
  File "/mnt/workspace/stablediffusion/ldm/models/diffusion/ddpm.py", line 858, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/workspace/stablediffusion/ldm/models/diffusion/ddpm.py", line 1335, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/workspace/stablediffusion/ldm/modules/diffusionmodules/openaimodel.py", line 797, in forward
    h = module(h, emb, context)
  File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/workspace/stablediffusion/ldm/modules/diffusionmodules/openaimodel.py", line 86, in forward
    x = layer(x)
  File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (c10::Half) and bias type (float) should be the same

These are my logs after I run:

python scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt v2-1_768-ema-pruned.ckpt --config configs/stable-diffusion/v2-inference-v.yaml --H 768 --W 768  

(Link: https://github.com/Stability-AI/stablediffusion#reference-sampling-script)

Could you give me any suggestions, @yyahav and @Tps-F? Thank you so much.

Tps-F commented 1 year ago

I know you are using Ubuntu, but could you try using the config for Mac? https://github.com/Tps-F/stablediffusion/blob/mps-cpu-support/configs/stable-diffusion/mac/v2-inference-v-mac.yaml

Tps-F commented 1 year ago

I think this happens because you are using fp16.

tommysnu commented 1 year ago

> I know you are using Ubuntu, but could you try using the config for Mac? https://github.com/Tps-F/stablediffusion/blob/mps-cpu-support/configs/stable-diffusion/mac/v2-inference-v-mac.yaml

Thanks, Tps-F. After using this config file I get another error, as below:

Sampling:   0%|                                                      | 0/3 [00:00<?, ?it/sData shape for DDIM sampling is (3, 4, 96, 96), eta 0.0               | 0/1 [00:00<?, ?it/s]
Running DDIM Sampling with 50 timesteps
DDIM Sampler:   0%|                                                 | 0/50 [00:00<?, ?it/s]
data:   0%|                                                          | 0/1 [00:02<?, ?it/s]
Sampling:   0%|                                                      | 0/3 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "/mnt/workspace/stablediffusion/scripts/txt2img.py", line 388, in <module>
    main(opt)
  File "/mnt/workspace/stablediffusion/scripts/txt2img.py", line 347, in main
    samples, _ = sampler.sample(S=opt.steps,
  File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/workspace/stablediffusion/ldm/models/diffusion/ddim.py", line 104, in sample
    samples, intermediates = self.ddim_sampling(conditioning, size,
  File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/workspace/stablediffusion/ldm/models/diffusion/ddim.py", line 164, in ddim_sampling
    outs = self.p_sample_ddim(img, cond, ts, index=index, use_original_steps=ddim_use_original_steps,
  File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/workspace/stablediffusion/ldm/models/diffusion/ddim.py", line 212, in p_sample_ddim
    model_uncond, model_t = self.model.apply_model(x_in, t_in, c_in).chunk(2)
  File "/mnt/workspace/stablediffusion/ldm/models/diffusion/ddpm.py", line 858, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/workspace/stablediffusion/ldm/models/diffusion/ddpm.py", line 1335, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/workspace/stablediffusion/ldm/modules/diffusionmodules/openaimodel.py", line 797, in forward
    h = module(h, emb, context)
  File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/workspace/stablediffusion/ldm/modules/diffusionmodules/openaimodel.py", line 84, in forward
    x = layer(x, context)
  File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/workspace/stablediffusion/ldm/modules/attention.py", line 334, in forward
    x = block(x, context=context[i])
  File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/workspace/stablediffusion/ldm/modules/attention.py", line 269, in forward
    return checkpoint(self._forward, (x, context), self.parameters(), self.ch...
  File "/mnt/workspace/stablediffusion/ldm/modules/diffusionmodules/util.py", line 121, in checkpoint
    return CheckpointFunction.apply(func, len(inputs), *args)
  File "/mnt/workspace/stablediffusion/ldm/modules/diffusionmodules/util.py", line 136, in forward
    output_tensors = ctx.run_function(*ctx.input_tensors)
  File "/mnt/workspace/stablediffusion/ldm/modules/attention.py", line 272, in _forward
    x = self.attn1(self.norm1(x), context=context if self.disable_self_attn e...
  File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/workspace/stablediffusion/ldm/modules/attention.py", line 233, in forward
    out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None, op...
  File "/mnt/workspace/xformers/xformers/ops/fmha/__init__.py", line 192, in memory_efficient_attention
    return _memory_efficient_attention(
  File "/mnt/workspace/xformers/xformers/ops/fmha/__init__.py", line 290, in _memory_efficient_attention
    return _memory_efficient_attention_forward(
  File "/mnt/workspace/xformers/xformers/ops/fmha/__init__.py", line 306, in _memory_efficient_attention_forward
    op = _dispatch_fw(inp)
  File "/mnt/workspace/xformers/xformers/ops/fmha/dispatch.py", line 98, in _dispatch_fw
    return _run_priority_list(
  File "/mnt/workspace/xformers/xformers/ops/fmha/dispatch.py", line 73, in _run_priority_list
    raise NotImplementedError(msg)
NotImplementedError: No operator found for `memory_efficient_attention_forward` with 
inputs:
     query       : shape=(30, 9216, 1, 64) (torch.float32)
     key         : shape=(30, 9216, 1, 64) (torch.float32)
     value       : shape=(30, 9216, 1, 64) (torch.float32)
     attn_bias   : <class 'NoneType'>
     p           : 0.0
`cutlassF` is not supported because:
    device=cpu (supported: {'cuda'})
`flshattF` is not supported because:
    device=cpu (supported: {'cuda'})
    dtype=torch.float32 (supported: {torch.float16, torch.bfloat16})
`tritonflashattF` is not supported because:
    device=cpu (supported: {'cuda'})
    dtype=torch.float32 (supported: {torch.float16, torch.bfloat16})
`smallkF` is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 32
    unsupported embed per head: 64
Tps-F commented 1 year ago

Could you try removing the --xformers flag?

tommysnu commented 1 year ago
python scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt checkpoints/v2-1_768-ema-pruned.ckpt --config configs/stable-diffusion/v2-inference-mac.yaml --H 768 --W 768 --precision full

I don't use --xformers.

> Could you try removing the --xformers flag?

Tps-F commented 1 year ago

I think that error is caused by xformers, so I'd like you to try uninstalling it.
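For example, from the same environment you run txt2img.py in (assuming xformers was installed with pip):

pip uninstall xformers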

zdxpan commented 1 year ago

When I use the diffusers pipeline, I hit this too.

My solution is:

pipe.to("cuda", torch.float16). In general, try variable.to(device, torch_dtype), and then it runs with no problem.
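Expanding on that: with diffusers, the whole pipeline and its inputs should share one dtype and one device. A rough sketch, assuming the stabilityai/stable-diffusion-2-1 weights and a recent diffusers release (use torch.float32 instead if you are on CPU or MPS and fp16 misbehaves):

import torch
from diffusers import StableDiffusionPipeline

# Load every component in fp16 and keep the whole pipeline on one device.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

image = pipe("a professional photograph of an astronaut riding a horse").images[0]
image.save("astronaut.png")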

Helvio88 commented 1 year ago

I receive the same error when using txt2img on webui. Running on Docker without venv (see Dockerfile at https://github.com/helv-io/stable-diffusion-webui). It's an untainted build.

Running on an 8GB Tesla P4. Generation works, but in the webui I lose the progress bar.

Any advice? I tried changing use_fp16 to False, but same thing.

addiserp commented 1 year ago

I'm using Ubuntu with transformers==4.25.1 and timm==0.5.4.

I changed pretrained_model.encoder.to(torch.bfloat16) to pretrained_model.encoder.to(torch.float), and now it works for me.

nktice commented 1 year ago

I was getting these, and eventually deduced what was going on. These errors appeared during the render process when it'd show updates. Instead of updates, these errors were appearing... I did some research. I had been moving the default models folder and using my own. The folder I had didn't have various files it was looking for. It turns out I needed to merge in the default files to get it working...

Here for example is what I typed in...

cd stable-diffusion-webui
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git SD-hold
cp -r SD-hold/models/* models/
rm -rf SD-hold

That restores the files to the models folder tree, and solved this for me.

Helvio88 commented 1 year ago

> I was getting these, and eventually deduced what was going on. These errors appeared during the render process when it'd show updates. Instead of updates, these errors were appearing... I did some research. I had been moving the default models folder and using my own. The folder I had didn't have various files it was looking for. It turns out I needed to merge in the default files to get it working...
>
> Here for example is what I typed in...
>
> cd stable-diffusion-webui
> git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git SD-hold
> cp -r SD-hold/models/* models/
> rm -rf SD-hold
>
> That restores the files to the models folder tree, and solved this for me.

This fixed it, thanks! Perhaps those required files could be moved somewhere else, leaving the models folder empty for user-managed models? This would improve containerization.

12321555 commented 1 year ago

I use yolov5. When I put an ACmix block into my code, it leads to the same mistake. Can someone help me, please?

  File "c:/Users/xie/Desktop/yolov5-master/train.py", line 541, in main
    train(opt.hyp, opt, device, callbacks)
  File "c:/Users/xie/Desktop/yolov5-master/train.py", line 374, in train
    compute_loss=compute_loss)
  File "D:\Anaconda\envs\pytorch\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "c:\Users\xie\Desktop\yolov5-master\val.py", line 210, in run
    preds, train_out = model(im) if compute_loss else (model(im, augment=augment), None)
  File "D:\Anaconda\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "c:\Users\xie\Desktop\yolov5-master\models\yolo.py", line 209, in forward
    return self._forward_once(x, profile, visualize)  # single-scale inference, train
  File "c:\Users\xie\Desktop\yolov5-master\models\yolo.py", line 121, in _forward_once
    x = m(x)  # run
  File "D:\Anaconda\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "c:\Users\xie\Desktop\yolov5-master\models\common.py", line 947, in forward
    pe = self.conv_p(position(h, w, x.is_cuda))
  File "D:\Anaconda\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Anaconda\envs\pytorch\lib\site-packages\torch\nn\modules\conv.py", line 467, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "D:\Anaconda\envs\pytorch\lib\site-packages\torch\nn\modules\conv.py", line 464, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Input type (float) and bias type (struct c10::Half) should be the same

timbr0wn commented 1 year ago

I got a similar RuntimeError when using SDXL in the AUTOMATIC1111 interface. The --no-half-vae option fixed it for me.

aiGeneratedUser commented 1 year ago

I believe there is a VRAM memory leak which happens if the model couldn't be upscaled. It causes VRAM to never release the acquired space (unless you restart webui). If you happened to see your image couldn't be upscaled - it means your VRAM is now leaked. In that case, you should restart the whole webui to be able to utilize it properly. My specs: GPU: Nvidia Quadro RTX 5000 GPU (16 GB VRAM). OS: Fedora 38.

Acquired VRAM before webui is started:

$ nvidia-smi 
Sat Aug 12 01:26:05 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.05              Driver Version: 535.86.05    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro RTX 5000                Off | 00000000:01:00.0  On |                  N/A |
| N/A   55C    P5              10W /  90W |    863MiB / 16384MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2998      G   /usr/bin/gnome-shell                        463MiB |
|    0   N/A  N/A      7491      G   /usr/bin/Xwayland                             7MiB |
|    0   N/A  N/A     20394      G   /usr/bin/nautilus                            84MiB |
|    0   N/A  N/A     35817      G   /usr/lib64/firefox/firefox                  301MiB |
+---------------------------------------------------------------------------------------+

Acquired VRAM after webui is fully started, but before the first image generation job is triggered. I see 7 GB of VRAM is now acquired by webui even though no image generation tasks have been submitted. Not sure if that's expected or not; it could be that webui loads the checkpoint model into VRAM on startup, but I'm not sure:

$ nvidia-smi 
Sat Aug 12 01:38:02 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.05              Driver Version: 535.86.05    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro RTX 5000                Off | 00000000:01:00.0  On |                  N/A |
| N/A   55C    P8               4W /  90W |   8326MiB / 16384MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2998      G   /usr/bin/gnome-shell                        462MiB |
|    0   N/A  N/A      7491      G   /usr/bin/Xwayland                             9MiB |
|    0   N/A  N/A     20394      G   /usr/bin/nautilus                            80MiB |
|    0   N/A  N/A     35817      G   /usr/lib64/firefox/firefox                  221MiB |
|    0   N/A  N/A     63368      C   python3.10                                 7396MiB |
|    0   N/A  N/A     63662      C   ...diffusion-webui/venv/bin/python3.10      116MiB |
+---------------------------------------------------------------------------------------+

Acquired VRAM after the first image generation job has started but failed to be upscaled. We see that after the image generation task has stopped and is no longer running, the acquired VRAM stays at 14 GB, so no new image generation jobs can be submitted:

$ nvidia-smi 
Sat Aug 12 01:40:05 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.05              Driver Version: 535.86.05    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro RTX 5000                Off | 00000000:01:00.0  On |                  N/A |
| N/A   62C    P8              10W /  90W |  15318MiB / 16384MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2998      G   /usr/bin/gnome-shell                        450MiB |
|    0   N/A  N/A      7491      G   /usr/bin/Xwayland                             9MiB |
|    0   N/A  N/A     20394      G   /usr/bin/nautilus                            86MiB |
|    0   N/A  N/A     35817      G   /usr/lib64/firefox/firefox                  238MiB |
|    0   N/A  N/A     63368      C   python3.10                                14376MiB |
|    0   N/A  N/A     63662      C   ...diffusion-webui/venv/bin/python3.10      116MiB |
+---------------------------------------------------------------------------------------+

Acquired VRAM after webui is stopped. We see that if we stop webui, VRAM is fully released and can be reused:

$ nvidia-smi 
Sat Aug 12 01:47:15 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.05              Driver Version: 535.86.05    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro RTX 5000                Off | 00000000:01:00.0  On |                  N/A |
| N/A   53C    P8               3W /  90W |    859MiB / 16384MiB |      2%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2998      G   /usr/bin/gnome-shell                        454MiB |
|    0   N/A  N/A      7491      G   /usr/bin/Xwayland                             9MiB |
|    0   N/A  N/A     20394      G   /usr/bin/nautilus                            86MiB |
|    0   N/A  N/A     35817      G   /usr/lib64/firefox/firefox                  252MiB |
+---------------------------------------------------------------------------------------+

Conclusion: it seems there is some kind of bug that doesn't release VRAM after failures. I'm also not sure why 7 GB of VRAM is acquired during initial application startup, or why another 7 GB is acquired after the image generation process (14 GB of VRAM in total). A simple workaround for now: just stop and start webui again. It will release all VRAM and you will be able to generate images again.
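
For completeness, a minimal mitigation sketch for the leak, assuming you can run it inside the webui's Python process once the failed job's tensors are no longer referenced: drop unreachable objects, then ask PyTorch to hand its cached CUDA blocks back to the driver. Note that nvidia-smi counts PyTorch's cache as used memory, so this only helps if nothing still holds those tensors.

import gc
import torch

def release_cached_vram() -> None:
    """Drop unreachable Python objects, then release PyTorch's cached CUDA memory."""
    gc.collect()                    # free tensors that are no longer referenced
    if torch.cuda.is_available():
        torch.cuda.empty_cache()    # return cached allocator blocks to the driver
        torch.cuda.ipc_collect()    # reclaim any leftover CUDA IPC handles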

Xav-Pe commented 1 year ago

Same error here, it's just the opposite: RuntimeError: Input type (float) and bias type (c10::Half) should be the same.

Only when I use Hires fix, any model, any upscaler.

That's not a VRAM leak, because it happens every time I try, even after restarting the computer.

My command line: COMMANDLINE_ARGS=--upcast-sampling --medvram --xformers --no-half-vae
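
For what it's worth, both directions of this error come from the same mismatch: at some conv layer the input tensor and the layer's weights/bias do not share a dtype (webui flags like --no-half and --upcast-sampling are different ways of keeping them aligned). A minimal reproduction and the two generic fixes, as a sketch rather than webui's actual code:

import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, kernel_size=3)   # weights and bias are float32 by default
x = torch.randn(1, 3, 64, 64).half()    # half-precision input, as produced under fp16

# conv(x)  # RuntimeError: Input type (c10::Half) and bias type (float) should be the same

y = conv(x.float())   # fix A: upcast the input to match the layer
# fix B (typically on GPU): conv.half() so the layer matches a half input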

homomorfism commented 1 year ago

I changed torch from 1.13.1 to 2.0.1 and the problem disappeared!

My environment.yml:

name: stable-diffusion-env
channels:
  - defaults
dependencies:
  - python=3.8  # Change to your desired Python version
  - pip
  - pip:
    - ./invisible-watermark
    - torch==2.0.1
    - safetensors==0.3.3
    - accelerate==0.22.0
    - diffusers==0.20.2
    - transformers==4.33.1
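
If the pin above doesn't seem to take effect, a quick sanity check from the same environment may help before re-running (a generic sketch, nothing repo-specific):

import torch

print(torch.__version__)           # expect 2.0.1 with the environment above
print(torch.cuda.is_available())   # confirm the install is a CUDA build if you need GPU
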
nktice commented 1 year ago

This issue still stands with AMD GPUs, so I tried elsewhere... I wrote to ExLlama; they say the issue is in HIP (or perhaps ROCm): https://github.com/turboderp/exllama/issues/281. So here is my post on the GitHub HIP project in the hope they'll resolve it: https://github.com/ROCm-Developer-Tools/HIP/issues/3331

piotrgraczyk commented 1 year ago

I had an fp16 error (fixed by using --precision full and disabling fp16). Then the error mentioned in this thread appeared. Adding the - torch==2.0.1 line did not help; I checked anaconda and I already have 2.0.1 installed. I have an Intel HD4000 in an HP laptop running Windows 8.1. Any ideas, anybody? Shall I try to force v2-inference.yaml instead of the v2-inference-v one next? If so, how do I do that?
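
For the last question: if this is the repo's scripts/txt2img.py, the config is selected with the --config flag, so pointing it at the plain v2-inference.yaml is just a flag change. A minimal sketch; the prompt and checkpoint path are illustrative, not taken from this thread:

import subprocess

subprocess.run(
    [
        "python", "scripts/txt2img.py",
        "--prompt", "a capybara in a field",
        "--ckpt", "path/to/your-checkpoint.ckpt",           # illustrative path
        "--config", "configs/stable-diffusion/v2-inference.yaml",
        "--precision", "full",
    ],
    check=True,
)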

masterLazy commented 11 months ago

I got a similar error, but I was running it on Windows 10 instead of a Mac: RuntimeError: Input type (struct c10::Half) and bias type (float) should be the same. I tried removing --xformers from the command line, adding --no-half and --no-half-vae, re-installing torch, and restarting the computer, but none of that worked. Can anybody help? Thanks.

The command line: --medvram --xformers --opt-split-attention --disable-nan-check. The whole log:

  File "H:\AI\sd-webui\modules\call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "H:\AI\sd-webui\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "H:\AI\sd-webui\modules\txt2img.py", line 56, in txt2img
    processed = process_images(p)
  File "H:\AI\sd-webui\modules\processing.py", line 503, in process_images
    res = process_images_inner(p)
  File "H:\AI\sd-webui\modules\processing.py", line 653, in process_images_inner
    samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
  File "H:\AI\sd-webui\modules\processing.py", line 869, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
  File "H:\AI\sd-webui\modules\sd_samplers_kdiffusion.py", line 358, in sample
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "H:\AI\sd-webui\modules\sd_samplers_kdiffusion.py", line 234, in launch_sampling
    return func()
  File "H:\AI\sd-webui\modules\sd_samplers_kdiffusion.py", line 358, in <lambda>
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "H:\AI\sd-webui\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "H:\AI\sd-webui\repositories\k-diffusion\k_diffusion\sampling.py", line 553, in sample_dpmpp_sde
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "H:\AI\sd-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "H:\AI\sd-webui\modules\sd_samplers_kdiffusion.py", line 155, in forward
    sd_samplers_common.store_latent(x_out[0:uncond.shape[0]])
  File "H:\AI\sd-webui\modules\sd_samplers_common.py", line 58, in store_latent
    shared.state.assign_current_image(sample_to_image(decoded))
  File "H:\AI\sd-webui\modules\sd_samplers_common.py", line 46, in sample_to_image
    return single_sample_to_image(samples[index], approximation)
  File "H:\AI\sd-webui\modules\sd_samplers_common.py", line 35, in single_sample_to_image
    x_sample = sd_vae_approx.model()(sample.to(devices.device, devices.dtype).unsqueeze(0))[0].detach()
  File "H:\AI\sd-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "H:\AI\sd-webui\modules\sd_vae_approx.py", line 28, in forward
    x = layer(x)
  File "H:\AI\sd-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "H:\AI\sd-webui\extensions-builtin\Lora\lora.py", line 319, in lora_Conv2d_forward
    return torch.nn.Conv2d_forward_before_lora(self, input)
  File "H:\AI\sd-webui\venv\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "H:\AI\sd-webui\venv\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (struct c10::Half) and bias type (float) should be the same

I think it's a strange error, because it ran successfully before, but today something went wrong.

masterLazy commented 11 months ago

I solved my problem. It went wrong because I had deleted the model.pt in VAE-approx; I restored it and the problem was solved. The first time I tried generating an image, it said "No such file or directory: '...\\models\\VAE-approx\\model.pt'", but I ignored it and tried generating again. Then it seemed to change to another error, like the one in my last comment, but in fact that was because the file hadn't been loaded yet. So if somebody gets the same error, check whether there are any "different" error messages earlier on. That might be the true reason for the error.
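
If you hit the same trace through sd_vae_approx.py, it may be worth checking for that earlier "No such file or directory" message first. A tiny check, assuming the default webui folder layout (the path may differ on your install):

from pathlib import Path

vae_approx = Path("models/VAE-approx/model.pt")
if not vae_approx.is_file():
    print(f"Missing {vae_approx} - restore it before chasing dtype errors")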

woodgoblin commented 9 months ago

The same happened for me today with: python scripts/txt2img.py --prompt "capybara" --ckpt .\checkpoints\512-base-ema.ckpt --n_samples 1 --precision=full --device=cpu --outdir ..\output

RuntimeError: Input type (struct c10::Half) and bias type (float) should be the same

giselle197 commented 9 months ago

I didn't use this project; however, when I used Hugging Face with THUDM/chatglm2-6b-int4,

model = AutoModel.from_pretrained("THUDM/chatglm2-6b-int4", trust_remote_code=True)

I encountered a similar error:

File "/home/username/.cache/huggingface/modules/transformers_modules/chatglm2-6b-int4/quantization.py", line 248, in forward
    output = inp.mm(weight.t())
RuntimeError: expected m1 and m2 to have the same dtype, but got: c10::Half != float

Then I experimented and found the solution: model = AutoModel.from_pretrained("THUDM/chatglm2-6b-int4", trust_remote_code=True).cuda()

This may be somewhat related.

hwk06023 commented 7 months ago
Traceback (most recent call last):                                                 
  File "train_inpainting_dreambooth.py", line 876, in <module>
    main(args)
  File "train_inpainting_dreambooth.py", line 859, in main
    save_weights(global_step)
  File "train_inpainting_dreambooth.py", line 758, in save_weights
    images = pipeline(
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py", line 818, in __call__
    mask, masked_image_latents = self.prepare_mask_latents(
  File "/usr/local/lib/python3.8/dist-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py", line 597, in prepare_mask_latents
    masked_image_latents = self.vae.encode(masked_image).latent_dist.sample(generator=generator)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/autoencoder_kl.py", line 164, in encode
    h = self.encoder(x)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/vae.py", line 109, in forward
    sample = self.conv_in(sample)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (c10::Half) and bias type (float) should be the same
Steps:  40%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–Š                   | 6000/15000 [1:16:40<1:55:00,  1.30it/s, loss=0.107, lr=2e-6]
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/dblab/.local/lib/python3.8/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/home/dblab/.local/lib/python3.8/site-packages/accelerate/commands/launch.py", line 923, in launch_command
    simple_launcher(args)
  File "/home/dblab/.local/lib/python3.8/site-packages/accelerate/commands/launch.py", line 579, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_inpainting_dreambooth.py', '--pretrained_model_name_or_path=runwayml/stable-diffusion-inpainting', '--pretrained_vae_name_or_path=stabilityai/sd-vae-ft-mse', '--output_dir=../../../../Realstc_Vision_profile_hyunwoo5_inpaint', '--with_prior_preservation', '--prior_loss_weight=1.0', '--seed=3434554', '--resolution=512', '--train_batch_size=2', '--train_text_encoder', '--mixed_precision=fp16', '--gradient_accumulation_steps=1', '--learning_rate=2e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=51', '--sample_batch_size=4', '--max_train_steps=15000', '--save_interval=1000', '--save_min_steps=6000', '--save_infer_steps=35', '--concepts_list=concepts_list.json', '--not_cache_latents', '--hflip']' returned non-zero exit status 1.

RuntimeError: Input type (c10::Half) and bias type (float) should be the same

This issue occurs at save_min_steps (step 6000), while training otherwise runs fine. It seems to be a problem when saving.

Removing '--mixed_precision=fp16' and changing to torch 2.0.2 were not useful in my case. And when I tried to change the input type in python3.8/dist-packages/torch/nn/modules/conv.py, I got RuntimeError: Input type (float) and bias type (c10::Half) should be the same.

Is it Schrödinger's code?

Help me plz 🥺
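
Not the training script's own code, but a hedged sketch of the usual workaround when the sampling-at-save step trips over mixed precision: rebuild the inference pipeline with one explicit dtype so the VAE's conv sees inputs and weights of the same type. The model id is copied from the command above; whether this drops into save_weights unchanged is an assumption.

import torch
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,    # or torch.float32 to avoid half precision entirely
).to("cuda")

# images = pipe(prompt=..., image=init_image, mask_image=mask_image).images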

f0ster commented 7 months ago

fwiw I encountered this while using a1111, and restarting fixed it