d8ahazard / sd_dreambooth_extension


[Bug]: [b4053de] RuntimeError: mat1 and mat2 shapes cannot be multiplied (20480x64 and 320x320) #1334

Closed ignis-sec closed 9 months ago

ignis-sec commented 1 year ago

Is there an existing issue for this?

What happened?

With the new update, the Dreambooth extension is unable to train LoRA Extended models.

It looks like commit b4053de has broken LoRA Extended training, as diffusers 0.19.3 does not work with it.

Steps to reproduce the problem

  1. Go to training section.
  2. Create a new model.
  3. Select LoRA, and LoRA extended.
  4. Set the configuration parameters and start training.

Commit and libraries

Dreambooth revision: 72ba64f9b6954f7f9fd2a4181ae8b88a715e1a28
[+] xformers version 0.0.21 installed.
[+] torch version 2.0.1 installed.
[+] torchvision version 0.15.2+cu117 installed.
[+] accelerate version 0.21.0 installed.
[+] diffusers version 0.19.3 installed.
[+] transformers version 4.30.2 installed.
[+] bitsandbytes version 0.35.4 installed.
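
For anyone comparing environments against the versions listed here, a minimal sketch using only the Python standard library is shown below. The package list mirrors the report; the helper name `installed_versions` is hypothetical and not part of the extension. Run it with the same interpreter (the venv) that the web UI uses.

```python
from importlib import metadata

# Package names taken from the library list in this report.
PACKAGES = [
    "xformers",
    "torch",
    "torchvision",
    "accelerate",
    "diffusers",
    "transformers",
    "bitsandbytes",
]


def installed_versions(names):
    """Map each package name to its installed version, or None if it is absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```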

Command Line Arguments

--medvram --xformers --skip-install

Console logs

Traceback (most recent call last):
  File "/home/ignis/sd/extensions/sd_dreambooth_extension/dreambooth/ui_functions.py", line 729, in start_training
    result = main(class_gen_method=class_gen_method)
  File "/home/ignis/sd/extensions/sd_dreambooth_extension/dreambooth/train_dreambooth.py", line 1552, in main
    return inner_loop()
  File "/home/ignis/sd/extensions/sd_dreambooth_extension/dreambooth/memory.py", line 119, in decorator
    return function(batch_size, grad_size, prof, *args, **kwargs)
  File "/home/ignis/sd/extensions/sd_dreambooth_extension/dreambooth/train_dreambooth.py", line 1286, in inner_loop
    noise_pred = unet(
  File "/home/ignis/sd/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ignis/sd/venv_3.10/lib/python3.10/site-packages/accelerate/utils/operations.py", line 581, in forward
    return model_forward(*args, **kwargs)
  File "/home/ignis/sd/venv_3.10/lib/python3.10/site-packages/accelerate/utils/operations.py", line 569, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/home/ignis/sd/venv_3.10/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/ignis/sd/venv_3.10/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py", line 915, in forward
    sample, res_samples = downsample_block(
  File "/home/ignis/sd/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ignis/sd/venv_3.10/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py", line 995, in forward
    hidden_states = resnet(hidden_states, temb)
  File "/home/ignis/sd/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ignis/sd/venv_3.10/lib/python3.10/site-packages/diffusers/models/resnet.py", line 612, in forward
    hidden_states = self.conv1(hidden_states)
  File "/home/ignis/sd/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ignis/sd/extensions/sd_dreambooth_extension/lora_diffusion/lora.py", line 34, in forward
    self.linear(input)
  File "/home/ignis/sd/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ignis/sd/extensions/a1111-sd-webui-locon/scripts/../../../extensions-builtin/Lora/networks.py", line 361, in network_Linear_forward
    return torch.nn.Linear_forward_before_network(self, input)
  File "/home/ignis/sd/extensions/a1111-sd-webui-lycoris/lycoris.py", line 741, in lyco_Linear_forward
    return torch.nn.Linear_forward_before_lyco(self, input)
  File "/home/ignis/sd/venv_3.10/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (93440x55 and 320x320)
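
Reading the traceback, `resnet.conv1` (a convolution) appears to end up in `lora_diffusion/lora.py`, which calls `self.linear(input)`. A linear layer multiplies over the last dimension of its input, so a 4-D feature map laid out as (batch, channels, height, width) presents its width, not its 320 channels, as the inner dimension. The sketch below reproduces only that shape arithmetic in plain Python; it is not the extension's code, and the input shapes are assumptions chosen to match the numbers in the title and in the log.

```python
from math import prod


def linear_matmul_shapes(input_shape, in_features, out_features):
    """Return the (mat1, mat2) shapes a linear layer would try to multiply.

    All leading dimensions of the input are flattened into rows; the last
    dimension becomes the columns of mat1. mat2 is the transposed weight,
    i.e. (in_features, out_features).
    """
    *leading, last = input_shape
    mat1 = (prod(leading), last)
    mat2 = (in_features, out_features)
    return mat1, mat2


def is_multipliable(mat1, mat2):
    """A matrix product needs mat1's column count to equal mat2's row count."""
    return mat1[1] == mat2[0]


# Assumed input: batch 1, 320 channels, 64x64 latent (matches the issue title).
print(linear_matmul_shapes((1, 320, 64, 64), 320, 320))  # ((20480, 64), (320, 320))

# Assumed input: batch 4, 320 channels, 73x55 latent (one factorization that
# matches the console log; other batch/height splits give the same product).
print(linear_matmul_shapes((4, 320, 73, 55), 320, 320))  # ((93440, 55), (320, 320))

# For contrast, a sequence-style input whose last dimension is the channel count.
print(is_multipliable(*linear_matmul_shapes((1, 4096, 320), 320, 320)))  # True
```

In both reported cases the inner dimensions (64 or 55 versus 320) disagree, which is consistent with a linear LoRA wrapper being applied to a layer that expects convolution-shaped input.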

Additional information

No response

keithjpaulson commented 1 year ago

Running "pip install diffusers==0.10.2" (technically "%PYTHON% -m pip install diffusers==0.10.2", which is what SD was running) seems to be a workaround. I'm not sure whether other diffusers versions work; that's just the version that was used as a workaround the last time an error like this happened.

edit: ref: https://github.com/d8ahazard/sd_dreambooth_extension/issues/826

thecakeisal1e commented 1 year ago

I have the same issue but only for Lora Extended training. Regular Lora training might work (I'm checking right now) but the sample images are noise. Checkpoint training works as intended. Diffusers 0.19.3 works and is probably not the problem. Reversion to 0.10.2 doesn't work.

ignis-sec commented 1 year ago

I haven't tried it with different diffusers versions, considering the final comment on #826. Regular LoRA training will work; only extended network training is broken. @thecakeisal1e Noise in LoRA training sample images is a long-standing issue (#1273), for which I've just sent a pull request with the fix (#1336).

Val-MG commented 1 year ago

Hi, any news on a solution for this issue? I'm using Dreambooth version [cf086c53] and A1111 version 1.5.2, and I can't even get a LoRA to train from scratch; it always returns this same error message.

ignis-sec commented 1 year ago

@Val-MG Unfortunately, LoRA training is still broken; c2a5617c587b812b5a408143ddfb18fc49234edf is the latest version that works.

github-actions[bot] commented 11 months ago

This issue is stale because it has been open 5 days with no activity. Remove stale label or comment or this will be closed in 5 days

levicki commented 11 months ago

@d8ahazard I am getting this same error on the latest version when I try to train but only for Extended LoRA.

Kangavallo commented 11 months ago

I am getting the same error. Any ideas on this one?

levicki commented 11 months ago

@d8ahazard Could someone please try to reproduce this issue with extended LoRA training with the latest updates?

github-actions[bot] commented 11 months ago

This issue is stale because it has been open 5 days with no activity. Remove stale label or comment or this will be closed in 5 days

levicki commented 11 months ago

> This issue is stale because it has been open 5 days with no activity. Remove stale label or comment or this will be closed in 5 days

Whatever.

Kangavallo commented 11 months ago

> @d8ahazard Could someone please try to reproduce this issue with extended LoRA training with the latest updates?

I'm still getting the same issue with the latest version.

github-actions[bot] commented 10 months ago

This issue is stale because it has been open 5 days with no activity. Remove stale label or comment or this will be closed in 5 days

Kangavallo commented 10 months ago

Any progress?

github-actions[bot] commented 9 months ago

This issue is stale because it has been open 5 days with no activity. Remove stale label or comment or this will be closed in 5 days

d8ahazard commented 9 months ago

This should be fixed in the latest release. Please open a new issue if you're still having problems. Thanks! :D