bmaltais / kohya_ss

Apache License 2.0

Cannot train SDXL DoRa #2369

Open ejektaflex opened 2 months ago

ejektaflex commented 2 months ago

When trying to train an SDXL DoRa, I get an error. Here is my TOML configuration, from the tmp file created when running:

tmpfilelora.txt

And the error stack trace:

  File "C:\Apps\AIArt\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Apps\AIArt\kohya_ss\venv\lib\site-packages\torch\nn\modules\container.py", line 215, in forward
    input = module(input)
  File "C:\Apps\AIArt\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Apps\AIArt\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Apps\AIArt\kohya_ss\venv\lib\site-packages\lycoris\modules\locon.py", line 246, in forward
    weight = self.apply_weight_decompose(weight)
  File "C:\Apps\AIArt\kohya_ss\venv\lib\site-packages\lycoris\modules\locon.py", line 207, in apply_weight_decompose
    return weight.cuda() * (self.dora_scale / weight_norm)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
steps:   0%|                                                                 | 0/2000 [00:01<?, ?it/s]

bmaltais commented 2 months ago

Not sure there is much I can do... If you know this works when using sd-scripts directly, I could compare the parameters you use against what the GUI produces... but this might be a LyCORIS issue and not something the GUI can fix. The traceback clearly points to lycoris as the source of the issue: the last frame is in lycoris\modules\locon.py, in apply_weight_decompose.

You might have better help by opening an issue directly on the lycoris github repo.
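
For anyone who wants to check which package a traceback ends in before deciding where to file the issue, here is a minimal sketch using only the Python standard library. The helper name `innermost_package` and the regular expression are my own illustration, not part of kohya_ss or LyCORIS, and the trace string is an abridged copy of the frames posted above.

```python
import re

# Matches frame lines such as:  File "C:\...\module.py", line 1527, in _call_impl
FRAME_RE = re.compile(r'File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>\S+)')


def innermost_package(traceback_text):
    """Return (package, function, line) for the last frame, or None if no frame is found.

    `package` is the directory directly under site-packages, or None when the
    frame's file does not live under site-packages.
    """
    frames = list(FRAME_RE.finditer(traceback_text))
    if not frames:
        return None
    last = frames[-1]
    # Normalise Windows separators so the path splits the same way everywhere.
    parts = last.group("path").replace("\\", "/").split("/")
    package = None
    if "site-packages" in parts[:-1]:
        package = parts[parts.index("site-packages") + 1]
    return package, last.group("func"), int(last.group("line"))


# Abridged copy of the frames from the report above (raw string keeps the backslashes).
trace = r'''
  File "C:\Apps\AIArt\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Apps\AIArt\kohya_ss\venv\lib\site-packages\lycoris\modules\locon.py", line 246, in forward
    weight = self.apply_weight_decompose(weight)
  File "C:\Apps\AIArt\kohya_ss\venv\lib\site-packages\lycoris\modules\locon.py", line 207, in apply_weight_decompose
    return weight.cuda() * (self.dora_scale / weight_norm)
'''

print(innermost_package(trace))  # ('lycoris', 'apply_weight_decompose', 207)
```

The last frame is where the exception was raised, so it is usually the best first guess for which project to report to, although the root cause can still sit in the caller.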

ejektaflex commented 2 months ago

I'm not sure whether the LyCORIS scripts for SDXL and SD 1.5 are different, but notably, training with a 1.5 base model did work. Pretty much all of the other parameters were the same, as far as I can recall. That is why I thought the issue might be in this project instead.
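
To check the "same parameters" claim rather than rely on memory, the two configurations could be compared key by key. This is a rough sketch: `diff_configs` is a hypothetical helper, and the keys and values in the two dictionaries are placeholders for illustration, not the contents of the attached tmpfilelora.txt. On Python 3.11+ the dictionaries could be loaded from the TOML files with `tomllib.load` instead.

```python
def diff_configs(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose value differs.

    A key present in only one config is reported with "<missing>" on the other side.
    """
    missing = object()  # sentinel, so that a real None value is still compared
    changes = {}
    for key in sorted(set(a) | set(b)):
        va = a.get(key, missing)
        vb = b.get(key, missing)
        if va is missing or vb is missing or va != vb:
            changes[key] = (
                "<missing>" if va is missing else va,
                "<missing>" if vb is missing else vb,
            )
    return changes


# Placeholder configs (hypothetical keys and values, for illustration only).
sd15_config = {"network_dim": 16, "learning_rate": 1e-4, "sdxl": False}
sdxl_config = {"network_dim": 16, "learning_rate": 1e-4, "sdxl": True, "cache_latents": True}

for key, (old, new) in diff_configs(sd15_config, sdxl_config).items():
    print(f"{key}: {old!r} -> {new!r}")
```

If the only differences are the ones expected when switching base models, that supports the idea that the failure depends on the SDXL code path rather than on a GUI setting.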

avan06 commented 1 month ago

This issue might be related to the LyCORIS package. You can refer to the following issue:

https://github.com/KohakuBlueleaf/LyCORIS/issues/160