bmaltais / kohya_ss

Cannot start DoRa training due to error #2589

Open mi8m opened 3 weeks ago

mi8m commented 3 weeks ago

My main settings (https://pastebin.com/0BMs5ft8) work fine, but when I changed the network module to lycoris-locon with DoRA enabled, I got an error:

/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/utils/checkpoint.py:61: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn(
Traceback (most recent call last):
  File "/workspace/kohya_ss/sd-scripts/sdxl_train_network.py", line 185, in <module>
    trainer.train(args)
  File "/workspace/kohya_ss/sd-scripts/train_network.py", line 864, in train
    noise_pred = self.call_unet(
  File "/workspace/kohya_ss/sd-scripts/sdxl_train_network.py", line 164, in call_unet
    noise_pred = unet(noisy_latents, timesteps, text_embedding, vector_embedding)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 680, in forward
    return model_forward(*args, **kwargs)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 668, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/workspace/kohya_ss/sd-scripts/library/sdxl_original_unet.py", line 1104, in forward
    h = call_module(module, h, emb, context)
  File "/workspace/kohya_ss/sd-scripts/library/sdxl_original_unet.py", line 1093, in call_module
    x = layer(x, emb)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/kohya_ss/sd-scripts/library/sdxl_original_unet.py", line 348, in forward
    x = torch.utils.checkpoint.checkpoint(create_custom_forward(self.forward_body), x, emb, use_reentrant=USE_REENTRANT)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 328, in _fn
    return fn(*args, **kwargs)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 451, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 230, in forward
    outputs = run_function(*args)
  File "/workspace/kohya_ss/sd-scripts/library/sdxl_original_unet.py", line 344, in custom_forward
    return func(*inputs)
  File "/workspace/kohya_ss/sd-scripts/library/sdxl_original_unet.py", line 331, in forward_body
    h = self.in_layers(x)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/container.py", line 215, in forward
    input = module(input)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/lycoris/modules/locon.py", line 246, in forward
    weight = self.apply_weight_decompose(weight)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/lycoris/modules/locon.py", line 207, in apply_weight_decompose
    return weight * (self.dora_scale / weight_norm)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
steps: 0%| | 0/3525 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/workspace/kohya_ss/venv/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/workspace/kohya_ss/venv/bin/python', '/workspace/kohya_ss/sd-scripts/sdxl_train_network.py', '--config_file', '/workspace/stuff/output/config_lora-20240613-115427.toml', '--network_train_unet_only', '--keep_tokens_separator', "'|||'", '--base_weights']' returned non-zero exit status 1.

So I refreshed the page to get the default settings, chose lycoris-loha, enabled DoRA, and this happened:

Traceback (most recent call last):
  File "/workspace/kohya_ss/venv/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/workspace/kohya_ss/venv/bin/python', '/workspace/kohya_ss/sd-scripts/sdxl_train_network.py', '--config_file', '/workspace/stuff/output/config_lora-20240613-122335.toml']' died with <Signals.SIGKILL: 9>.
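
For readers decoding the last line of that traceback: `died with <Signals.SIGKILL: 9>` is how Python's `subprocess` module reports a child process whose return code is negative, meaning it was terminated by a signal rather than exiting on its own. On Linux a SIGKILL during training is often the kernel's out-of-memory killer, but nothing in this thread confirms that was the cause here. A minimal sketch of the mapping, where `describe_returncode` is a hypothetical helper name and not part of accelerate or kohya_ss:

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Turn a subprocess return code into a short human-readable message.

    Hypothetical helper for illustration; on POSIX, subprocess reports a
    child killed by signal N with returncode == -N.
    """
    if returncode == 0:
        return "exited cleanly"
    if returncode < 0:
        try:
            sig = signal.Signals(-returncode)
        except ValueError:
            # Signal number not known on this platform.
            return f"killed by unknown signal {-returncode}"
        return f"killed by {sig.name} ({sig.value})"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # The two failures in this issue: exit status 1 and SIGKILL (9).
    for code in (1, -9):
        print(code, "->", describe_returncode(code))
```

The first run in this issue corresponds to the "exited with status 1" branch (a Python exception inside the training script), while the second corresponds to the signal branch.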

These are the "default" settings: https://pastebin.com/41Gq3rVA

Sorry for the formatting of the first error and the last settings. It was the only way I could recover them, since I had just closed RunPod and only had them in my clipboard. I was using the RunPod PyTorch 2.0.1 template.

mi8m commented 3 weeks ago

The DoRA option alone also seems to raise RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
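
To illustrate what this error means: PyTorch refuses to combine tensors that live on different devices, so an expression like `weight * (self.dora_scale / weight_norm)` fails if `dora_scale` was left on the CPU while `weight` is on `cuda:0`. The sketch below is not the LyCORIS implementation; the function body, the per-row norm, and the tensor shapes are assumptions chosen only to show the general pattern of aligning devices before the arithmetic.

```python
import torch


def apply_weight_decompose(weight: torch.Tensor, dora_scale: torch.Tensor) -> torch.Tensor:
    """Illustrative DoRA-style rescaling (assumed layout, not LyCORIS code).

    Each output row of `weight` is rescaled so its norm equals the
    matching entry of `dora_scale`.
    """
    # Per-row norm over all remaining dimensions; eps avoids division by zero.
    eps = torch.finfo(weight.dtype).eps
    weight_norm = weight.flatten(1).norm(dim=1, keepdim=True) + eps
    # Align device and dtype first; without this, a CPU scale combined with
    # a CUDA weight raises "Expected all tensors to be on the same device".
    scale = dora_scale.to(device=weight.device, dtype=weight.dtype)
    return weight * (scale / weight_norm)


device = "cuda" if torch.cuda.is_available() else "cpu"
weight = torch.randn(4, 8, device=device)
dora_scale = torch.ones(4, 1)  # deliberately created on the CPU
out = apply_weight_decompose(weight, dora_scale)
print(out.device, out.shape)
```

On a CPU-only machine the `.to(...)` call is a no-op, so the sketch runs either way; the mismatch only shows up when `weight` is on a GPU.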

nephi-dev commented 3 weeks ago

Same error here. I tried updating lycoris to the latest version, but the error persists.

nephi-dev commented 2 weeks ago

Updating lycoris to the latest DEV version fixed it.

mi8m commented 2 weeks ago

Could you explain the process for updating it?

nephi-dev commented 2 weeks ago

could you explain the process for updating it?

This should work: ./venv/Scripts/activate && pip install lycoris_lora -U --pre
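
A note for anyone following along: `--pre` tells pip to consider pre-release and development versions, which is what "latest DEV version" refers to. The `venv/Scripts/activate` path is the Windows layout; on a Linux venv such as the `/workspace/kohya_ss/venv/bin/` one in the traceback, the activation script normally lives at `venv/bin/activate` and is loaded with `source`. To confirm which version ended up installed, something like the following sketch could be run inside the venv. The helper names are hypothetical, and the pre-release check is a rough heuristic rather than a full PEP 440 parser.

```python
import re
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def looks_like_prerelease(version: str) -> bool:
    """Rough heuristic: spot a/b/rc/dev segments such as 1.0rc1 or 1.0.dev3."""
    return re.search(r"[\d.](?:a|b|rc|dev)\d+", version.lower()) is not None


if __name__ == "__main__":
    # "lycoris_lora" is the package name used in the pip command above.
    version = installed_version("lycoris_lora")
    if version is None:
        print("lycoris_lora is not installed in this environment")
    else:
        kind = "pre-release/dev" if looks_like_prerelease(version) else "stable"
        print(f"lycoris_lora {version} ({kind})")
```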

mi8m commented 2 weeks ago

thx, gonna try it later on