bmaltais / kohya_ss

Apache License 2.0

got an unexpected keyword argument 'num_decay_steps' #2812

Open Nova-lotus opened 2 months ago

Nova-lotus commented 2 months ago

Sorry if this has been reported before. This morning, when I went to train a new LoRA, I got the error below, even though I'm using Prodigy with Cosine, not cosine with warmup. It was working perfectly yesterday. I'm on the sd3-flux.1 branch. The error started right after I updated to the newest commit (the white color was giving me aneurysms), and I still get it even when I go back to the previous commit.

Traceback (most recent call last):
  File "D:\kohya_ss\sd-scripts\sdxl_train_network.py", line 210, in <module>
    trainer.train(args)
  File "D:\kohya_ss\sd-scripts\train_network.py", line 532, in train
    lr_scheduler = train_util.get_scheduler_fix(args, optimizer, accelerator.num_processes)
  File "D:\kohya_ss\sd-scripts\library\train_util.py", line 4724, in get_scheduler_fix
    return schedule_func(
TypeError: get_cosine_schedule_with_warmup() got an unexpected keyword argument 'num_decay_steps'
Traceback (most recent call last):
  File "C:\Users\Acer\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Acer\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in <module>
  File "D:\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
    args.func(args)
  File "D:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
    simple_launcher(args)
  File "D:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\\kohya_ss\\venv\\Scripts\\python.exe', 'D:/kohya_ss/sd-scripts/sdxl_train_network.py', '--config_file', 'D:/kohya_ss/outputs/config_lora-20240913-165400.toml', '--max_grad_norm=0', '--network_train_unet_only']' returned non-zero exit status 1.

Nova-lotus commented 2 months ago

Well, I rolled back to 63c1e483 and it worked again, so yeah.

bmaltais commented 2 months ago

Was this in the sd3-flux.1 branch? SDXL does not work in that branch at the moment.

Nova-lotus commented 2 months ago

Yeah, it was. The last commit that worked for me was that one.

zadokov commented 2 months ago

It is still not working on the current commit: https://github.com/bmaltais/kohya_ss/commit/d24fae17b7a30b62fc4f200d1ff999a9551c20a2 Prodigy worked well for me before, and now it doesn't. Is there any way to get it working?

bmaltais commented 2 months ago

Can you share the .json config for the training? I can't reproduce the issue. It might be related to a particular parameter you use

zadokov commented 2 months ago

Attached FT-FLX-14-Prodigy.json

Enyakk commented 2 months ago

I have the same issue, and it stems from kohya's sd-scripts repository. I can confirm this happens when trying to train a FLUX dev1 LoRA. I can provide a TOML config if useful.

The error occurs when I set the scheduler to cosine or linear. It goes away if I set it to constant instead.

I believe it could be this change causing the problem: https://github.com/kohya-ss/sd-scripts/pull/1393
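
For anyone reading the traceback above, here is a minimal sketch of this kind of failure, not the actual sd-scripts code or the fix that was pushed. It assumes the caller builds one set of scheduler keyword arguments and passes it to a schedule function that does not declare `num_decay_steps`. The names `cosine_schedule_with_warmup` and `call_with_supported_kwargs` are hypothetical stand-ins written for illustration only.

```python
import inspect
import math


def cosine_schedule_with_warmup(num_warmup_steps, num_training_steps, num_cycles=0.5):
    """Hypothetical stand-in for a schedule factory; returns an LR multiplier per step."""
    def lr_lambda(step):
        if step < num_warmup_steps:
            # Linear warmup from 0 to 1.
            return step / max(1, num_warmup_steps)
        progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
        # Cosine decay from 1 to 0 over the remaining steps.
        return max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))
    return lr_lambda


def call_with_supported_kwargs(func, **kwargs):
    """Hypothetical helper: drop keyword arguments that func does not declare."""
    params = inspect.signature(func).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return func(**kwargs)  # func accepts **kwargs, so nothing needs filtering
    accepted = {k: v for k, v in kwargs.items() if k in params}
    return func(**accepted)


# One shared kwargs dict, including an argument this schedule does not know about.
scheduler_kwargs = dict(num_warmup_steps=10, num_training_steps=100, num_decay_steps=20)

try:
    cosine_schedule_with_warmup(**scheduler_kwargs)
    raised = False
except TypeError as err:
    # Same class of error as in the traceback: unexpected keyword argument.
    raised = True
    message = str(err)

# Filtering by signature avoids the TypeError for schedules without that parameter.
lr_lambda = call_with_supported_kwargs(cosine_schedule_with_warmup, **scheduler_kwargs)
```

Silently dropping arguments can hide real configuration mistakes, so in practice it may be safer to log whatever gets filtered out.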

zadokov commented 2 months ago

@bmaltais the latest commit fixed the problem! Prodigy with Cosine is working!

bmaltais commented 2 months ago

I pushed the fix. Should be good to go.