bmaltais / kohya_ss

Apache License 2.0

Another seemingly new data validation leads to unusable configs (24.0.3) #2330

Open bjspi opened 6 months ago

bjspi commented 6 months ago

Hi bmaltais,

thanks for your hard work! It seems yet another data validation check was added all of a sudden:


2024-04-18 11:26:03.408209: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-18 11:26:03.408328: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-18 11:26:03.409080: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-04-18 11:26:03.477587: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-18 11:26:04.746503: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-04-18 11:26:05 INFO     Loading settings from ./outputs/tmpfiledbooth.toml...                                                                                                                                  train_util.py:3744
                    INFO     ./outputs/tmpfiledbooth                                                                                                                                                                train_util.py:3763
Traceback (most recent call last):
  File "/workspace/kohya_ss/sd-scripts/sdxl_train.py", line 818, in <module>
    train(args)
  File "/workspace/kohya_ss/sd-scripts/sdxl_train.py", line 100, in train
    train_util.verify_training_args(args)
  File "/workspace/kohya_ss/sd-scripts/library/train_util.py", line 3473, in verify_training_args
    raise ValueError("adaptive_noise_scale requires noise_offset / adaptive_noise_scaleを使用するにはnoise_offsetが必要です")
ValueError: adaptive_noise_scale requires noise_offset / adaptive_noise_scaleを使用するにはnoise_offsetが必要です
Traceback (most recent call last):
  File "/workspace/kohya_ss/venv/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/workspace/kohya_ss/venv/bin/python', '/workspace/kohya_ss/sd-scripts/sdxl_train.py', '--config_file', './outputs/tmpfiledbooth.toml', '--max_grad_norm=0.0', '--no_half_vae', '--train_text_encoder', '--learning_rate_te2=0']' returned non-zero exit status 1.
11:26:08-384362 INFO     Training has ended.                                                             

I could fix it by setting Noise offset=0.05 in the GUI. Previously this was not necessary.

Maybe those parameters are not "correct", but why fail hard instead of just printing a warning and ignoring the setting?
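
To illustrate the warn-and-ignore behaviour suggested here, a minimal sketch follows. It assumes the parsed arguments are an `argparse.Namespace` with `adaptive_noise_scale` and `noise_offset` attributes (the names taken from the error message above). The helper `soften_noise_args` is hypothetical and is not part of sd-scripts or the fix that was later pushed.

```python
import argparse
import warnings


def soften_noise_args(args: argparse.Namespace) -> argparse.Namespace:
    """Hypothetical helper: warn and drop adaptive_noise_scale when noise_offset is unset.

    This is an illustration only; it is not the validation used by sd-scripts.
    """
    adaptive = getattr(args, "adaptive_noise_scale", None)
    offset = getattr(args, "noise_offset", None)

    if adaptive is not None and offset is None:
        # Warn instead of raising, so older configs still start training.
        warnings.warn(
            "adaptive_noise_scale is set without noise_offset; "
            "ignoring adaptive_noise_scale",
            UserWarning,
            stacklevel=2,
        )
        args.adaptive_noise_scale = None

    return args
```

With something like this, a config that sets `adaptive_noise_scale` but no `noise_offset` would produce a warning and train without adaptive noise, rather than aborting. The trade-off is that a silently ignored setting can hide a real misconfiguration, which may be why a hard error was chosen.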

This renders a lot of existing configs useless I guess :) A lot of users will stumble across it I feel.

Thanks!

ashleykleynhans commented 6 months ago

Yeah, I have recently been having issues importing my previously saved configs. It would be great if new versions were backwards compatible with previously saved configs.

bmaltais commented 6 months ago

I am not able to reproduce this bug... I am not sure why you are seeing it... If you can share the old-style JSON config file you use, I hope I can use it to reproduce the issue and understand how it is triggered... It is hard to fix without being able to reproduce it.

ashleykleynhans commented 6 months ago

@bmaltais it happens with the same JSON I shared in #2308

bmaltais commented 6 months ago

Let me try it again... when I tested with it I did not see that issue... maybe it is only happening on Linux... let me test it there. OK... I am able to reproduce it on Linux... now I will try to fix it ;-)

bmaltais commented 6 months ago

I think I have the fix... Pushed to dev...

bjspi commented 6 months ago

Awesome mate, thanks!

ashleykleynhans commented 6 months ago

It's working now, thanks @bmaltais