Bellatrix8 closed this issue 9 months ago.
I have the same problem :(
CUDA SETUP: Loading binary D:\AI\SUPERSD\Kohya\kohya_ss\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cuda116.dll...
use 8-bit Adam optimizer
running training / 学習開始
num train images * repeats / 学習画像の数×繰り返し回数: 1500
num reg images / 正則化画像の数: 0
num batches per epoch / 1epochのバッチ数: 750
num epochs / epoch数: 1
batch size per device / バッチサイズ: 2
total train batch size (with parallel & distributed & accumulation) / 総バッチサイズ(並列学習、勾配合計含む): 2
gradient accumulation steps / 勾配を合計するステップ数 = 1
total optimization steps / 学習ステップ数: 450
Traceback (most recent call last):
File "D:\AI\SUPERSD\Kohya\kohya_ss\train_network.py", line 573, in <module>
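The counts in this log are related to each other: 1500 image repeats at a batch size of 2 give 750 batches per epoch, and the 450 total optimization steps (presumably the --max_train_steps value, which this log does not show) fit inside a single epoch. A sketch of that arithmetic, assuming the epoch count is rounded up from max_train_steps and ignoring aspect-ratio bucketing (which can add partial batches); `training_schedule` is a hypothetical helper, not kohya_ss code:

```python
import math


def training_schedule(images_x_repeats: int, batch_size: int,
                      max_train_steps: int, grad_accum_steps: int = 1) -> tuple[int, int]:
    """Approximate the batch and epoch counts printed in the training log (assumed formula)."""
    # One batch per `batch_size` images; a partial last batch still counts.
    batches_per_epoch = math.ceil(images_x_repeats / batch_size)
    # Optimizer steps per epoch shrink when gradients are accumulated.
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum_steps)
    # Enough epochs to cover max_train_steps; training stops at that step count.
    epochs = math.ceil(max_train_steps / steps_per_epoch)
    return batches_per_epoch, epochs


print(training_schedule(1500, 2, 450))   # (750, 1)
print(training_schedule(3000, 1, 3000))  # (3000, 1)
```

The second call matches the later log in this thread, where the command line does pass --max_train_steps=3000.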
Well, I hope the creator will troubleshoot this in the next few days; still, if you come up with a solution, please tell me as well.
That has nothing to do with bitsandbytes / 8-bit Adam.
The repository bmaltais/kohya_ss is out of sync with the repository kohya-ss/sd-scripts.
In particular, the file library/train_util.py is out of sync. Replace that file with the version in kohya-ss/sd-scripts.
https://github.com/bmaltais/kohya_ss/issues/192
File "D:\AI\SUPERSD\Kohya\kohya_ss\train_network.py", line 356, in train
"ss_noise_offset": args.noise_offset,
AttributeError: 'Namespace' object has no attribute 'noise_offset'
Traceback (most recent call last):
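This error fits the explanation about the out-of-sync library/train_util.py: train_network.py reads args.noise_offset, but if the argument parser it uses never registers a --noise_offset option, the parsed argparse.Namespace simply has no such attribute. A minimal reproduction of that mismatch with plain argparse; `build_parser` and `build_metadata` are hypothetical stand-ins, not the actual kohya code:

```python
import argparse


def build_parser(with_noise_offset: bool) -> argparse.ArgumentParser:
    # Hypothetical stand-in for the parser setup in library/train_util.py.
    parser = argparse.ArgumentParser()
    parser.add_argument("--train_batch_size", type=int, default=1)
    if with_noise_offset:
        # Assumed: the newer train_util.py registers this option.
        parser.add_argument("--noise_offset", type=float, default=None)
    return parser


def build_metadata(args: argparse.Namespace) -> dict:
    # Mirrors the failing line 356 in train_network.py.
    return {"ss_noise_offset": args.noise_offset}


old_args = build_parser(with_noise_offset=False).parse_args([])
try:
    build_metadata(old_args)
except AttributeError as err:
    print(err)  # 'Namespace' object has no attribute 'noise_offset'

new_args = build_parser(with_noise_offset=True).parse_args([])
print(build_metadata(new_args))  # {'ss_noise_offset': None}
```

Because the failure comes from a missing parser option rather than from the optimizer, switching optimizers alone would not be expected to fix this particular traceback.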
I have a similar problem (I believe so; at least the error block looks similar). I tried several ideas as a solution, but nothing worked:
Any new or better ideas for me to get it running?
Here is the end of my error output:
Traceback (most recent call last):
File "C:\Users\bobby\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\bobby\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "F:\Stable-Diffusion\kohya\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
Same problem!
caching latents.
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 54/54 [00:12<00:00, 4.26it/s]
import network module: networks.lora
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.
Traceback (most recent call last):
File "C:\Users\Long Dao\kohya_ss\train_network.py", line 507, in <module>
I got it running again (after trying a great many combinations). In my case, "Memory efficient attention" has to be on (a few days ago that wasn't necessary) AND "Use 8bit adam" in the advanced section must be unchecked.
I also had the same problem, unchecking "Use 8bit adam" in Training parameters > Advanced Configuration worked for me.
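In terms of the command line the GUI builds (the final `train_network.py` command in this thread contains both `--mem_eff_attn` and `--use_8bit_adam`), the workaround above amounts to keeping `--mem_eff_attn` and dropping `--use_8bit_adam`. A minimal sketch, assuming you want to patch such an argument list yourself before launching it; `apply_workaround` is a hypothetical helper, not part of kohya_ss:

```python
def apply_workaround(cmd: list[str]) -> list[str]:
    """Return a copy of a train_network.py argument list with 8-bit Adam
    disabled and memory-efficient attention enabled (hypothetical helper)."""
    # Drop the flag that selects the bitsandbytes 8-bit Adam optimizer.
    patched = [arg for arg in cmd if arg != "--use_8bit_adam"]
    # Turn on memory-efficient attention if it is not already present.
    if "--mem_eff_attn" not in patched:
        patched.append("--mem_eff_attn")
    return patched


cmd = ["python.exe", "train_network.py", "--train_batch_size=1", "--use_8bit_adam"]
print(apply_workaround(cmd))
# ['python.exe', 'train_network.py', '--train_batch_size=1', '--mem_eff_attn']
```

The original list is left untouched, so the unpatched command stays available for comparison.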
How do you "Uncheck" this item? It's not a checkbox.
You need to click on "Advanced Configuration" further down, on the same page where you took the screenshot. Plenty of new options will appear, among them "Use 8bit adam", which is checked by default. Uncheck it and you will have taken a step in the right direction... hopefully :-)
I don't see the option to uncheck 8bit adam anywhere in my advanced config.
Try selecting AdamW in the optimizer drop-down instead of AdamW8bit. That fixed it for me.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Hello. The steps I followed were: load the configuration from the LowVRAM JSON file, then select the image, log, and model folders. When I started training, I got this long block of output and nothing happened after that. What could be the issue?
CUDA SETUP: Loading binary C:\kohya\kohya_ss\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cuda116.dll...
use 8-bit Adam optimizer
running training / 学習開始
num train images * repeats / 学習画像の数×繰り返し回数: 3000
num reg images / 正則化画像の数: 0
num batches per epoch / 1epochのバッチ数: 3000
num epochs / epoch数: 1
batch size per device / バッチサイズ: 1
total train batch size (with parallel & distributed & accumulation) / 総バッチサイズ(並列学習、勾配合計含む): 1
gradient accumulation steps / 勾配を合計するステップ数 = 1
total optimization steps / 学習ステップ数: 3000
Traceback (most recent call last):
File "C:\kohya\kohya_ss\train_network.py", line 573, in <module>
train(args)
File "C:\kohya\kohya_ss\train_network.py", line 356, in train
"ss_noise_offset": args.noise_offset,
AttributeError: 'Namespace' object has no attribute 'noise_offset'
Traceback (most recent call last):
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\kohya\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "C:\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "C:\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "C:\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\kohya\kohya_ss\venv\Scripts\python.exe', 'train_network.py', '--pretrained_model_name_or_path=E:/stable diffusion 2.1/stable-diffusion-webui-master/models/Stable-diffusion/Anything-V3.0-pruned-fp16.ckpt', '--train_data_dir=E:/train/trained/lora/image', '--resolution=512,512', '--output_dir=E:/train/trained/lora/model', '--logging_dir=E:/train/trained/lora/log', '--network_alpha=128', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-5', '--unet_lr=0.0001', '--network_dim=128', '--output_name=lora', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=1', '--max_train_steps=3000', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--mem_eff_attn', '--gradient_checkpointing', '--xformers', '--use_8bit_adam', '--bucket_no_upscale']' returned non-zero exit status 1.
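The last line of this output is only accelerate reporting that the train_network.py subprocess exited with status 1; the actual failure is the earlier AttributeError about noise_offset, the same one discussed above. A small sketch for pulling the exception lines out of a long log so the first one (the root cause) stands out; `exception_lines` is a hypothetical helper, and its regex assumes exception names end in "Error" or "Exception":

```python
import re

# Matches a final exception line such as "AttributeError: ..." or
# "subprocess.CalledProcessError: ..." (dotted module prefixes allowed).
EXC_LINE = re.compile(r"^(?:[A-Za-z_]\w*\.)*[A-Za-z_]\w*(?:Error|Exception): ")


def exception_lines(log_text: str) -> list[str]:
    """Return every exception line in a log, in order of appearance."""
    stripped = (line.strip() for line in log_text.splitlines())
    return [line for line in stripped if EXC_LINE.match(line)]


log = """\
  File "C:\\kohya\\kohya_ss\\train_network.py", line 356, in train
    "ss_noise_offset": args.noise_offset,
AttributeError: 'Namespace' object has no attribute 'noise_offset'
Traceback (most recent call last):
subprocess.CalledProcessError: Command '[...]' returned non-zero exit status 1.
"""
errors = exception_lines(log)
print(errors[0])  # AttributeError: 'Namespace' object has no attribute 'noise_offset'
```

When reporting a problem like this, the first exception line is usually the one worth pasting.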