bitsandbytes-foundation / bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.
https://huggingface.co/docs/bitsandbytes/main/en/index
MIT License

Lora error #151

Closed Bellatrix8 closed 9 months ago

Bellatrix8 commented 1 year ago

Hello. I loaded the LowVRAM JSON configuration, selected the image, log, and model folders, and started training. I then got the long error output below, and nothing happened after that. What could the issue be?

CUDA SETUP: Loading binary C:\kohya\kohya_ss\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cuda116.dll...
use 8-bit Adam optimizer
running training / 学習開始
  num train images * repeats / 学習画像の数×繰り返し回数: 3000
  num reg images / 正則化画像の数: 0
  num batches per epoch / 1epochのバッチ数: 3000
  num epochs / epoch数: 1
  batch size per device / バッチサイズ: 1
  total train batch size (with parallel & distributed & accumulation) / 総バッチサイズ(並列学習、勾配合計含む): 1
  gradient accumulation steps / 勾配を合計するステップ数 = 1
  total optimization steps / 学習ステップ数: 3000
Traceback (most recent call last):
  File "C:\kohya\kohya_ss\train_network.py", line 573, in <module>
    train(args)
  File "C:\kohya\kohya_ss\train_network.py", line 356, in train
    "ss_noise_offset": args.noise_offset,
AttributeError: 'Namespace' object has no attribute 'noise_offset'
Traceback (most recent call last):
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\kohya\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "C:\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "C:\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "C:\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\kohya\kohya_ss\venv\Scripts\python.exe', 'train_network.py', '--pretrained_model_name_or_path=E:/stable diffusion 2.1/stable-diffusion-webui-master/models/Stable-diffusion/Anything-V3.0-pruned-fp16.ckpt', '--train_data_dir=E:/train/trained/lora/image', '--resolution=512,512', '--output_dir=E:/train/trained/lora/model', '--logging_dir=E:/train/trained/lora/log', '--network_alpha=128', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-5', '--unet_lr=0.0001', '--network_dim=128', '--output_name=lora', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=1', '--max_train_steps=3000', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--mem_eff_attn', '--gradient_checkpointing', '--xformers', '--use_8bit_adam', '--bucket_no_upscale']' returned non-zero exit status 1.

AtoHolewa commented 1 year ago

I have the same problem :(

AtoHolewa commented 1 year ago

CUDA SETUP: Loading binary D:\AI\SUPERSD\Kohya\kohya_ss\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cuda116.dll...
use 8-bit Adam optimizer
running training / 学習開始
  num train images * repeats / 学習画像の数×繰り返し回数: 1500
  num reg images / 正則化画像の数: 0
  num batches per epoch / 1epochのバッチ数: 750
  num epochs / epoch数: 1
  batch size per device / バッチサイズ: 2
  total train batch size (with parallel & distributed & accumulation) / 総バッチサイズ(並列学習、勾配合計含む): 2
  gradient accumulation steps / 勾配を合計するステップ数 = 1
  total optimization steps / 学習ステップ数: 450
Traceback (most recent call last):
  File "D:\AI\SUPERSD\Kohya\kohya_ss\train_network.py", line 573, in <module>
    train(args)
  File "D:\AI\SUPERSD\Kohya\kohya_ss\train_network.py", line 356, in train
    "ss_noise_offset": args.noise_offset,
AttributeError: 'Namespace' object has no attribute 'noise_offset'
Traceback (most recent call last):
  File "C:\Users\Bonnier\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Bonnier\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\AI\SUPERSD\Kohya\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "D:\AI\SUPERSD\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "D:\AI\SUPERSD\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "D:\AI\SUPERSD\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\AI\SUPERSD\Kohya\kohya_ss\venv\Scripts\python.exe', 'train_network.py', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=D:/AI/LORA/AtoLora/image', '--resolution=512,512', '--output_dir=D:/AI/LORA/AtoLora/model', '--logging_dir=D:/AI/LORA/AtoLora/log', '--network_alpha=128', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-5', '--unet_lr=0.0001', '--network_dim=128', '--output_name=Ato Holewa', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=2', '--max_train_steps=450', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=bf16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--xformers', '--use_8bit_adam', '--bucket_no_upscale']' returned non-zero exit status 1.

Bellatrix8 commented 1 year ago

Well, I hope the creator will troubleshoot this in the next few days. Still, if you come up with a solution, please tell me as well.

martianunlimited commented 1 year ago

That has nothing to do with bitsandbytes / 8bitADAM. The repository bmaltais/kohya_ss is out of sync with the repository kohya_ss/sd_scripts; in particular, the file library/train_util.py is out of date. Replace that file with the version in kohya_ss/sd_scripts.
https://github.com/bmaltais/kohya_ss/issues/192

  File "D:\AI\SUPERSD\Kohya\kohya_ss\train_network.py", line 356, in train
    "ss_noise_offset": args.noise_offset,
AttributeError: 'Namespace' object has no attribute 'noise_offset'
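The AttributeError quoted above is what Python raises when a script reads an option from an `argparse` namespace that the parser never registered, which fits the explanation that train_network.py and library/train_util.py come from different versions. Here is a minimal sketch of that failure mode. The parser setup and the helper names `build_old_parser` and `read_noise_offset` are hypothetical stand-ins, not the actual kohya_ss code:

```python
import argparse


def build_old_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for an outdated train_util.py:
    # it never registers --noise_offset.
    parser = argparse.ArgumentParser()
    parser.add_argument("--learning_rate", type=float, default=1e-4)
    return parser


def read_noise_offset(args: argparse.Namespace):
    # Defensive read: returns None instead of raising when the option is missing.
    return getattr(args, "noise_offset", None)


# Parse an explicit empty argument list so the sketch ignores the real command line.
args = build_old_parser().parse_args([])

try:
    args.noise_offset  # same failure mode as in the traceback
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
print(read_noise_offset(args))
```

The defensive `getattr` only hides the symptom. Syncing the two files, as suggested above, removes the cause.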

DieserBobby commented 1 year ago

I have a similar problem (I believe; at least the error code block looks similar). I tried several ideas as a solution, but nothing worked:

Any new or better ideas for me to get it running?

Here is the end of my error output:

Traceback (most recent call last):
  File "C:\Users\bobby\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\bobby\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "F:\Stable-Diffusion\kohya\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "F:\Stable-Diffusion\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "F:\Stable-Diffusion\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "F:\Stable-Diffusion\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['F:\Stable-Diffusion\kohya\kohya_ss\venv\Scripts\python.exe', 'train_db.py', '--pretrained_model_name_or_path=F:/Stable-Diffusion/stable-diffusion-webui/models/Stable-diffusion/liberty_main.safetensors', '--train_data_dir=F:/Stable-Diffusion/kohya/Bilder_Training/lora/image', '--resolution=512,512', '--output_dir=F:/Stable-Diffusion/stable-diffusion-webui/models/Lora', '--logging_dir=F:/Stable-Diffusion/kohya/Bilder_Training/lora/log', '--save_model_as=safetensors', '--output_name=dazpose', '--max_data_loader_n_workers=1', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=2', '--max_train_steps=2300', '--save_every_n_epochs=1', '--mixed_precision=bf16', '--save_precision=bf16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--xformers', '--use_8bit_adam', '--bucket_no_upscale']' returned non-zero exit status 1.

vietdragon commented 1 year ago

Same problem!

caching latents.
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 54/54 [00:12<00:00, 4.26it/s]
import network module: networks.lora
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.
Traceback (most recent call last):
  File "C:\Users\Long Dao\kohya_ss\train_network.py", line 507, in <module>
    train(args)
  File "C:\Users\Long Dao\kohya_ss\train_network.py", line 150, in train
    optimizer_name, optimizer_args, optimizer = train_util.get_optimizer(args, trainable_params)
  File "C:\Users\Long Dao\kohya_ss\library\train_util.py", line 1536, in get_optimizer
    assert optimizer_type is None or optimizer_type == "", "both option use_8bit_adam and optimizer_type are specified / use_8bit_adamとoptimizer_typeの両方のオプションが指定されています"
AssertionError: both option use_8bit_adam and optimizer_type are specified / use_8bit_adamとoptimizer_typeの両方のオプションが指定されています
Traceback (most recent call last):
  File "C:\Users\Long Dao\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Long Dao\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\Long Dao\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "C:\Users\Long Dao\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "C:\Users\Long Dao\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "C:\Users\Long Dao\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\Long Dao\kohya_ss\venv\Scripts\python.exe', 'train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=F:/stable-diffusion/stable-diffusion-webui/models/Stable-diffusion/realisticVisionV13_v13.ckpt', '--train_data_dir=F:\Lora Training Data\Stmtp\image', '--resolution=512,512', '--output_dir=F:\Lora Training Data\Stmtp\model', '--logging_dir=F:\Lora Training Data\Stmtp\log', '--network_alpha=1', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-5', '--unet_lr=0.0001', '--network_dim=8', '--output_name=last', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=cosine', '--lr_warmup_steps=540', '--train_batch_size=1', '--max_train_steps=5400', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--cache_latents', '--optimizer_type=AdamW8bit', '--bucket_reso_steps=64', '--xformers', '--use_8bit_adam', '--bucket_no_upscale']' returned non-zero exit status 1.
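The AssertionError in this traceback fires because the command passes both `--use_8bit_adam` and `--optimizer_type=AdamW8bit`. Below is a rough re-creation of that check, built only from the assert line visible in the traceback. The function name `check_optimizer_args`, the mapping of the legacy flag to "AdamW8bit", and the "AdamW" default are assumptions for illustration, not the real `train_util.get_optimizer`:

```python
from argparse import Namespace


def check_optimizer_args(args: Namespace) -> str:
    """Hypothetical re-creation of the check that fails in the traceback."""
    optimizer_type = getattr(args, "optimizer_type", None)
    if getattr(args, "use_8bit_adam", False):
        # Same condition as the assert shown in the traceback (English part of the message).
        assert optimizer_type is None or optimizer_type == "", (
            "both option use_8bit_adam and optimizer_type are specified"
        )
        # Assumption: the legacy flag is treated as a request for 8-bit AdamW.
        optimizer_type = "AdamW8bit"
    # Assumption: fall back to plain AdamW when nothing is specified.
    return optimizer_type or "AdamW"


# The combination from the failing command: both options set.
conflicting = Namespace(use_8bit_adam=True, optimizer_type="AdamW8bit")
try:
    check_optimizer_args(conflicting)
    conflict_detected = False
except AssertionError:
    conflict_detected = True

print(conflict_detected)
```

Passing only one of the two options avoids the assertion in this sketch.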

DieserBobby commented 1 year ago

I got it running again after trying a great many combinations. In my case, "Memory efficient attention" has to be on (a few days ago that was not necessary) AND "use 8bit adam" in the advanced section must not be checked.

ZhouCongArts commented 1 year ago

I also had the same problem, unchecking "Use 8bit adam" in Training parameters > Advanced Configuration worked for me.

lennyfung commented 1 year ago

AdamW8bit

How do you "Uncheck" this item? It's not a checkbox.

DieserBobby commented 1 year ago

You need to click on "Advanced Configuration" further down, on the same page where you took the screenshot. Plenty of new options will appear, among them "use 8bit adam", which is checked by default. Uncheck it and you have made a step in the right direction... hopefully :-)

Zirnworks commented 1 year ago

I don't see the option to uncheck 8bit adam anywhere in my advanced config.

daflood commented 1 year ago

I don't see the option to uncheck 8bit adam anywhere in my advanced config.

Try selecting AdamW in the optimizer drop down instead of AdamW8bit. That fixed it for me.
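Both fixes reported in this thread (unchecking "use 8bit adam", or picking the optimizer from the drop-down) come down to not sending `--use_8bit_adam` together with `--optimizer_type`. If the GUI offers neither control, the same effect can be sketched on the argument list itself. This is an illustrative helper only: `resolve_optimizer_flags` is a hypothetical name and not part of kohya_ss or accelerate.

```python
from typing import List

LEGACY_FLAG = "--use_8bit_adam"
OPTIMIZER_PREFIX = "--optimizer_type="


def resolve_optimizer_flags(argv: List[str]) -> List[str]:
    """Drop the legacy flag when an explicit, non-empty --optimizer_type is present."""
    has_explicit_optimizer = any(
        arg.startswith(OPTIMIZER_PREFIX) and arg[len(OPTIMIZER_PREFIX):]
        for arg in argv
    )
    if not has_explicit_optimizer:
        # Nothing conflicts; return an unchanged copy.
        return list(argv)
    return [arg for arg in argv if arg != LEGACY_FLAG]


# Tail of the failing command from this thread.
failing = ["--optimizer_type=AdamW8bit", "--bucket_reso_steps=64",
           "--xformers", "--use_8bit_adam", "--bucket_no_upscale"]
print(resolve_optimizer_flags(failing))
```

Which optimizer runs afterwards depends on the remaining `--optimizer_type` value, so `--optimizer_type=AdamW` (as suggested above) means training without 8-bit Adam.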
