bitsandbytes-foundation / bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.
https://huggingface.co/docs/bitsandbytes/main/en/index
MIT License

Songze: The final step of LoRA training fails with an error; urgently asking for help and guidance! #189

Closed: lisongze8 closed this 9 months ago

lisongze8 commented 1 year ago

My LoRA training run originally hit CUDA and memory errors. After two days of research, I replaced train_util.py and switched the optimizer from Adam to Lion, which seems to have fixed some of them. However, the errors below still appear at the end. What causes this problem, and how can I solve it? Thank you very much.

Load CSS...
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
Folder 100_test: 2000 steps
max_train_steps = 2000
stop_text_encoder_training = 0
lr_warmup_steps = 0
accelerate launch --num_cpu_threads_per_process=2 "train_network.py" --pretrained_model_name_or_path="C:/Users/Lenovo/automatic1111-stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned-emaonly.ckpt" --train_data_dir="C:/Users/Lenovo/Documents/Lora Training Data/Test/image" --resolution=512,512 --output_dir="C:/Users/Lenovo/Documents/Lora Training Data/Test/model" --logging_dir="C:/Users/Lenovo/Documents/Lora Training Data/Test/log" --network_alpha="128" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-5 --unet_lr=0.0001 --network_dim=128 --output_name="Daisy" --lr_scheduler_num_cycles="1" --learning_rate="0.0001" --lr_scheduler="constant" --train_batch_size="1" --max_train_steps="2000" --save_every_n_epochs="1" --mixed_precision="fp16" --save_precision="fp16" --seed="1234" --caption_extension=".txt" --cache_latents --optimizer_type="Lion" --max_data_loader_n_workers="1" --clip_skip=2 --bucket_reso_steps=64 --mem_eff_attn --gradient_checkpointing --xformers --bucket_no_upscale
Traceback (most recent call last):
  File "C:\Users\Lenovo\Documents\kohya\kohya_ss\train_network.py", line 18, in <module>
    import library.train_util as train_util
  File "C:\Users\Lenovo\Documents\kohya\kohya_ss\library\train_util.py", line 73

sd-scripts/train_util.py at main · kohya-ss/sd-scripts
                                        ^

SyntaxError: invalid character '·' (U+00B7)
Traceback (most recent call last):
  File "C:\Users\Lenovo\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Lenovo\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\Lenovo\Documents\kohya\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "C:\Users\Lenovo\Documents\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "C:\Users\Lenovo\Documents\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "C:\Users\Lenovo\Documents\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\Lenovo\Documents\kohya\kohya_ss\venv\Scripts\python.exe', 'train_network.py', '--pretrained_model_name_or_path=C:/Users/Lenovo/automatic1111-stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned-emaonly.ckpt', '--train_data_dir=C:/Users/Lenovo/Documents/Lora Training Data/Test/image', '--resolution=512,512', '--output_dir=C:/Users/Lenovo/Documents/Lora Training Data/Test/model', '--logging_dir=C:/Users/Lenovo/Documents/Lora Training Data/Test/log', '--network_alpha=128', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-5', '--unet_lr=0.0001', '--network_dim=128', '--output_name=Daisy', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=1', '--max_train_steps=2000', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=Lion', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--mem_eff_attn', '--gradient_checkpointing', '--xformers', '--bucket_no_upscale']' returned non-zero exit status 1.
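The traceback above stops at line 73 of train_util.py, and the text reported there reads like a GitHub page title rather than Python code. One possible explanation (an assumption, not confirmed anywhere in this thread) is that the replacement file was saved from the web page instead of from the raw source. A minimal sketch for checking whether some text is valid Python before launching training might look like this; `find_source_problems` is a hypothetical helper name, not part of kohya_ss or accelerate:

```python
import ast


def find_source_problems(text):
    """Return a list of human-readable problems found in `text`.

    Hypothetical helper for illustration only; it is not part of any
    training script mentioned in this thread.
    """
    problems = []

    # A saved web page usually carries HTML markers near the top.
    head = text[:2000].lower()
    if "<!doctype html" in head or "<html" in head:
        problems.append("file looks like an HTML page, not Python source")

    # Ask Python's own parser whether the text is syntactically valid.
    try:
        ast.parse(text)
    except SyntaxError as exc:
        problems.append(f"line {exc.lineno}: {exc.msg}")

    return problems


# The line quoted in the traceback, used here as sample input.
sample = "sd-scripts/train_util.py at main · kohya-ss/sd-scripts\n"
for problem in find_source_problems(sample):
    print(problem)
```

To check the real file, read it first, for example with `pathlib.Path("library/train_util.py").read_text(encoding="utf-8")`, and pass the result to the function. An empty list only means the file parses; it does not show that the file is the right version.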

github-actions[bot] commented 10 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.