bmaltais / kohya_ss

Full Precision Training #1590

Closed nathanielgerdes closed 8 months ago

nathanielgerdes commented 1 year ago

I am trying to run Dreambooth training at full precision, but a --full_bf16 flag seems to be forced on and I cannot find a way to turn it off. I am getting this error:

[Dataset 0] loading image sizes.
100%|██████████| 20/20 [00:00<00:00, 4419.94it/s]
prepare dataset
prepare accelerator
Traceback (most recent call last):
  File "/workspace/kohya_ss/./sdxl_train.py", line 753, in <module>
    train(args)
  File "/workspace/kohya_ss/./sdxl_train.py", line 216, in train
    ) = sdxl_train_util.load_target_model(args, accelerator, "sdxl", weight_dtype)
  File "/workspace/kohya_ss/library/sdxl_train_util.py", line 21, in load_target_model
    model_dtype = match_mixed_precision(args, weight_dtype)  # prepare fp16/bf16
  File "/workspace/kohya_ss/library/sdxl_train_util.py", line 169, in match_mixed_precision
    weight_dtype == torch.bfloat16
AssertionError: full_bf16 requires mixed precision='bf16' / full_bf16を使う場合はmixed_precision='bf16'を指定してください。
Traceback (most recent call last):
  File "/workspace/kohya_ss/venv/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 986, in launch_command
    simple_launcher(args)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/workspace/kohya_ss/venv/bin/python', './sdxl_train.py', '--pretrained_model_name_or_path=/workspace/stable-diffusion-webui/models/Stable-diffusion/sd_xl_base_1.0.safetensors', '--train_data_dir=/workspace/Done/img', '--resolution=1024,1024', '--output_dir=/workspace/Done/model', '--logging_dir=/workspace/Done/log', '--save_model_as=safetensors', '--full_bf16', '--output_name=Grace-Kohya--Runpod-SDXLBase', '--lr_scheduler_num_cycles=4', '--max_data_loader_n_workers=0', '--learning_rate=1e-05', '--lr_scheduler=constant', '--train_batch_size=1', '--max_train_steps=3200', '--save_every_n_epochs=4', '--mixed_precision=no', '--save_precision=float', '--cache_latents', '--cache_latents_to_disk', '--optimizer_type=Adafactor', '--optimizer_args', 'scale_parameter=False', 'relative_step=False', 'warmup_init=False', 'weight_decay=0.01', '--max_data_loader_n_workers=0', '--bucket_reso_steps=64', '--bucket_no_upscale', '--noise_offset=0.0', '--max_grad_norm=0.0']' returned non-zero exit status 1.

josemerinom commented 1 year ago

I think it is a problem with the mixed precision configuration in Accelerate

run "!accelerate config"

and select the option you will use as mixed precision (no, fp16, bf16)
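
For reference, the assertion in the traceback fires because the launch command contains both '--full_bf16' and '--mixed_precision=no'. Below is a minimal sketch of that check applied to an argument list, so the conflict can be spotted before launching. The helper name find_precision_conflict is hypothetical and not part of kohya_ss, and treating a missing --mixed_precision flag as "no" is an assumption.

```python
def find_precision_conflict(cmd):
    """Return a message if --full_bf16 is set without --mixed_precision=bf16, else None.

    Hypothetical helper: it only mirrors the condition stated in the
    AssertionError message, not the actual kohya_ss implementation.
    """
    full_bf16 = "--full_bf16" in cmd
    mixed = "no"  # assumption: treat a missing flag as "no"
    for arg in cmd:
        if arg.startswith("--mixed_precision="):
            mixed = arg.split("=", 1)[1]
    if full_bf16 and mixed != "bf16":
        return (
            f"--full_bf16 is set but --mixed_precision={mixed}; "
            "remove --full_bf16 or use --mixed_precision=bf16"
        )
    return None


# A shortened version of the command from the error above.
cmd = ["./sdxl_train.py", "--full_bf16", "--mixed_precision=no", "--save_precision=float"]
print(find_precision_conflict(cmd))
```

With the flags from the failing command this reports the conflict; without --full_bf16, or with --mixed_precision=bf16, it returns None.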