I'm trying to train a LoRA with multiple embeddings, but I keep getting this error. I've changed a bunch of settings in the YAML configuration to see if I could get any further, but I haven't been able to get past it. Any ideas on what is going wrong?
Contents of hcp-test folder: settings.zip
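(For reference on the "hook" lines in the log below: each custom embedding is registered as a single placeholder token id past CLIP's built-in vocabulary, ids 49408 and up, which then expands to `len` learned vectors. A rough equivalent with plain transformers, as a hedged illustration rather than hcpdiff's actual mechanism, which hooks the text encoder instead of resizing it:)

```python
# Conceptual sketch only: plain-transformers version of multi-token
# custom embeddings; hcpdiff hooks the encoder rather than resizing it.
from transformers import CLIPTextModel, CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# CLIP's vocab ends at id 49407; newly added tokens get 49408, 49409, ...
# exactly like the ids in the hook log lines below.
tokenizer.add_tokens(["shinei_nouzen", "undertaker"])
text_encoder.resize_token_embeddings(len(tokenizer))

print(tokenizer("undertaker, 1boy").input_ids)  # custom token -> one new id
```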
PS R:\lora-test\hcp-test> accelerate launch -m hcpdiff.train_ac_single --cfg .\lora_anime_character.yaml
bin F:\python\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll
The following values were not passed to accelerate launch and had defaults used instead:
        --num_processes was set to a value of 1
        --num_machines was set to a value of 1
        --mixed_precision was set to a value of 'no'
        --dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
bin F:\python\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll
F:\python\lib\runpy.py:126: RuntimeWarning: 'hcpdiff.train_ac_single' found in sys.modules after import of package 'hcpdiff', but prior to execution of 'hcpdiff.train_ac_single'; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
2023-11-13 01:24:24.563 | INFO | hcpdiff.loggers.cli_logger:_info:30 - world_size: 1
2023-11-13 01:24:24.563 | INFO | hcpdiff.loggers.cli_logger:_info:30 - accumulation: 1
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
2023-11-13 01:24:28.321 | INFO | hcpdiff.models.text_emb_ex:hook:86 - hook: black_san_magnolia, len: 4, id: 49408
2023-11-13 01:24:28.322 | INFO | hcpdiff.models.text_emb_ex:hook:86 - hook: bloody_reina, len: 2, id: 49409
2023-11-13 01:24:28.322 | INFO | hcpdiff.models.text_emb_ex:hook:86 - hook: blue_san_magnolia, len: 4, id: 49410
2023-11-13 01:24:28.322 | INFO | hcpdiff.models.text_emb_ex:hook:86 - hook: formal_giad, len: 4, id: 49411
2023-11-13 01:24:28.322 | INFO | hcpdiff.models.text_emb_ex:hook:86 - hook: neck_scar, len: 2, id: 49412
2023-11-13 01:24:28.323 | INFO | hcpdiff.models.text_emb_ex:hook:86 - hook: personal_room, len: 4, id: 49413
2023-11-13 01:24:28.323 | INFO | hcpdiff.models.text_emb_ex:hook:86 - hook: shinei_nouzen, len: 3, id: 49414
2023-11-13 01:24:28.323 | INFO | hcpdiff.models.text_emb_ex:hook:86 - hook: undertaker, len: 4, id: 49415
2023-11-13 01:24:28.323 | INFO | hcpdiff.models.text_emb_ex:hook:86 - hook: vladilena_millize, len: 3, id: 49416
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
2023-11-13 01:24:29.182 | INFO | hcpdiff.data.caption_loader:load:18 - 144 record(s) loaded with TXTCaptionLoader, from path 'L:/waifu_diffusion/anime-tagger/out/86/512x512'
2023-11-13 01:24:29.183 | INFO | hcpdiff.data.bucket:build_buckets_from_images:241 - build buckets from images size
F:\python\lib\site-packages\sklearn\cluster\_kmeans.py:1412: FutureWarning: The default value of n_init will change from 10 to 'auto' in 1.4. Set the value of n_init explicitly to suppress the warning
super()._check_params_vs_input(X, default_n_init=10)
2023-11-13 01:24:29.632 | INFO | hcpdiff.data.bucket:build_buckets_from_images:262 - buckets info: size:[512 512], num:144
2023-11-13 01:24:29.666 | INFO | hcpdiff.loggers.cli_logger:_info:30 - len(train_dataset): 144
F:\python\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_stable_diffusion.py:128: FutureWarning: The configuration file of this scheduler: PNDMScheduler {
"_class_name": "PNDMScheduler",
"_diffusers_version": "0.19.3",
"beta_end": 0.012,
"beta_schedule": "scaled_linear",
"beta_start": 0.00085,
"num_train_timesteps": 1000,
"prediction_type": "epsilon",
"set_alpha_to_one": false,
"skip_prk_steps": false,
"steps_offset": 0,
"timestep_spacing": "leading",
"trained_betas": null
}
is outdated. steps_offset should be set to 1 instead of 0. Please make sure to update the config accordingly as leaving steps_offset might led to incorrect results in future versions. If you have downloaded this checkpoint from the Hugging Face Hub, it would be very nice if you could open a Pull request for the scheduler/scheduler_config.json file
deprecate("steps_offset!=1", "1.0.0", deprecation_message, standard_warn=False)
2023-11-13 01:24:30.572 | INFO | hcpdiff.loggers.cli_logger:_info:30 - Running training
2023-11-13 01:24:30.572 | INFO | hcpdiff.loggers.cli_logger:_info:30 - Num batches each epoch = 144
2023-11-13 01:24:30.573 | INFO | hcpdiff.loggers.cli_logger:_info:30 - Num Steps = 1000
2023-11-13 01:24:30.573 | INFO | hcpdiff.loggers.cli_logger:_info:30 - Instantaneous batch size per device = 1
2023-11-13 01:24:30.573 | INFO | hcpdiff.loggers.cli_logger:_info:30 - Total train batch size (w. parallel, distributed & accumulation) = 1
2023-11-13 01:24:30.573 | INFO | hcpdiff.loggers.cli_logger:_info:30 - Gradient Accumulation steps = 1
bin F:\python\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll
F:\python\lib\runpy.py:126: RuntimeWarning: 'hcpdiff.train_ac_single' found in sys.modules after import of package 'hcpdiff', but prior to execution of 'hcpdiff.train_ac_single'; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
bin F:\python\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll
F:\python\lib\runpy.py:126: RuntimeWarning: 'hcpdiff.train_ac_single' found in sys.modules after import of package 'hcpdiff', but prior to execution of 'hcpdiff.train_ac_single'; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
bin F:\python\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll
F:\python\lib\runpy.py:126: RuntimeWarning: 'hcpdiff.train_ac_single' found in sys.modules after import of package 'hcpdiff', but prior to execution of 'hcpdiff.train_ac_single'; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
bin F:\python\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll
F:\python\lib\runpy.py:126: RuntimeWarning: 'hcpdiff.train_ac_single' found in sys.modules after import of package 'hcpdiff', but prior to execution of 'hcpdiff.train_ac_single'; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
F:\python\lib\site-packages\hcpdiff\train_ac.py:425: FutureWarning: Accessing config attribute scaling_factor directly via 'AutoencoderKL' object attribute is deprecated. Please access 'scaling_factor' over 'AutoencoderKL's config object instead, e.g. 'unet.config.scaling_factor'.
latents = latents*self.vae.scaling_factor
F:\python\lib\site-packages\torch\utils\checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
Traceback (most recent call last):
File "F:\python\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "F:\python\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "F:\python\lib\site-packages\hcpdiff\train_ac_single.py", line 61, in
trainer.train()
File "F:\python\lib\site-packages\hcpdiff\train_ac.py", line 391, in train
loss = self.train_one_step(data_list)
File "F:\python\lib\site-packages\hcpdiff\train_ac.py", line 481, in train_one_step
self.accelerator.clip_grad_norm_(clip_param, self.cfgs.train.max_grad_norm)
File "F:\python\lib\site-packages\accelerate\accelerator.py", line 1916, in clip_grad_norm_
self.unscale_gradients()
File "F:\python\lib\site-packages\accelerate\accelerator.py", line 1879, in unscale_gradients
self.scaler.unscale_(opt)
File "F:\python\lib\site-packages\torch\cuda\amp\grad_scaler.py", line 275, in unscale_
raise RuntimeError("unscale_() has already been called on this optimizer since the last update().")
RuntimeError: unscale_() has already been called on this optimizer since the last update().
Traceback (most recent call last):
File "F:\python\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "F:\python\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "F:\python\Scripts\accelerate.exe__main__.py", line 7, in
File "F:\python\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "F:\python\lib\site-packages\accelerate\commands\launch.py", line 959, in launch_command
simple_launcher(args)
File "F:\python\lib\site-packages\accelerate\commands\launch.py", line 624, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['F:\\python\\python.exe', '-m', 'hcpdiff.train_ac_single', '--cfg', '.\\lora_anime_character.yaml']' returned non-zero exit status 1.
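For context on the traceback: this RuntimeError comes from PyTorch's AMP GradScaler, which allows unscale_() to be called at most once per optimizer between update() calls. Accelerator.clip_grad_norm_ unscales internally, so anything that triggers a second unscale in the same step (for example, clipping twice for parameter groups that share one optimizer) raises exactly this error. A minimal sketch of the ordering PyTorch expects, with illustrative names rather than hcpdiff's actual loop:

```python
# Minimal AMP training-step sketch (illustrative names, not hcpdiff's code).
# unscale_() may run at most once per optimizer between update() calls.
import torch

model = torch.nn.Linear(8, 8).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = model(torch.randn(4, 8, device="cuda")).mean()
    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)   # once, so clipping sees unscaled grads
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    # calling scaler.unscale_(optimizer) again here would raise the
    # "unscale_() has already been called" RuntimeError from the log
    scaler.step(optimizer)
    scaler.update()              # resets the per-optimizer unscale flag
```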
Update: after upgrading the accelerate package (which didn't help on its own), upgrading the hcpdiff package (which initially gave a new error), and running hcpinit again, it now gets past this error and starts training.
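If anyone else lands here: since Accelerator.clip_grad_norm_ already handles the unscale step, the per-step ordering with accelerate should look roughly like the sketch below (hedged, illustrative names; presumably what newer hcpdiff versions do, which would explain why upgrading helped):

```python
# Hedged sketch of a single accelerate training step (illustrative, not
# hcpdiff's internals). clip_grad_norm_ unscales internally, so it must
# be called only once per optimizer step.
import torch
from accelerate import Accelerator

accelerator = Accelerator(mixed_precision="fp16")
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model, optimizer = accelerator.prepare(model, optimizer)

for _ in range(10):
    optimizer.zero_grad(set_to_none=True)
    loss = model(torch.randn(4, 8, device=accelerator.device)).mean()
    accelerator.backward(loss)
    accelerator.clip_grad_norm_(model.parameters(), max_norm=1.0)  # once only
    optimizer.step()  # the wrapped optimizer calls scaler.step()/update()
```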