IrisRainbowNeko / HCP-Diffusion

A universal Stable-Diffusion toolbox
Apache License 2.0

RuntimeError: unscale_() has already been called on this optimizer since the last update(). #47

Closed spillerrec closed 10 months ago

spillerrec commented 10 months ago

I'm trying to train a LoRA with multiple embeddings, but I keep getting this error. I have tried changing a bunch of things in the YAML configuration to see if I could get any further, but I haven't been able to get past it. Any ideas on what is going wrong? Contents of the hcp-test folder: settings.zip

```
PS R:\lora-test\hcp-test> accelerate launch -m hcpdiff.train_ac_single --cfg .\lora_anime_character.yaml
bin F:\python\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll
The following values were not passed to accelerate launch and had defaults used instead:
    --num_processes was set to a value of 1
    --num_machines was set to a value of 1
    --mixed_precision was set to a value of 'no'
    --dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
bin F:\python\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll
F:\python\lib\runpy.py:126: RuntimeWarning: 'hcpdiff.train_ac_single' found in sys.modules after import of package 'hcpdiff', but prior to execution of 'hcpdiff.train_ac_single'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
2023-11-13 01:24:24.563 | INFO | hcpdiff.loggers.cli_logger:_info:30 - world_size: 1
2023-11-13 01:24:24.563 | INFO | hcpdiff.loggers.cli_logger:_info:30 - accumulation: 1
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
2023-11-13 01:24:28.321 | INFO | hcpdiff.models.text_emb_ex:hook:86 - hook: black_san_magnolia, len: 4, id: 49408
2023-11-13 01:24:28.322 | INFO | hcpdiff.models.text_emb_ex:hook:86 - hook: bloody_reina, len: 2, id: 49409
2023-11-13 01:24:28.322 | INFO | hcpdiff.models.text_emb_ex:hook:86 - hook: blue_san_magnolia, len: 4, id: 49410
2023-11-13 01:24:28.322 | INFO | hcpdiff.models.text_emb_ex:hook:86 - hook: formal_giad, len: 4, id: 49411
2023-11-13 01:24:28.322 | INFO | hcpdiff.models.text_emb_ex:hook:86 - hook: neck_scar, len: 2, id: 49412
2023-11-13 01:24:28.323 | INFO | hcpdiff.models.text_emb_ex:hook:86 - hook: personal_room, len: 4, id: 49413
2023-11-13 01:24:28.323 | INFO | hcpdiff.models.text_emb_ex:hook:86 - hook: shinei_nouzen, len: 3, id: 49414
2023-11-13 01:24:28.323 | INFO | hcpdiff.models.text_emb_ex:hook:86 - hook: undertaker, len: 4, id: 49415
2023-11-13 01:24:28.323 | INFO | hcpdiff.models.text_emb_ex:hook:86 - hook: vladilena_millize, len: 3, id: 49416
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
2023-11-13 01:24:29.182 | INFO | hcpdiff.data.caption_loader:load:18 - 144 record(s) loaded with TXTCaptionLoader, from path 'L:/waifu_diffusion/anime-tagger/out/86/512x512'
2023-11-13 01:24:29.183 | INFO | hcpdiff.data.bucket:build_buckets_from_images:241 - build buckets from images size
F:\python\lib\site-packages\sklearn\cluster\_kmeans.py:1412: FutureWarning: The default value of n_init will change from 10 to 'auto' in 1.4. Set the value of n_init explicitly to suppress the warning
  super()._check_params_vs_input(X, default_n_init=10)
2023-11-13 01:24:29.632 | INFO | hcpdiff.data.bucket:build_buckets_from_images:262 - buckets info: size:[512 512], num:144
2023-11-13 01:24:29.666 | INFO | hcpdiff.loggers.cli_logger:_info:30 - len(train_dataset): 144
F:\python\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_stable_diffusion.py:128: FutureWarning: The configuration file of this scheduler: PNDMScheduler {
  "_class_name": "PNDMScheduler",
  "_diffusers_version": "0.19.3",
  "beta_end": 0.012,
  "beta_schedule": "scaled_linear",
  "beta_start": 0.00085,
  "num_train_timesteps": 1000,
  "prediction_type": "epsilon",
  "set_alpha_to_one": false,
  "skip_prk_steps": false,
  "steps_offset": 0,
  "timestep_spacing": "leading",
  "trained_betas": null
}
is outdated. steps_offset should be set to 1 instead of 0. Please make sure to update the config accordingly as leaving steps_offset might led to incorrect results in future versions. If you have downloaded this checkpoint from the Hugging Face Hub, it would be very nice if you could open a Pull request for the scheduler/scheduler_config.json file
  deprecate("steps_offset!=1", "1.0.0", deprecation_message, standard_warn=False)
2023-11-13 01:24:30.572 | INFO | hcpdiff.loggers.cli_logger:_info:30 - Running training
2023-11-13 01:24:30.572 | INFO | hcpdiff.loggers.cli_logger:_info:30 - Num batches each epoch = 144
2023-11-13 01:24:30.573 | INFO | hcpdiff.loggers.cli_logger:_info:30 - Num Steps = 1000
2023-11-13 01:24:30.573 | INFO | hcpdiff.loggers.cli_logger:_info:30 - Instantaneous batch size per device = 1
2023-11-13 01:24:30.573 | INFO | hcpdiff.loggers.cli_logger:_info:30 - Total train batch size (w. parallel, distributed & accumulation) = 1
2023-11-13 01:24:30.573 | INFO | hcpdiff.loggers.cli_logger:_info:30 - Gradient Accumulation steps = 1
bin F:\python\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll
F:\python\lib\runpy.py:126: RuntimeWarning: 'hcpdiff.train_ac_single' found in sys.modules after import of package 'hcpdiff', but prior to execution of 'hcpdiff.train_ac_single'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
bin F:\python\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll
F:\python\lib\runpy.py:126: RuntimeWarning: 'hcpdiff.train_ac_single' found in sys.modules after import of package 'hcpdiff', but prior to execution of 'hcpdiff.train_ac_single'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
bin F:\python\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll
F:\python\lib\runpy.py:126: RuntimeWarning: 'hcpdiff.train_ac_single' found in sys.modules after import of package 'hcpdiff', but prior to execution of 'hcpdiff.train_ac_single'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
bin F:\python\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll
F:\python\lib\runpy.py:126: RuntimeWarning: 'hcpdiff.train_ac_single' found in sys.modules after import of package 'hcpdiff', but prior to execution of 'hcpdiff.train_ac_single'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
F:\python\lib\site-packages\hcpdiff\train_ac.py:425: FutureWarning: Accessing config attribute scaling_factor directly via 'AutoencoderKL' object attribute is deprecated. Please access 'scaling_factor' over 'AutoencoderKL's config object instead, e.g. 'unet.config.scaling_factor'.
  latents = latents*self.vae.scaling_factor
F:\python\lib\site-packages\torch\utils\checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
Traceback (most recent call last):
  File "F:\python\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "F:\python\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "F:\python\lib\site-packages\hcpdiff\train_ac_single.py", line 61, in <module>
    trainer.train()
  File "F:\python\lib\site-packages\hcpdiff\train_ac.py", line 391, in train
    loss = self.train_one_step(data_list)
  File "F:\python\lib\site-packages\hcpdiff\train_ac.py", line 481, in train_one_step
    self.accelerator.clip_grad_norm_(clip_param, self.cfgs.train.max_grad_norm)
  File "F:\python\lib\site-packages\accelerate\accelerator.py", line 1916, in clip_grad_norm_
    self.unscale_gradients()
  File "F:\python\lib\site-packages\accelerate\accelerator.py", line 1879, in unscale_gradients
    self.scaler.unscale_(opt)
  File "F:\python\lib\site-packages\torch\cuda\amp\grad_scaler.py", line 275, in unscale_
    raise RuntimeError("unscale_() has already been called on this optimizer since the last update().")
RuntimeError: unscale_() has already been called on this optimizer since the last update().
Traceback (most recent call last):
  File "F:\python\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "F:\python\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "F:\python\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "F:\python\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "F:\python\lib\site-packages\accelerate\commands\launch.py", line 959, in launch_command
    simple_launcher(args)
  File "F:\python\lib\site-packages\accelerate\commands\launch.py", line 624, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['F:\python\python.exe', '-m', 'hcpdiff.train_ac_single', '--cfg', '.\lora_anime_character.yaml']' returned non-zero exit status 1.
```
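For context on the traceback: `Accelerator.clip_grad_norm_` calls `scaler.unscale_()` internally, and PyTorch's `GradScaler` refuses a second `unscale_()` on the same optimizer before `scaler.update()` has run, so the error suggests the gradients are being unscaled twice in one step. A minimal pure-Python sketch of that guard (a hypothetical simplification for illustration, not torch's actual implementation):

```python
# Sketch of the per-optimizer state check that raises this RuntimeError.
class TinyScaler:
    """Tracks which optimizers had their grads unscaled this step."""
    def __init__(self):
        self._unscaled = set()

    def unscale_(self, optimizer):
        if id(optimizer) in self._unscaled:
            raise RuntimeError(
                "unscale_() has already been called on this optimizer "
                "since the last update()."
            )
        # (real GradScaler divides the grads by the loss scale here)
        self._unscaled.add(id(optimizer))

    def update(self):
        # End of step: the next step may unscale again.
        self._unscaled.clear()


opt = object()  # stand-in for a torch optimizer
scaler = TinyScaler()
scaler.unscale_(opt)        # fine: first unscale this step
try:
    scaler.unscale_(opt)    # second call before update() raises
except RuntimeError as e:
    print("raised:", e)
scaler.update()
scaler.unscale_(opt)        # fine again after update()
```

In other words, anything that makes `clip_grad_norm_` (or a manual `unscale_`) run twice between two `update()` calls will trip this guard.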

spillerrec commented 10 months ago

After upgrading the accelerate package (which didn't work), upgrading the hcpdiff package (which gave a new error), and running hcpinit again, it is now doing something.
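For anyone hitting the same wall, the steps above amount to something like the following (exact versions aren't given in the thread, and `hcpinit` is HCP-Diffusion's config-initialization command):

```shell
# Upgrade both packages, then re-initialize the configs
pip install --upgrade accelerate
pip install --upgrade hcpdiff
hcpinit
```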