Lycoris extreme slow training after update to latest version

killerciao commented 5 months ago

After updating to the latest release, LyCORIS/LoCon training in kohya_ss (LoCon is the only algorithm I tested) is extremely slow. On my 4090, with the same settings loaded, the previous version ran at about 1 it/s for SDXL training; it now runs at 4-10 s/it. Nothing has changed in the training data. Uninstalling 2.2post3 and installing 2.1.0post2 fixes the problem.

killerciao commented 5 months ago

V2.2post3: [Screenshot 2024-03-29 113124]

V2.1.0post2: [Screenshot 2024-03-29 113057]

KohakuBlueleaf commented 5 months ago

Please provide your full configuration.

killerciao commented 5 months ago

These are the settings used in both versions:

{ "LoRA_type": "LyCORIS/LoCon", "LyCORIS_preset": "full", "adaptive_noise_scale": 0, "additional_parameters": "--max_grad_norm=0", "block_alphas": "", "block_dims": "", "block_lr_zero_threshold": "", "bucket_no_upscale": true, "bucket_reso_steps": 64, "bypass_mode": false, "cache_latents": true, "cache_latents_to_disk": true, "caption_dropout_every_n_epochs": 0.0, "caption_dropout_rate": 0, "caption_extension": ".txt", "clip_skip": "1", "color_aug": false, "constrain": 0.0, "conv_alpha": 16, "conv_block_alphas": "", "conv_block_dims": "", "conv_dim": 32, "dataset_config": "", "debiased_estimation_loss": false, "decompose_both": false, "dim_from_weights": false, "dora_wd": false, "down_lr_weight": "", "enable_bucket": true, "epoch": 6, "factor": -1, "flip_aug": false, "fp8_base": false, "full_bf16": false, "full_fp16": false, "gpu_ids": "", "gradient_accumulation_steps": 1, "gradient_checkpointing": false, "keep_tokens": "0", "learning_rate": 1.0, "log_tracker_config": "", "log_tracker_name": "", "logging_dir": "H:/t00nstyle/log", "lora_network_weights": "", "lr_scheduler": "cosine", "lr_scheduler_args": "", "lr_scheduler_num_cycles": "", "lr_scheduler_power": "", "lr_warmup": 0, "max_bucket_reso": 2048, "max_data_loader_n_workers": "0", "max_grad_norm": 1, "max_resolution": "1024,1024", "max_timestep": 1000, "max_token_length": "225", "max_train_epochs": "", "max_train_steps": "", "mem_eff_attn": false, "mid_lr_weight": "", "min_bucket_reso": 256, "min_snr_gamma": 0, "min_timestep": 0, "mixed_precision": "bf16", "model_list": "custom", "module_dropout": 0, "multi_gpu": false, "multires_noise_discount": 0.3, "multires_noise_iterations": 6, "network_alpha": 16, "network_dim": 32, "network_dropout": 0, "noise_offset": 0.0357, "noise_offset_type": "Multires", "num_cpu_threads_per_process": 2, "num_machines": 1, "num_processes": 1, "optimizer": "Prodigy", "optimizer_args": "decouple=True weight_decay=0.5 betas=0.9,0.99 use_bias_correction=False", "output_dir": 
"H:/t00nstyle/model", "output_name": "t00nstylev1PonySDXL", "persistent_data_loader_workers": false, "pretrained_model_name_or_path": "F:/IA FILES/Models/Stablediffusion/ponyDiffusionV6XL_v6StartWithThisOne.safetensors", "prior_loss_weight": 1.0, "random_crop": false, "rank_dropout": 0, "rank_dropout_scale": false, "reg_data_dir": "", "rescaled": false, "resume": "", "sample_every_n_epochs": 0, "sample_every_n_steps": 250, "sample_prompts": "score_9, score_8_up, score_7_up,flashing tits, nipples, looking at viewer, tongue out, wink, in pool, bikini, t00nstyle --n low quality, worst quality, bad anatomy,bad composition, poor, low effort --w 1024 --h 1024 --d 1 --l 7 --s 28", "sample_sampler": "euler_a", "save_every_n_epochs": 1, "save_every_n_steps": 0, "save_last_n_steps": 0, "save_last_n_steps_state": 0, "save_model_as": "safetensors", "save_precision": "bf16", "save_state": false, "scale_v_pred_loss_like_noise_pred": false, "scale_weight_norms": 1, "sdxl": true, "sdxl_cache_text_encoder_outputs": false, "sdxl_no_half_vae": true, "seed": "12345", "shuffle_caption": true, "stop_text_encoder_training_pct": 0, "text_encoder_lr": 1.0, "train_batch_size": 1, "train_data_dir": "H:/t00nstyle/img", "train_norm": false, "train_on_input": false, "training_comment": "", "unet_lr": 1.0, "unit": 1, "up_lr_weight": "", "use_cp": true, "use_scalar": false, "use_tucker": false, "use_wandb": false, "v2": false, "v_parameterization": false, "v_pred_like_loss": 0, "vae": "", "vae_batch_size": 0, "wandb_api_key": "", "wandb_run_name": "", "weighted_captions": false, "xformers": "xformers" }

KohakuBlueleaf commented 5 months ago

What's your hardware?

killerciao commented 5 months ago

Win 11, Nvidia 4090, 64gb ddr5 ram, i9 13900k, running from nvme 4th gen

KohakuBlueleaf commented 5 months ago

I ran some tests and cannot reproduce this problem, but I'm using plain kohya-ss/sd-scripts. I will try testing with the GUI.

I'm using almost the same hardware as yours (128GB RAM instead of 64GB), so hardware should not be the difference.

whythisusername commented 5 months ago

@KohakuBlueleaf Since someone already reported this issue, I can confirm that something changed specifically after the 2.2.0.dev7 update (I tested the versions before and after it) that drops the speed from ~1.5 it/s to ~1.5 s/it when training on a 4090 using this config (it's for easy-scripts).

KohakuBlueleaf commented 5 months ago

@whythisusername Can you try dropout=0 (for every kind of dropout)? Maybe something is going wrong with dropout.

whythisusername commented 5 months ago

@KohakuBlueleaf Yeah, here are some speed measurements for the two versions. Dropout significantly affects the speed on the latest version, and even without dropout it is still a little slower than the old version.

2.2.0.dev7 with dropout: 1321/2500 [17:22<15:30, 1.27it/s, avr_loss=0.107]
2.2.0.dev7 no dropout: 587/2500 [07:32<24:34, 1.30it/s, avr_loss=0.105]
2.3.0.dev6 with dropout: 533/2500 [10:59<40:33, 1.24s/it, avr_loss=0.115]
2.3.0.dev6 no dropout: 721/2500 [10:31<25:57, 1.14it/s, avr_loss=0.0954]
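
Note that the progress bar switches units between it/s and s/it, so the lines above are easier to compare once they are converted to a single unit. A minimal sketch of that conversion, using only the four lines posted above (the helper name `iterations_per_second` is made up for this example and is not part of any library):

```python
import re

# Matches the rate field of a progress line, e.g. "1.27it/s" or "1.24s/it".
RATE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(it/s|s/it)")


def iterations_per_second(progress_line: str) -> float:
    """Return the rate in it/s, converting from s/it when necessary."""
    match = RATE_RE.search(progress_line)
    if match is None:
        raise ValueError(f"no rate found in: {progress_line!r}")
    value, unit = float(match.group(1)), match.group(2)
    # "s/it" is the reciprocal of "it/s".
    return value if unit == "it/s" else 1.0 / value


# The four measurements quoted in this comment.
runs = {
    "2.2.0.dev7 with dropout": "1321/2500 [17:22<15:30, 1.27it/s, avr_loss=0.107]",
    "2.2.0.dev7 no dropout": "587/2500 [07:32<24:34, 1.30it/s, avr_loss=0.105]",
    "2.3.0.dev6 with dropout": "533/2500 [10:59<40:33, 1.24s/it, avr_loss=0.115]",
    "2.3.0.dev6 no dropout": "721/2500 [10:31<25:57, 1.14it/s, avr_loss=0.0954]",
}

baseline = iterations_per_second(runs["2.2.0.dev7 with dropout"])
for name, line in runs.items():
    rate = iterations_per_second(line)
    print(f"{name}: {rate:.2f} it/s ({rate / baseline:.0%} of baseline)")
```

Read this way, 1.24 s/it is roughly 0.81 it/s, which makes the gap between the dropout and no-dropout runs on 2.3.0.dev6 easier to see.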

KohakuBlueleaf commented 4 months ago

@whythisusername Is 2.3.0.dev10 still this slow?

whythisusername commented 4 months ago

@KohakuBlueleaf Yes, it hasn't changed significantly.

2.3.0.dev10 with dropout: 1021/2500 [20:14<29:18, 1.19s/it, avr_loss=0.102]
2.3.0.dev10 no dropout: 1174/2500 [17:35<19:52, 1.11it/s, avr_loss=0.0961]

KohakuBlueleaf commented 3 months ago

@whythisusername @killerciao Can you try 3.0.0.dev4? I completely restructured the library and removed some redundant operations. I wonder if it is better now.

killerciao commented 3 months ago

Tested with latest kohya_ss gui:

4s/it 3.0.0dev4
1.25s/it 2.1.0post2

🤷‍♂️

KohakuBlueleaf commented 3 months ago

umm ok

I think it is due to some dtype things

KohakuBlueleaf commented 3 months ago

Can you try another algorithm? In my environment, LoKr runs at the same speed across versions.

killerciao commented 3 months ago

LoKr:

1.3s/it 3.0.0dev4
Crash 2.1.0post2

Traceback (most recent call last):
  File "H:\kohya_ss\sd-scripts\sdxl_train_network.py", line 185, in <module>
    trainer.train(args)
  File "H:\kohya_ss\sd-scripts\train_network.py", line 864, in train
    noise_pred = self.call_unet(
  File "H:\kohya_ss\sd-scripts\sdxl_train_network.py", line 164, in call_unet
    noise_pred = unet(noisy_latents, timesteps, text_embedding, vector_embedding)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "H:\kohya_ss\venv\lib\site-packages\accelerate\utils\operations.py", line 680, in forward
    return model_forward(*args, **kwargs)
  File "H:\kohya_ss\venv\lib\site-packages\accelerate\utils\operations.py", line 668, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "H:\kohya_ss\venv\lib\site-packages\torch\amp\autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "H:\kohya_ss\sd-scripts\library\sdxl_original_unet.py", line 1104, in forward
    h = call_module(module, h, emb, context)
  File "H:\kohya_ss\sd-scripts\library\sdxl_original_unet.py", line 1095, in call_module
    x = layer(x, context)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "H:\kohya_ss\sd-scripts\library\sdxl_original_unet.py", line 750, in forward
    hidden_states = block(hidden_states, context=encoder_hidden_states, timestep=timestep)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "H:\kohya_ss\sd-scripts\library\sdxl_original_unet.py", line 673, in forward
    output = self.forward_body(hidden_states, context, timestep)
  File "H:\kohya_ss\sd-scripts\library\sdxl_original_unet.py", line 655, in forward_body
    hidden_states = self.ff(self.norm3(hidden_states)) + hidden_states
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "H:\kohya_ss\sd-scripts\library\sdxl_original_unet.py", line 599, in forward
    hidden_states = module(hidden_states)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "H:\kohya_ss\sd-scripts\library\sdxl_original_unet.py", line 577, in forward
    hidden_states, gate = self.proj(hidden_states).chunk(2, dim=-1)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "H:\kohya_ss\venv\lib\site-packages\lycoris\modules\lokr.py", line 342, in forward
    self.org_module[0].weight.data.to(x.device, dtype=self.lokr_w1.dtype)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'LokrModule' object has no attribute 'lokr_w1'. Did you mean: 'lokr_w1_a'?
steps: 0%| | 0/3588 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "H:\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in <module>
  File "H:\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "H:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "H:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['H:\\kohya_ss\\venv\\Scripts\\python.exe', 'H:/kohya_ss/sd-scripts/sdxl_train_network.py', '--config_file', 'H:/alariko/model/config_lora-20240525-123304.toml', '--max_grad_norm=0']' returned non-zero exit status 1.

KohakuBlueleaf commented 3 months ago

Thanks. It looks like LoCon has some bugs; LoKr and LoHa may be fine.

whythisusername commented 3 months ago

@KohakuBlueleaf It's much better now with my config, but still a little slower than it was with 2.2.0.dev7.

3.0.0.dev4 with dropout: 1224/2500 [17:51<18:36, 1.14it/s, avr_loss=0.0945]
3.0.0.dev4 no dropout: 1246/2500 [17:51<17:58, 1.16it/s, avr_loss=0.0945]
2.2.0.dev7 with dropout: 1964/2500 [25:01<06:49, 1.31it/s, avr_loss=0.0993]

KohakuBlueleaf commented 3 months ago

Can you try enabling "bypass_mode"? Pass --network_args "bypass_mode=True"

whythisusername commented 3 months ago

@KohakuBlueleaf Almost on par with old performance now

1983/2500 [27:16<07:06, 1.21it/s, avr_loss=0.124]

KohakuBlueleaf commented 3 months ago

OK, I think the problem is solved. The reconstruction mode for LoCon is not fast in bp (backpropagation). Just enable bypass_mode if the speed is slower than you expect. Note that LoHa and LoKr will be slower with bypass mode than with the default.
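
For readers unfamiliar with the two modes, here is a conceptual sketch of the difference for a single linear layer. It is not LyCORIS's actual code: the function names, the tiny pure-Python matrix helpers, and the toy values are all assumptions made for illustration. The assumption is that reconstruction mode first rebuilds a full-size weight `W + s * (up @ down)` and applies it once, while bypass mode keeps the original layer's output and adds the low-rank path separately.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(m, s):
    return [[s * v for v in row] for row in m]


def forward_rebuild(x, w, down, up, s):
    # Reconstruction mode (as assumed here): build the full-size weight
    # W + s * (up @ down) first, then apply it in one matmul.
    merged = add(w, scale(matmul(up, down), s))
    return matmul(x, transpose(merged))


def forward_bypass(x, w, down, up, s):
    # Bypass mode (as assumed here): keep the original layer's output and
    # add the low-rank path x -> down -> up on the side.
    base = matmul(x, transpose(w))
    low_rank = matmul(matmul(x, transpose(down)), transpose(up))
    return add(base, scale(low_rank, s))


# Toy shapes: batch=1, in_features=3, out_features=2, rank=1.
x = [[1.0, 2.0, 3.0]]
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
down = [[1.0, 1.0, 1.0]]
up = [[0.5], [-0.5]]
s = 2.0

out_rebuild = forward_rebuild(x, w, down, up, s)
out_bypass = forward_bypass(x, w, down, up, s)
print(out_rebuild, out_bypass)
```

Both functions compute the same output; they differ only in which intermediate tensors are created. In the rebuild path a tensor the size of the full weight is materialized on every step, whereas the bypass path only creates rank-sized intermediates, which is one plausible reason the two can differ in speed.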

KohakuBlueleaf / LyCORIS

Lycoris extreme slow training after update to latest version #167