running training / 学習開始
num train images repeats / 学習画像の数×繰り返し回数: 170
num reg images / 正則化画像の数: 104
num batches per epoch / 1epochのバッチ数: 340
num epochs / epoch数: 10
batch size per device / バッチサイズ: 1
gradient accumulation steps / 勾配を合計するステップ数 = 1
total optimization steps / 学習ステップ数: 3400
steps: 0% 0/3400 [00:00<?, ?it/s]
epoch 1/10
/usr/local/lib/python3.10/dist-packages/xformers/ops/fmha/flash.py:339: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
and inp.query.storage().data_ptr() == inp.key.storage().data_ptr()
steps: 0% 1/3400 [00:02<2:15:17, 2.39s/it, loss=0.179]Token indices sequence length is longer than the specified maximum sequence length for this model (88 > 77). Running this sequence through the model will result in indexing errors
Traceback (most recent call last):
File "/content/kohya_ss/./train_network.py", line 974, in <module>
trainer.train(args)
File "/content/kohya_ss/./train_network.py", line 787, in train
optimizer.step()
File "/usr/local/lib/python3.10/dist-packages/accelerate/optimizer.py", line 133, in step
self.scaler.step(self.optimizer, closure)
File "/usr/local/lib/python3.10/dist-packages/torch/cuda/amp/grad_scaler.py", line 374, in step
retval = self._maybe_opt_step(optimizer, optimizer_state, *args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/cuda/amp/grad_scaler.py", line 290, in _maybe_opt_step
retval = optimizer.step(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/optim/lr_scheduler.py", line 69, in wrapper
return wrapped(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/optim/optimizer.py", line 280, in wrapper
out = func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/dadaptation/experimental/dadapt_adam_preprint.py", line 142, in step
raise RuntimeError(f"Setting different lr values in different parameter groups is only supported for values of 0")
RuntimeError: Setting different lr values in different parameter groups is only supported for values of 0
steps: 0% 1/3400 [00:03<2:57:24, 3.13s/it, loss=0.179]
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 979, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 628, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', './train_network.py', '--v_parameterization', '--enable_bucket', '--weighted_captions', '--pretrained_model_name_or_path=/content/drive/MyDrive/training/MechaMusumex.safetensors', '--train_data_dir=/content/drive/MyDrive/training/new_isabel/img', '--reg_data_dir=/content/drive/MyDrive/training/new_isabel/reg', '--resolution=1024,1024', '--output_dir=/content/drive/MyDrive/training/new_isabel/model/', '--logging_dir=/content/drive/MyDrive/training/new_isabel/log', '--network_alpha=1', '--save_model_as=safetensors', '--network_module=networks.lora', '--network_args', 'module_dropout=0.69', '--text_encoder_lr=5e-05', '--unet_lr=0.0001', '--network_dim=8', '--output_name=isabell(nikke)', '--lr_scheduler_num_cycles=10', '--no_half_vae', '--learning_rate=0.0001', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=340', '--train_batch_size=1', '--max_train_steps=3400', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=DAdaptation', '--max_data_loader_n_workers=0', '--max_token_length=225', '--clip_skip=2', '--keep_tokens=2', '--bucket_reso_steps=64', '--xformers', '--scale_v_pred_loss_like_noise_pred', '--noise_offset=0.07']' returned non-zero exit status 1.
I don't know what the problem is. This exact setup used to train without errors, but now it fails on the first step with the RuntimeError above about different lr values in different parameter groups.
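The RuntimeError originates in dadaptation's dadapt_adam_preprint.py, which appears to reject optimizers whose parameter groups carry different non-zero learning rates. The failing command passes both --text_encoder_lr=5e-05 and --unet_lr=0.0001, i.e. two different non-zero per-group lrs. Below is a minimal sketch of that constraint; check_dadaptation_lrs is a hypothetical helper written for illustration, not the actual dadaptation source.

```python
def check_dadaptation_lrs(param_groups):
    """Sketch of the constraint the traceback points at (hypothetical
    re-implementation): every parameter group must use the same learning
    rate, except groups whose lr is exactly 0."""
    nonzero_lrs = {group["lr"] for group in param_groups if group["lr"] != 0}
    if len(nonzero_lrs) > 1:
        raise RuntimeError(
            "Setting different lr values in different parameter groups "
            "is only supported for values of 0"
        )

# The failing run effectively builds two groups from the flags above:
groups = [
    {"name": "text_encoder", "lr": 5e-05},  # --text_encoder_lr=5e-05
    {"name": "unet", "lr": 0.0001},         # --unet_lr=0.0001
]
# check_dadaptation_lrs(groups)  # would raise the RuntimeError seen in the log
```

Under this reading, making the text encoder and U-Net learning rates equal (D-Adaptation is designed to estimate the step size itself, so values around 1.0 are commonly recommended for it) or setting one of them to 0 should satisfy the check.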