camenduru / kohya_ss-colab

The Unlicense
74 stars 12 forks

I don't know what it is, but it's been there for the last three months #7

Open Skystapper opened 10 months ago

Skystapper commented 10 months ago

```
running training / 学習開始
  num train images * repeats / 学習画像の数×繰り返し回数: 170
  num reg images / 正則化画像の数: 104
  num batches per epoch / 1epochのバッチ数: 340
  num epochs / epoch数: 10
  batch size per device / バッチサイズ: 1
  gradient accumulation steps / 勾配を合計するステップ数 = 1
  total optimization steps / 学習ステップ数: 3400
steps:   0% 0/3400 [00:00<?, ?it/s]
epoch 1/10
/usr/local/lib/python3.10/dist-packages/xformers/ops/fmha/flash.py:339: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  and inp.query.storage().data_ptr() == inp.key.storage().data_ptr()
steps:   0% 1/3400 [00:02<2:15:17, 2.39s/it, loss=0.179]
Token indices sequence length is longer than the specified maximum sequence length for this model (88 > 77). Running this sequence through the model will result in indexing errors
Traceback (most recent call last):
  File "/content/kohya_ss/./train_network.py", line 974, in <module>
    trainer.train(args)
  File "/content/kohya_ss/./train_network.py", line 787, in train
    optimizer.step()
  File "/usr/local/lib/python3.10/dist-packages/accelerate/optimizer.py", line 133, in step
    self.scaler.step(self.optimizer, closure)
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/amp/grad_scaler.py", line 374, in step
    retval = self._maybe_opt_step(optimizer, optimizer_state, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/amp/grad_scaler.py", line 290, in _maybe_opt_step
    retval = optimizer.step(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/optim/lr_scheduler.py", line 69, in wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/optim/optimizer.py", line 280, in wrapper
    out = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/dadaptation/experimental/dadapt_adam_preprint.py", line 142, in step
    raise RuntimeError(f"Setting different lr values in different parameter groups is only supported for values of 0")
RuntimeError: Setting different lr values in different parameter groups is only supported for values of 0
steps:   0% 1/3400 [00:03<2:57:24, 3.13s/it, loss=0.179]
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 979, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', './train_network.py', '--v_parameterization', '--enable_bucket', '--weighted_captions', '--pretrained_model_name_or_path=/content/drive/MyDrive/training/MechaMusumex.safetensors', '--train_data_dir=/content/drive/MyDrive/training/new_isabel/img', '--reg_data_dir=/content/drive/MyDrive/training/new_isabel/reg', '--resolution=1024,1024', '--output_dir=/content/drive/MyDrive/training/new_isabel/model/', '--logging_dir=/content/drive/MyDrive/training/new_isabel/log', '--network_alpha=1', '--save_model_as=safetensors', '--network_module=networks.lora', '--network_args', 'module_dropout=0.69', '--text_encoder_lr=5e-05', '--unet_lr=0.0001', '--network_dim=8', '--output_name=isabell(nikke)', '--lr_scheduler_num_cycles=10', '--no_half_vae', '--learning_rate=0.0001', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=340', '--train_batch_size=1', '--max_train_steps=3400', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=DAdaptation', '--max_data_loader_n_workers=0', '--max_token_length=225', '--clip_skip=2', '--keep_tokens=2', '--bucket_reso_steps=64', '--xformers', '--scale_v_pred_loss_like_noise_pred', '--noise_offset=0.07']' returned non-zero exit status 1.
```
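The first traceback points at the cause: the run combines `--optimizer_type=DAdaptation` with two different per-group learning rates (`--text_encoder_lr=5e-05` and `--unet_lr=0.0001`). D-Adaptation estimates the step size itself, and the `dadaptation` code shown in the traceback rejects parameter groups whose `lr` values differ, unless a group's `lr` is 0 (which freezes it). Below is a minimal reproduction sketch, not from the issue, assuming the package's mainline `DAdaptAdam` applies the same check as the experimental variant in the traceback:

```python
# Minimal sketch (assumption: dadaptation's mainline DAdaptAdam performs the
# same lr check as the experimental dadapt_adam_preprint in the traceback).
import torch
from dadaptation import DAdaptAdam

unet_param = torch.nn.Parameter(torch.zeros(4))
text_encoder_param = torch.nn.Parameter(torch.zeros(4))

# Mirrors --unet_lr=0.0001 vs --text_encoder_lr=5e-05 from the command above:
# two parameter groups with different non-zero lr values.
opt = DAdaptAdam([
    {"params": [unet_param], "lr": 1e-4},
    {"params": [text_encoder_param], "lr": 5e-5},
])

unet_param.grad = torch.ones(4)
text_encoder_param.grad = torch.ones(4)

opt.step()  # raises RuntimeError: Setting different lr values in different
            # parameter groups is only supported for values of 0
```

If that is what is happening here, the usual workaround is to give both groups the same learning rate whenever D-Adaptation is selected, e.g. set `--text_encoder_lr` and `--unet_lr` to the same value (the D-Adaptation authors suggest 1.0, since the optimizer scales it internally), or set one of them to 0 to freeze that part of the network.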

I don't know what the problem is with this; it used to work perfectly before, but now it just doesn't.