I managed to use my local Google Drive as the directory for the training dataset, but I probably missed something and can't figure it out.
Reading package lists...
Building dependency tree...
Reading state information...
aria2 is already the newest version (1.35.0-1build1).
liblz4-tool is already the newest version (1.9.2-2ubuntu0.20.04.1).
0 upgraded, 0 newly installed, 0 to remove and 22 not upgraded.
Preparing metadata (setup.py) ... done
Building wheel for library (setup.py) ... done
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 134.0/134.0 MB 8.5 MB/s eta 0:00:00
Download Progress Summary as of Sun Mar 5 14:50:08 2023
[#e5c326 2.5GiB/3.9GiB(63%) CN:16 DL:243MiB ETA:6s]
FILE: /content/pretrained_model/anything-v3-fp32-pruned.safetensors

Download Results:
gid   |stat|avg speed  |path/URI
======+====+===========+=======================================================
e5c326|OK  |   237MiB/s|/content/pretrained_model/anything-v3-fp32-pruned.safetensors
dc7d07|OK  |       0B/s|/content/vae/anime.vae.pt

Status Legend:
(OK):download completed.
Skipping directory 10_vestia_zeta
2023-03-05 14:50:15.529319: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-03-05 14:50:16.593459: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2023-03-05 14:50:16.593633: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2023-03-05 14:50:16.593659: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
using existing wd14 tagger model
found 0 images.
loading model and labels
2023-03-05 14:50:23.731966: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2023-03-05 14:50:47.211466: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 37203968 exceeds 10% of free system memory.
WARNING:tensorflow:No training configuration found in save file, so the model was not compiled. Compile it manually.
0it [00:00, ?it/s]
done!
2023-03-05 14:50:58.485195: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-03-05 14:50:59.246223: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-05 14:50:59.246355: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-05 14:50:59.246374: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
load images from /content/drive/MyDrive/LoRA/zeta
found 0 images.
loading BLIP caption: https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_large_caption.pth
load checkpoint from https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_large_caption.pth
BLIP loaded
0it [00:00, ?it/s]
done!
--2023-03-05 14:51:31-- https://raw.githubusercontent.com/Stability-AI/stablediffusion/main/configs/stable-diffusion/v2-inference-v.yaml
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 416 Range Not Satisfiable
The file is already fully retrieved; nothing to do.
File successfully downloaded
+--------------------------+---------------------------------------------------------------+
| Hyperparameter | Value |
+--------------------------+---------------------------------------------------------------+
| mode | LoRA |
| use_dreambooth_method | True |
| lowram | True |
| v2 | True |
| v_parameterization | True |
| project_name | vestia_zeta |
| modelPath | /content/pretrained_model/anything-v3-fp32-pruned.safetensors |
| vaePath | /content/vae/anime.vae.pt |
| train_data_dir | /content/drive/MyDrive/LoRA/zeta |
| reg_data_dir | /content/drive/MyDrive/LoRA/reg_data |
| output_dir | /content/drive/MyDrive/training_dir/output |
| network_dim | 128 |
| network_alpha | 128 |
| network_weights | False |
| unet_lr | 0.0001 |
| text_encoder_lr | 5e-05 |
| optimizer_type | AdamW8bit |
| optimizer_args | False |
| learning_rate | 2e-06 |
| lr_scheduler | constant |
| lr_warmup_steps | 250 |
| lr_scheduler_args | 1 |
| keep_tokens | 1 |
| min_bucket_reso | 256 |
| max_bucket_reso | 1024 |
| resolution | 512 |
| caption_extension | .txt |
| noise_offset | 0 |
| prior_loss_weight | 1.0 |
| mixed_precision | fp16 |
| save_precision | fp16 |
| save_n_epochs_type | save_n_epoch_ratio |
| save_n_epochs_type_value | 3 |
| save_model_as | safetensors |
| train_batch_size | 4 |
| max_train_type | max_train_epochs |
| max_train_type_value | 20 |
| clip_skip | 2 |
| logging_dir | /content/training_dir/logs |
| additional_argument | --shuffle_caption --xformers |
+--------------------------+---------------------------------------------------------------+
2023-03-05 14:51:32.848337: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-03-05 14:51:33.505193: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-05 14:51:33.505334: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-05 14:51:33.505360: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-03-05 14:51:36.439065: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-03-05 14:51:37.065498: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-05 14:51:37.065609: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-05 14:51:37.065630: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
prepare tokenizer
update token length: 225
Use DreamBooth method.
prepare train images.
found directory 10_vestia_zeta contains 152 image files
1520 train images with repeating.
prepare reg images.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
loading image sizes.
100% 152/152 [00:00<00:00, 183.53it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (320, 704), count: 60
bucket 1: resolution (384, 640), count: 430
bucket 2: resolution (448, 576), count: 660
bucket 3: resolution (512, 512), count: 170
bucket 4: resolution (576, 448), count: 80
bucket 5: resolution (640, 384), count: 100
bucket 6: resolution (704, 320), count: 20
mean ar error (without repeats): 0.058514486511307695
prepare accelerator
Using accelerator 0.15.0 or above.
load StableDiffusion checkpoint
Traceback (most recent call last):
File "/content/kohya-trainer/train_network.py", line 528, in
train(args)
File "/content/kohya-trainer/train_network.py", line 97, in train
text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype)
File "/content/kohya-trainer/library/train_util.py", line 1861, in load_target_model
text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(args.v2, name_or_path)
File "/content/kohya-trainer/library/model_util.py", line 880, in load_models_from_stable_diffusion_checkpoint
info = unet.load_state_dict(converted_unet_checkpoint)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UNet2DConditionModel:
size mismatch for down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
size mismatch for down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
size mismatch for down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
size mismatch for down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
size mismatch for down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
size mismatch for down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
size mismatch for down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
size mismatch for down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
size mismatch for down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
size mismatch for down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
size mismatch for down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
size mismatch for down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
size mismatch for up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
size mismatch for up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
size mismatch for up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
size mismatch for up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
size mismatch for up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
size mismatch for up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
size mismatch for up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
size mismatch for up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
size mismatch for up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
size mismatch for up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
size mismatch for up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
size mismatch for up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
size mismatch for mid_block.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
size mismatch for mid_block.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in
sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 1104, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/kohya-trainer/train_network.py', '--v2', '--v_parameterization', '--output_name=vestia_zeta', '--pretrained_model_name_or_path=/content/pretrained_model/anything-v3-fp32-pruned.safetensors', '--vae=/content/vae/anime.vae.pt', '--train_data_dir=/content/drive/MyDrive/LoRA/zeta', '--reg_data_dir=/content/drive/MyDrive/LoRA/reg_data', '--output_dir=/content/drive/MyDrive/training_dir/output', '--network_dim=128', '--network_alpha=128', '--network_module=networks.lora', '--unet_lr=0.0001', '--text_encoder_lr=5e-05', '--optimizer_type=AdamW8bit', '--learning_rate=2e-06', '--lr_scheduler=constant', '--lr_warmup_steps=250', '--resolution=512', '--enable_bucket', '--keep_tokens=1', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--caption_extension=.txt', '--cache_latents', '--prior_loss_weight=1.0', '--lowram', '--mixed_precision=fp16', '--save_precision=fp16', '--save_n_epoch_ratio=3', '--save_model_as=safetensors', '--train_batch_size=4', '--max_token_length=225', '--max_train_epochs=20', '--logging_dir=/content/training_dir/logs', '--log_prefix=vestia_zeta', '--shuffle_caption', '--xformers']' returned non-zero exit status 1.
The problem is that v2 and v_parameterization are set to True while the model is Anything v3.2, which is an SD v1.x model. That is exactly what the size-mismatch errors are telling you: the checkpoint's cross-attention key/value weights expect a 768-dim text-encoder context (SD v1.x, CLIP ViT-L), but with --v2 the trainer builds a UNet that expects 1024 (SD v2.x, OpenCLIP). Uncheck both options (equivalently, drop --v2 and --v_parameterization from the train_network.py command above) and it will work fine.
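If you're not sure which family a checkpoint belongs to, you can read it straight off the weights. Here's a minimal sketch (not part of the notebook) that checks the attn2.to_k.weight shape in a .safetensors file; it assumes only the standard Stable Diffusion checkpoint layout, and the path is the one downloaded in the log above:

```python
# Minimal sketch: guess SD v1.x vs v2.x from the cross-attention context dim.
from safetensors import safe_open

ckpt = "/content/pretrained_model/anything-v3-fp32-pruned.safetensors"

with safe_open(ckpt, framework="pt") as f:
    for key in f.keys():
        # attn2 is the text-conditioned cross-attention; its K/V weights have
        # shape [channels, context_dim]. context_dim comes from the text
        # encoder: 768 for SD v1.x (CLIP ViT-L), 1024 for SD v2.x (OpenCLIP).
        if key.endswith("attn2.to_k.weight"):
            context_dim = f.get_tensor(key).shape[1]
            print(f"context_dim={context_dim} -> "
                  f"{'SD v2.x' if context_dim == 1024 else 'SD v1.x'}")
            break
```

If it prints 768, leave v2 and v_parameterization unchecked; 1024 means an SD 2.x base (with v_parameterization only for the 768-v variants).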