Linaqruf / kohya-trainer

Adapted from https://note.com/kohya_ss/n/nbf7ce8d80f29 for easier cloning
Apache License 2.0

help with one click trainer #142

Open BroctorDF opened 1 year ago

BroctorDF commented 1 year ago

Hi, I was trying to create a LoRA, but I keep getting this error:

2023-03-20 04:40:56.630426: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-03-20 04:40:57.674789: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-20 04:40:57.675017: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-20 04:40:57.675048: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-03-20 04:41:00.803315: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-03-20 04:41:01.503107: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-20 04:41:01.503212: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-20 04:41:01.503248: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
prepare tokenizer
update token length: 225
Train with captions.
loading existing metadata: /content/drive/MyDrive/training_dir/meta_lat.json
Traceback (most recent call last):
  File "/content/kohya-trainer/train_network.py", line 663, in <module>
    train(args)
  File "/content/kohya-trainer/train_network.py", line 94, in train
    train_dataset_group = config_util.generate_dataset_group_by_blueprint(blueprint.dataset_group)
  File "/content/kohya-trainer/library/config_util.py", line 368, in generate_dataset_group_by_blueprint
    dataset = dataset_klass(subsets=subsets, **asdict(dataset_blueprint.params))
  File "/content/kohya-trainer/library/train_util.py", line 911, in __init__
    assert len(abs_path) >= 1, f"no image / 画像がありません: {image_key}"
AssertionError: no image / 画像がありません: 10
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/kohya-trainer/train_network.py', '--output_name=adrcl', '--pretrained_model_name_or_path=/content/pretrained_model/anything-v3-fp32-pruned.safetensors', '--vae=/content/vae/anime.vae.pt', '--train_data_dir=/content/drive/MyDrive/ADR/5_adrcll', '--in_json=/content/drive/MyDrive/training_dir/meta_lat.json', '--output_dir=/content/drive/MyDrive/training_dir/output', '--network_dim=128', '--network_alpha=128', '--network_module=networks.lora', '--unet_lr=0.0001', '--text_encoder_lr=5e-05', '--optimizer_type=AdamW8bit', '--learning_rate=2e-06', '--lr_scheduler=constant', '--lr_warmup_steps=250', '--dataset_repeats=10', '--resolution=512', '--keep_tokens=1', '--lowram', '--mixed_precision=fp16', '--save_precision=fp16', '--save_n_epoch_ratio=3', '--save_model_as=safetensors', '--train_batch_size=4', '--max_token_length=225', '--max_train_epochs=20', '--clip_skip=2', '--logging_dir=/content/training_dir/logs', '--log_prefix=adrcl', '--shuffle_caption', '--xformers']' returned non-zero exit status 1.

How do I solve it? I'm not a programmer, so I'd appreciate it if you could explain it like I'm five years old.

Linaqruf commented 1 year ago

I'm sorry, I don't have any plans to continue the Fast Kohya Trainer project (for now).

  1. The code in the single cell keeps getting longer, and my device is not powerful enough to handle it (I run most things on a small laptop), which causes lag.

  2. It's really hard to maintain the 1-click cell colab.

  3. There is another good 1-click cell colab project maintained by HollowStrawBerry, which is integrated with Google Drive. It is based on my notebook (?), so you may get the same experience, or even a better one. You can check it out here: https://github.com/hollowstrawberry/kohya-colab
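
For anyone hitting the same assertion: the first traceback in the report ends in "AssertionError: no image / 画像がありません: 10", raised while the dataset is built right after meta_lat.json is loaded. One plausible reading is that the metadata file lists an entry named 10 that has no matching image in the --train_data_dir folder, for instance because the file was left over from an earlier run. The sketch below is one hedged way to check that. It is not part of kohya-trainer: the function name find_missing_images, the extension list, and the assumption that the metadata is a JSON object keyed by image name (without extension) or by full path are all illustrative.

```python
import json
from pathlib import Path

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}


def find_missing_images(metadata_path, image_dir):
    """Return the metadata keys that have no matching image file.

    Assumption (not confirmed by the thread): the metadata file is a JSON
    object whose keys are either full image paths or image file names
    without the extension.
    """
    with open(metadata_path, encoding="utf-8") as f:
        metadata = json.load(f)

    # File names (without extension) of every image directly inside image_dir.
    stems = {
        p.stem
        for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    }

    missing = []
    for key in metadata:
        if Path(key).is_file():  # key is a path to a file that exists
            continue
        if key in stems:  # key matches an image name in the folder
            continue
        missing.append(key)
    return missing
```

With the paths from the log, this would be called as find_missing_images("/content/drive/MyDrive/training_dir/meta_lat.json", "/content/drive/MyDrive/ADR/5_adrcll"). If the returned list contains "10", the metadata and the image folder are out of sync, and regenerating the metadata for the current folder (or pointing --train_data_dir at the folder the metadata was built from) is the likely next step.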