Raz0rStorm opened 1 year ago (status: Open)
Also, when I tried again I got this: RuntimeError: Detected that PyTorch and torchvision were compiled with different CUDA versions. PyTorch has CUDA Version=11.7 and torchvision has CUDA Version=11.6. Please reinstall the torchvision that matches your PyTorch install.
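A quick way to see which CUDA build each package was installed with, without importing torchvision (importing it is typically what raises the RuntimeError above), is to read the installed version strings. This is only a minimal sketch: it assumes the wheels carry a local tag such as `+cu117`, which many builds do but plain PyPI wheels may not, and `cuda_tag` and `report` are hypothetical helper names, not part of PyTorch or torchvision.

```python
import re
from importlib import metadata
from typing import Optional


def cuda_tag(version: str) -> Optional[str]:
    """Return the CUDA local tag (e.g. 'cu117') from a version string, if any."""
    match = re.search(r"\+(cu\d+)", version)
    return match.group(1) if match else None


def report(packages=("torch", "torchvision")) -> dict:
    """Map each package name to its CUDA tag, or None if missing or untagged."""
    tags = {}
    for name in packages:
        try:
            # Reads package metadata only; the package itself is not imported.
            tags[name] = cuda_tag(metadata.version(name))
        except metadata.PackageNotFoundError:
            tags[name] = None
    return tags


if __name__ == "__main__":
    tags = report()
    print(tags)
    if len(set(tags.values())) > 1:
        print("CUDA builds differ; reinstall a matching torch/torchvision pair.")
```

If the two tags differ, reinstalling torchvision from the same CUDA build as torch is what the error message itself asks for.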
Hi, we already fixed this in the latest update. Colab itself is also fixed now because of the Torch 2.0.0 update.
After the update, an exception occurs in the "start training" section:
2023-03-24 06:56:10.810637: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-03-24 06:56:11.594991: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-24 06:56:11.595108: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-24 06:56:11.595127: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-03-24 06:56:14.349071: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-03-24 06:56:15.093506: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-24 06:56:15.093620: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-24 06:56:15.093639: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Loading settings from /content/LoRA/config/config_file.toml...
/content/LoRA/config/config_file
prepare tokenizer
Downloading (…)olve/main/vocab.json: 100% 961k/961k [00:00<00:00, 3.11MB/s]
Downloading (…)olve/main/merges.txt: 100% 525k/525k [00:00<00:00, 2.09MB/s]
Downloading (…)cial_tokens_map.json: 100% 389/389 [00:00<00:00, 69.0kB/s]
Downloading (…)okenizer_config.json: 100% 905/905 [00:00<00:00, 175kB/s]
update token length: 225
Load dataset config from /content/LoRA/config/dataset_config.toml
prepare images.
found directory /content/drive/MyDrive/MMKP contains 76 image files
found directory /content/LoRA/reg_data contains 0 image files
ignore subset with image_dir='/content/LoRA/reg_data': no images found / 画像が見つからないためサブセットを無視します
760 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
batch_size: 6
resolution: (512, 512)
enable_bucket: True
min_bucket_reso: 256
max_bucket_reso: 1024
bucket_reso_steps: 64
bucket_no_upscale: False
[Subset 0 of Dataset 0]
image_dir: "/content/drive/MyDrive/MMKP"
image_count: 76
num_repeats: 10
shuffle_caption: True
keep_tokens: 1
caption_dropout_rate: 0
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.1
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
is_reg: False
class_tokens: mmk mmk
caption_extension: .txt
[Dataset 0]
loading image sizes.
100% 76/76 [00:00<00:00, 627.80it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (320, 704), count: 10
bucket 1: resolution (384, 640), count: 100
bucket 2: resolution (448, 576), count: 450
bucket 3: resolution (512, 512), count: 70
bucket 4: resolution (576, 448), count: 110
bucket 5: resolution (640, 384), count: 10
bucket 6: resolution (704, 320), count: 10
mean ar error (without repeats): 0.07250056317355329
prepare accelerator
Using accelerator 0.15.0 or above.
load StableDiffusion checkpoint
loading u-net:
loading vae:
Downloading (…)lve/main/config.json: 100% 4.52k/4.52k [00:00<00:00, 607kB/s]
Downloading (…)"pytorch_model.bin";: 100% 1.71G/1.71G [00:13<00:00, 126MB/s]
loading text encoder:
load VAE: /content/vae/anime.vae.pt
additional VAE loaded
Replace CrossAttention.forward to use xformers
[Dataset 0]
caching latents.
100% 76/76 [00:36<00:00, 2.10it/s]
import network module: lycoris.kohya
Using rank adaptation algo: lora
Apply different lora dim for conv layer
Conv Dim: 8, Linear Dim: 32
Use Dropout value: 0.0
Create LyCORIS Module
create LyCORIS for Text Encoder: 72 modules.
Create LyCORIS Module
create LyCORIS for U-Net: 278 modules.
enable LyCORIS for text encoder
enable LyCORIS for U-Net
prepare optimizer, data loader etc.
Traceback (most recent call last):
  File "/content/kohya-trainer/train_network.py", line 693, in <module>
    train(args)
  File "/content/kohya-trainer/train_network.py", line 183, in train
    optimizer_name, optimizer_args, optimizer = train_util.get_optimizer(args, trainable_params)
  File "/content/kohya-trainer/library/train_util.py", line 2151, in get_optimizer
    value = ast.literal_eval(value)
  File "/usr/lib/python3.9/ast.py", line 105, in literal_eval
    return _convert(node_or_string)
  File "/usr/lib/python3.9/ast.py", line 104, in _convert
    return _convert_signed_num(node)
  File "/usr/lib/python3.9/ast.py", line 78, in _convert_signed_num
    return _convert_num(node)
  File "/usr/lib/python3.9/ast.py", line 69, in _convert_num
    _raise_malformed_node(node)
  File "/usr/lib/python3.9/ast.py", line 66, in _raise_malformed_node
    raise ValueError(f'malformed node or string: {node!r}')
ValueError: malformed node or string: <ast.Name object at 0x7f362cccb8b0>
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_network.py', '--sample_prompts=/content/LoRA/config/sample_prompt.txt', '--dataset_config=/content/LoRA/config/dataset_config.toml', '--config_file=/content/LoRA/config/config_file.toml']' returned non-zero exit status 1
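For anyone reading the traceback: the first ValueError is raised by `ast.literal_eval` inside `get_optimizer`, and `<ast.Name object ...>` means the value it received parsed as a bare name rather than a Python literal, for example `true` instead of `True`, or an unquoted word. Which setting in the config caused it is not visible in this log. The sketch below only reproduces that behaviour. `parse_value` and `parse_value_lenient` are hypothetical helpers written for illustration, not the trainer's code, and the lenient fallback is one possible workaround rather than the project's actual fix.

```python
import ast


def parse_value(raw: str):
    """Strict parse, mirroring the ast.literal_eval call seen in the traceback."""
    return ast.literal_eval(raw)


def parse_value_lenient(raw: str):
    """Hypothetical fallback: keep the raw string if it is not a Python literal."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw


if __name__ == "__main__":
    # The first three are valid literals; the last two are bare names and fail.
    for raw in ["0.01", "True", "(0.9, 0.999)", "true", "cosine"]:
        try:
            print(repr(raw), "->", repr(parse_value(raw)))
        except ValueError as exc:
            print(repr(raw), "-> ValueError:", exc)
```

In practice this suggests checking that every optimizer argument value in the config is written as a valid Python literal: capitalised `True`/`False`, plain numbers, and strings wrapped in quotes.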