Linaqruf / kohya-trainer

Adapted from https://note.com/kohya_ss/n/nbf7ce8d80f29 for easier cloning
Apache License 2.0
1.84k stars, 304 forks

Training Error Kohya LoRA Dreambooth: ValueError: malformed node or string: <ast.Name object at 0x7f362cccb8b0> #154

Open Raz0rStorm opened 1 year ago

Raz0rStorm commented 1 year ago

After the update, an exception occurs in the "start training" section:

2023-03-24 06:56:10.810637: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-03-24 06:56:11.594991: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-24 06:56:11.595108: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-24 06:56:11.595127: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-03-24 06:56:14.349071: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-03-24 06:56:15.093506: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-24 06:56:15.093620: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-24 06:56:15.093639: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Loading settings from /content/LoRA/config/config_file.toml...
/content/LoRA/config/config_file
prepare tokenizer
Downloading (…)olve/main/vocab.json: 100% 961k/961k [00:00<00:00, 3.11MB/s]
Downloading (…)olve/main/merges.txt: 100% 525k/525k [00:00<00:00, 2.09MB/s]
Downloading (…)cial_tokens_map.json: 100% 389/389 [00:00<00:00, 69.0kB/s]
Downloading (…)okenizer_config.json: 100% 905/905 [00:00<00:00, 175kB/s]
update token length: 225
Load dataset config from /content/LoRA/config/dataset_config.toml
prepare images.
found directory /content/drive/MyDrive/MMKP contains 76 image files
found directory /content/LoRA/reg_data contains 0 image files
ignore subset with image_dir='/content/LoRA/reg_data': no images found / 画像が見つからないためサブセットを無視します
760 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 6
  resolution: (512, 512)
  enable_bucket: True
  min_bucket_reso: 256
  max_bucket_reso: 1024
  bucket_reso_steps: 64
  bucket_no_upscale: False

[Subset 0 of Dataset 0]
  image_dir: "/content/drive/MyDrive/MMKP"
  image_count: 76
  num_repeats: 10
  shuffle_caption: True
  keep_tokens: 1
  caption_dropout_rate: 0
  caption_dropout_every_n_epoches: 0
  caption_tag_dropout_rate: 0.1
  color_aug: False
  flip_aug: False
  face_crop_aug_range: None
  random_crop: False
  is_reg: False
  class_tokens: mmk mmk
  caption_extension: .txt

[Dataset 0]
loading image sizes.
100% 76/76 [00:00<00:00, 627.80it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (320, 704), count: 10
bucket 1: resolution (384, 640), count: 100
bucket 2: resolution (448, 576), count: 450
bucket 3: resolution (512, 512), count: 70
bucket 4: resolution (576, 448), count: 110
bucket 5: resolution (640, 384), count: 10
bucket 6: resolution (704, 320), count: 10
mean ar error (without repeats): 0.07250056317355329
prepare accelerator
Using accelerator 0.15.0 or above.
load StableDiffusion checkpoint
loading u-net:
loading vae:
Downloading (…)lve/main/config.json: 100% 4.52k/4.52k [00:00<00:00, 607kB/s]
Downloading (…)"pytorch_model.bin";: 100% 1.71G/1.71G [00:13<00:00, 126MB/s]
loading text encoder:
load VAE: /content/vae/anime.vae.pt
additional VAE loaded
Replace CrossAttention.forward to use xformers
[Dataset 0]
caching latents.
100% 76/76 [00:36<00:00, 2.10it/s]
import network module: lycoris.kohya
Using rank adaptation algo: lora
Apply different lora dim for conv layer
Conv Dim: 8, Linear Dim: 32
Use Dropout value: 0.0
Create LyCORIS Module
create LyCORIS for Text Encoder: 72 modules.
Create LyCORIS Module
create LyCORIS for U-Net: 278 modules.
enable LyCORIS for text encoder
enable LyCORIS for U-Net
prepare optimizer, data loader etc.
Traceback (most recent call last):
  File "/content/kohya-trainer/train_network.py", line 693, in <module>
    train(args)
  File "/content/kohya-trainer/train_network.py", line 183, in train
    optimizer_name, optimizer_args, optimizer = train_util.get_optimizer(args, trainable_params)
  File "/content/kohya-trainer/library/train_util.py", line 2151, in get_optimizer
    value = ast.literal_eval(value)
  File "/usr/lib/python3.9/ast.py", line 105, in literal_eval
    return _convert(node_or_string)
  File "/usr/lib/python3.9/ast.py", line 104, in _convert
    return _convert_signed_num(node)
  File "/usr/lib/python3.9/ast.py", line 78, in _convert_signed_num
    return _convert_num(node)
  File "/usr/lib/python3.9/ast.py", line 69, in _convert_num
    _raise_malformed_node(node)
  File "/usr/lib/python3.9/ast.py", line 66, in _raise_malformed_node
    raise ValueError(f'malformed node or string: {node!r}')
ValueError: malformed node or string: <ast.Name object at 0x7f362cccb8b0>
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_network.py', '--sample_prompts=/content/LoRA/config/sample_prompt.txt', '--dataset_config=/content/LoRA/config/dataset_config.toml', '--config_file=/content/LoRA/config/config_file.toml']' returned non-zero exit status 1
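
For anyone reading the traceback: the failing call is `ast.literal_eval(value)` inside `get_optimizer`, and the `<ast.Name object ...>` in the message means the value parsed as a bare identifier rather than a Python literal. The sketch below reproduces that behaviour in isolation. It is not the trainer's actual code: `parse_optimizer_args` is a hypothetical helper, the `key=value` format is an assumption based on the traceback, and the offending value (`true` instead of `True`) is only one plausible guess, since the log does not show which argument triggered the error.

```python
import ast


def parse_optimizer_args(pairs):
    """Parse 'key=value' strings into a dict of Python literals.

    Hypothetical helper for illustration only; it mimics the
    ast.literal_eval call seen in the traceback, not the real get_optimizer.
    """
    parsed = {}
    for pair in pairs:
        key, value = pair.split("=", 1)
        try:
            # literal_eval accepts numbers, strings, tuples, True/False/None, etc.
            parsed[key] = ast.literal_eval(value)
        except ValueError as exc:
            # A bare name such as `true` or `foo` is an ast.Name node, not a literal.
            raise ValueError(
                f"optimizer arg {key!r} has a non-literal value {value!r}: {exc}"
            ) from exc
    return parsed


# Valid literals parse fine.
print(parse_optimizer_args(["weight_decay=0.1", "betas=(0.9, 0.99)"]))

# A lowercase `true` is a bare name, so literal_eval rejects it.
try:
    parse_optimizer_args(["relative_step=true"])
except ValueError as exc:
    print(exc)
```

If the optimizer arguments in the config contain unquoted words or lowercase booleans, writing them as Python literals (`True`, `"some_string"`, `0.1`) should avoid this particular exception.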

Raz0rStorm commented 1 year ago

Also, when I tried again I got this: RuntimeError: Detected that PyTorch and torchvision were compiled with different CUDA versions. PyTorch has CUDA Version=11.7 and torchvision has CUDA Version=11.6. Please reinstall the torchvision that matches your PyTorch install.
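
A quick way to see this kind of mismatch before launching training is to compare the CUDA tag in the installed package versions. The sketch below is an assumption-laden check, not part of the trainer: `cuda_tag` and `installed_cuda_tags` are hypothetical helpers, and they rely on the `+cu117`-style local version suffix that PyTorch's CUDA wheels normally carry. Builds without that suffix return `None`, in which case `torch.version.cuda` is the more reliable thing to inspect.

```python
import re
from importlib import metadata


def cuda_tag(version):
    """Return the build tag from a version like '1.13.1+cu117', or None."""
    match = re.search(r"\+(cu\d+|cpu|rocm[\d.]+)$", version)
    return match.group(1) if match else None


def installed_cuda_tags(packages=("torch", "torchvision")):
    """Map each package name to its build tag (None if absent or untagged)."""
    tags = {}
    for name in packages:
        try:
            tags[name] = cuda_tag(metadata.version(name))
        except metadata.PackageNotFoundError:
            tags[name] = None
    return tags


tags = installed_cuda_tags()
print(tags)
if tags["torch"] and tags["torchvision"] and tags["torch"] != tags["torchvision"]:
    print("torch and torchvision were built for different CUDA versions")
```

When the two tags differ, reinstalling torchvision from the same wheel index as torch, as the error message suggests, is the usual remedy.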

Linaqruf commented 1 year ago

Hi, we already fixed this in the latest update. The Colab issue has also resolved itself with the Torch 2.0.0 update.