riskay99 closed this issue 1 year ago.
I'm running into the same issue.
22:28:50-254987 INFO Start training LoRA Standard ...
22:28:50-256577 INFO Valid image folder names found in: /share/images
22:28:50-258238 INFO Folder 100_zhouxun: 21 images found
22:28:50-259241 INFO Folder 100_zhouxun: 2100 steps
22:28:50-260217 INFO Total steps: 2100
22:28:50-261194 INFO Train batch size: 2
22:28:50-262145 INFO Gradient accumulation steps: 1
22:28:50-263090 INFO Epoch: 1
22:28:50-263995 INFO Regulatization factor: 1
22:28:50-264961 INFO max_train_steps (2100 / 2 / 1 * 1 * 1) = 1050
22:28:50-266098 INFO stop_text_encoder_training = 0
22:28:50-267022 INFO lr_warmup_steps = 0
22:28:50-268037 INFO accelerate launch --num_cpu_threads_per_process=2 "train_network.py" --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" --train_data_dir="/share/images"
--resolution="512,512" --output_dir="/share/models" --logging_dir="/share/logs" --network_alpha="128" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-05
--unet_lr=0.0001 --network_dim=128 --output_name="Addams" --lr_scheduler_num_cycles="1" --learning_rate="0.0001" --lr_scheduler="constant" --train_batch_size="2"
--max_train_steps="1050" --save_every_n_epochs="1" --mixed_precision="bf16" --save_precision="bf16" --seed="1234" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW8bit"
--max_data_loader_n_workers="1" --clip_skip=2 --bucket_reso_steps=64 --xformers --bucket_no_upscale
2023-06-16 22:28:51.054437: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-16 22:28:51.237574: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-06-16 22:28:51.837600: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-06-16 22:28:51.837694: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-06-16 22:28:51.837713: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
[22:28:53] WARNING The following values were not passed to `accelerate launch` and had defaults used instead: launch.py:1088
`--num_processes` was set to a value of `1`
`--num_machines` was set to a value of `1`
`--mixed_precision` was set to a value of `'no'`
`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
2023-06-16 22:28:53.861223: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-16 22:28:54.043173: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-06-16 22:28:54.645775: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-06-16 22:28:54.645858: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-06-16 22:28:54.645878: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /kohya_ss/train_network.py:17 in
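The step counts in these logs follow a simple pattern: the numeric prefix of each image folder (100_zhouxun, 100_training_testing, 100_avaluaca) is the repeat count (the dataset dump below confirms num_repeats: 100), total steps are images × repeats, and max_train_steps divides that by batch size and gradient accumulation steps, then multiplies by epochs and the regularization factor. A minimal Python sketch that reproduces the logged numbers; repeats_from_folder and max_train_steps are hypothetical helpers, and both the `<repeats>_<name>` parsing and the round-up are assumptions inferred from the logs rather than taken from kohya_ss:

```python
import math


def repeats_from_folder(name: str) -> int:
    """Hypothetical helper: parse the repeat count from a '<repeats>_<name>' folder."""
    prefix, sep, _ = name.partition("_")
    if not sep or not prefix.isdigit():
        raise ValueError(f"folder {name!r} has no '<repeats>_' prefix")
    return int(prefix)


def max_train_steps(total_steps: int, batch_size: int, grad_accum: int = 1,
                    epochs: int = 1, reg_factor: int = 1) -> int:
    """Hypothetical helper mirroring the logged formula
    total_steps / batch_size / grad_accum * epochs * reg_factor.

    Rounding up is an assumption; every value in this thread divides evenly.
    """
    return math.ceil(total_steps / batch_size / grad_accum * epochs * reg_factor)


# (folder name, image count, batch size) taken from the three logs in this thread.
for folder, images, batch in [("100_zhouxun", 21, 2),
                              ("100_training_testing", 223, 2),
                              ("100_avaluaca", 71, 1)]:
    total = images * repeats_from_folder(folder)
    print(folder, total, max_train_steps(total, batch))
```

Running it prints total steps of 2100, 22300 and 7100 with max_train_steps of 1050, 11150 and 7100, matching the values logged in this thread.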
For me the issue was a broken conda install; reinstalling conda seemed to fix it.
I still have a similar issue, but not with conda.
To create a public link, set `share=True` in `launch()`.
21:28:48-438984 INFO Start training LoRA Standard ...
21:28:48-439679 INFO Valid image folder names found in: /home/max/Pictures/LoRA/training_testing/LORA/image/
21:28:48-440361 INFO Folder 100_training_testing: 223 images found
21:28:48-440811 INFO Folder 100_training_testing: 22300 steps
21:28:48-441207 INFO Total steps: 22300
21:28:48-441579 INFO Train batch size: 2
21:28:48-441963 INFO Gradient accumulation steps: 1
21:28:48-442340 INFO Epoch: 1
21:28:48-442691 INFO Regulatization factor: 1
21:28:48-443091 INFO max_train_steps (22300 / 2 / 1 * 1 * 1) = 11150
21:28:48-443556 INFO stop_text_encoder_training = 0
21:28:48-443932 INFO lr_warmup_steps = 0
21:28:48-444350 INFO accelerate launch --num_cpu_threads_per_process=2 "train_network.py" --enable_bucket --pretrained_model_name_or_path="/media/big_hhd/applications/stable-diffusion-webui/models/Stable-diffusion/AnythingV5_v5RE.ckpt"
--train_data_dir="/home/max/Pictures/LoRA/training_testing/LORA/image/" --resolution="768,768" --output_dir="/home/max/Pictures/LoRA/training_testing/LORA/model/ "
--/home/max/Pictures/LoRA/training_testing/LORA/log/" --network_alpha="128" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-05 --unet_lr=0.0001 --network_dim=128
--output_name="tests" --lr_scheduler_num_cycles="1" --learning_rate="0.0001" --lr_scheduler="constant" --train_batch_size="2" --max_train_steps="11150" --save_every_n_epochs="1" --mixed_precision="bf16"
--save_precision="bf16" --seed="1234" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW8bit" --max_data_loader_n_workers="1" --clip_skip=2 --bucket_reso_steps=64 --xformers --bucket_no_upscale
2023-06-23 21:28:49.900589: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-06-23 21:28:49.976795: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-23 21:28:50.326511: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[21:28:50] WARNING The following values were not passed to `accelerate launch` and had defaults used instead: launch.py:890
`--num_processes` was set to a value of `1`
`--num_machines` was set to a value of `1`
`--mixed_precision` was set to a value of `'no'`
`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
2023-06-23 21:28:52.016254: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
prepare tokenizer
Using DreamBooth method.
prepare images.
found directory /home/max/Pictures/LoRA/training_testing/LORA/image/100_training_testing contains 223 image files
22300 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
batch_size: 2
resolution: (768, 768)
enable_bucket: True
min_bucket_reso: 256
max_bucket_reso: 1024
bucket_reso_steps: 64
bucket_no_upscale: True
[Subset 0 of Dataset 0] image_dir: "/home/max/Pictures/LoRA/training_testing/LORA/image/100_training_testing" image_count: 223 num_repeats: 100 shuffle_caption: False keep_tokens: 0 caption_dropout_rate: 0.0 caption_dropout_every_n_epoches: 0 caption_tag_dropout_rate: 0.0 color_aug: False flip_aug: False face_crop_aug_range: None random_crop: False token_warmup_min: 1, token_warmup_step: 0, is_reg: False class_tokens: testout caption_extension: .txt
[Dataset 0]
loading image sizes.
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 223/223 [00:00<00:00, 6681.31it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (192, 128), count: 100
bucket 1: resolution (192, 192), count: 300
bucket 2: resolution (256, 192), count: 500
bucket 3: resolution (256, 448), count: 100
bucket 4: resolution (320, 448), count: 100
bucket 5: resolution (320, 512), count: 100
bucket 6: resolution (448, 640), count: 100
bucket 7: resolution (448, 704), count: 100
bucket 8: resolution (448, 1152), count: 100
bucket 9: resolution (512, 640), count: 100
bucket 10: resolution (512, 768), count: 100
bucket 11: resolution (576, 384), count: 200
bucket 12: resolution (576, 448), count: 300
bucket 13: resolution (576, 640), count: 100
bucket 14: resolution (576, 704), count: 100
bucket 15: resolution (576, 832), count: 400
bucket 16: resolution (576, 896), count: 600
bucket 17: resolution (576, 960), count: 200
bucket 18: resolution (640, 448), count: 200
bucket 19: resolution (640, 512), count: 300
bucket 20: resolution (640, 576), count: 100
bucket 21: resolution (640, 768), count: 300
bucket 22: resolution (640, 832), count: 2400
bucket 23: resolution (640, 896), count: 3100
bucket 24: resolution (704, 320), count: 100
bucket 25: resolution (704, 512), count: 300
bucket 26: resolution (704, 704), count: 1100
bucket 27: resolution (704, 768), count: 600
bucket 28: resolution (704, 832), count: 100
bucket 29: resolution (768, 384), count: 200
bucket 30: resolution (768, 512), count: 500
bucket 31: resolution (768, 576), count: 1800
bucket 32: resolution (768, 640), count: 1000
bucket 33: resolution (768, 704), count: 100
bucket 34: resolution (768, 768), count: 1300
bucket 35: resolution (832, 448), count: 100
bucket 36: resolution (832, 576), count: 700
bucket 37: resolution (832, 640), count: 1900
bucket 38: resolution (832, 704), count: 300
bucket 39: resolution (896, 576), count: 500
bucket 40: resolution (896, 640), count: 1000
bucket 41: resolution (960, 576), count: 100
bucket 42: resolution (1024, 576), count: 600
mean ar error (without repeats): 0.02504783041525756
preparing accelerator
/media/big_hhd/applications/kohya_ss/venv/lib/python3.10/site-packages/accelerate/accelerator.py:258: FutureWarning: `logging_dir` is deprecated and will be removed in version 0.18.0 of 🤗 Accelerate. Use `project_dir` instead.
warnings.warn(
Using accelerator 0.15.0 or above.
loading model for process 0/1
load StableDiffusion checkpoint: /media/big_hhd/applications/stable-diffusion-webui/models/Stable-diffusion/AnythingV5_v5RE.ckpt
loading u-net:
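The bucket table in this log was built with bucket_no_upscale set, which the log says makes the bucket resolution come from each image's own size and causes min_bucket_reso/max_bucket_reso to be ignored. That is why buckets such as (192, 128) and (448, 1152) appear even though resolution is 768,768; the per-bucket counts also sum to 22300, the "train images with repeating" total. A simplified Python sketch of that idea, assuming images larger than resolution² in area are scaled down with their aspect ratio kept and each side is then rounded down to a multiple of bucket_reso_steps. This is an approximation for illustration, not kohya's actual bucketing code, and no_upscale_bucket is a hypothetical name:

```python
import math


def no_upscale_bucket(width: int, height: int, resolution: int,
                      reso_steps: int = 64) -> tuple[int, int]:
    """Approximate the bucket for one image when bucket_no_upscale is set.

    Simplified sketch (assumption, not kohya's exact algorithm):
    - images whose area exceeds resolution * resolution are scaled down,
      keeping the aspect ratio, so they fit that area;
    - smaller images are never upscaled, so min/max bucket limits play no role;
    - each side is then rounded down to a multiple of reso_steps.
    """
    max_area = resolution * resolution
    if width * height > max_area:
        scale = math.sqrt(max_area / (width * height))
        width, height = int(width * scale), int(height * scale)
    return width - width % reso_steps, height - height % reso_steps


print(no_upscale_bucket(1200, 1600, 768))  # (640, 832)
print(no_upscale_bucket(400, 300, 768))    # small image stays small: (384, 256)
```

Because smaller images are only rounded down and never scaled up, tiny images end up in tiny buckets, and tall or wide images can exceed max_bucket_reso on one side as long as their area fits.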
This looks like some failure in the AdamW8bit optimizer, or something along those lines.
I had the same issue and switched the optimizer from AdamW8bit to AdamW.
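In the GUI this is the Optimizer setting. For anyone launching the scripts directly, the equivalent is replacing `--optimizer_type="AdamW8bit"` with `--optimizer_type="AdamW"` in the printed command. The sketch below also passes the four values the accelerate warning in these logs complains about (`--num_processes`, `--num_machines`, `--mixed_precision`, `--dynamo_backend`) explicitly. It is a minimal Python sketch assuming a single-GPU machine; build_launch_cmd is a hypothetical helper, and the chosen values are illustrative, not recommended settings:

```python
import shlex


def build_launch_cmd(script_argv: list[str], optimizer: str = "AdamW",
                     mixed_precision: str = "bf16") -> list[str]:
    """Hypothetical helper: build an `accelerate launch` argv list.

    script_argv is the training script plus its arguments (e.g. the
    train_network.py part of the command printed by the GUI). Any existing
    --optimizer_type, in '--optimizer_type=X' or '--optimizer_type X' form,
    is dropped and replaced with the requested optimizer.
    """
    cleaned: list[str] = []
    skip_next = False
    for arg in script_argv:
        if skip_next:  # value that belonged to a dropped '--optimizer_type X'
            skip_next = False
            continue
        if arg == "--optimizer_type":
            skip_next = True
            continue
        if arg.startswith("--optimizer_type="):
            continue
        cleaned.append(arg)
    cleaned.append(f"--optimizer_type={optimizer}")

    # Launcher flags named in the accelerate warning; values assume one GPU.
    launcher = [
        "accelerate", "launch",
        "--num_processes=1",
        "--num_machines=1",
        f"--mixed_precision={mixed_precision}",
        "--dynamo_backend=no",
        "--num_cpu_threads_per_process=2",
    ]
    return launcher + cleaned


# Usage: feed in the script part of the command the GUI printed.
printed = 'train_network.py --train_batch_size="2" --optimizer_type="AdamW8bit" --xformers'
print(shlex.join(build_launch_cmd(shlex.split(printed))))
```

Plain AdamW keeps its optimizer state in full precision, so expect higher VRAM use than with AdamW8bit.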
@riskay99 @caiyesd @maxerature
I've tried reinstalling a few times, disabling the xformers setting, reducing the resolution back to 512,512, following a few tips from a Linux post I saw (#810, I believe) and changing a few other settings. I'm on Ubuntu and I believe I've got everything installed correctly, but it's possible I messed up somewhere. Here's my error log; hopefully someone can help me pin down where I went wrong or how to fix it.
./gui.sh --listen 127.0.0.1 --server_port 7860 --inbrowser
10:00:33-766950 INFO nVidia toolkit detected
10:00:34-143288 INFO Torch 1.12.1+cu116
10:00:34-156045 INFO Torch backend: nVidia CUDA 11.6 cuDNN 8302
10:00:34-157644 INFO Torch detected GPU: NVIDIA GeForce RTX 4090 VRAM 24214 Arch (8, 9) Cores 128
10:00:34-158386 INFO Verifying requirements
10:00:34-160147 INFO Installing package: diffusers[torch]==0.10.2
10:00:37-002950 INFO headless: False
10:00:37-005416 INFO Load CSS...
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
10:01:18-606818 INFO Loading config...
10:01:20-012801 INFO Loading config...
10:01:23-732054 INFO Start training Dreambooth...
10:01:23-734574 INFO Valid image folder names found in: /media/sinco/keepblank2/software/workin/stable-diffusion-webui/zzzzz/avaluaca/avaluaca_lora/image
10:01:23-737095 INFO Folder 100_avaluaca : steps 7100
10:01:23-739271 INFO max_train_steps = 7100
10:01:23-741063 INFO stop_text_encoder_training = 0
10:01:23-742909 INFO lr_warmup_steps = 0
10:01:23-744889 INFO accelerate launch --num_cpu_threads_per_process=2 "train_db.py" --enable_bucket --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5"
--train_data_dir="/media/sinco/keepblank2/software/workin/stable-diffusion-webui/zzzzz/avaluaca/avaluaca_lora/image" --resolution="512,512"
--output_dir="/media/sinco/keepblank2/software/workin/stable-diffusion-webui/zzzzz/avaluaca/avaluaca_lora/model" --logging_dir="/media/sinco/keepblank2/software/workin/stable-diffusion-webui/zzzzz/avaluaca/avaluaca_lora/log"
--save_model_as=safetensors --output_name="avaluaca" --max_data_loader_n_workers="1" --learning_rate="0.0001" --lr_scheduler="constant" --train_batch_size="1" --max_train_steps="7100" --save_every_n_epochs="1" --mixed_precision="bf16"
--save_precision="bf16" --seed="1234" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW8bit" --max_data_loader_n_workers="1" --clip_skip=2 --bucket_reso_steps=64 --bucket_no_upscale
2023-06-15 10:01:24.219538: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-15 10:01:24.340521: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-06-15 10:01:24.729370: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-06-15 10:01:24.729414: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-06-15 10:01:24.729420: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
[10:01:25] WARNING The following values were not passed to `accelerate launch` and had defaults used instead: launch.py:1088
`--num_processes` was set to a value of `1`
`--num_machines` was set to a value of `1`
`--mixed_precision` was set to a value of `'no'`
`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
2023-06-15 10:01:25.972767: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-15 10:01:26.093622: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-06-15 10:01:26.479471: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-06-15 10:01:26.479516: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-06-15 10:01:26.479522: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
prepare tokenizer
prepare images.
found directory /media/sinco/keepblank2/software/workin/stable-diffusion-webui/zzzzz/avaluaca/avaluaca_lora/image/100_avaluaca contains 71 image files
7100 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
batch_size: 1
resolution: (512, 512)
enable_bucket: True
min_bucket_reso: 256
max_bucket_reso: 1024
bucket_reso_steps: 64
bucket_no_upscale: True
[Subset 0 of Dataset 0] image_dir: "/media/sinco/keepblank2/software/workin/stable-diffusion-webui/zzzzz/avaluaca/avaluaca_lora/image/100_avaluaca" image_count: 71 num_repeats: 100 shuffle_caption: False keep_tokens: 0 caption_dropout_rate: 0.0 caption_dropout_every_n_epoches: 0 caption_tag_dropout_rate: 0.0 color_aug: False flip_aug: False face_crop_aug_range: None random_crop: False token_warmup_min: 1, token_warmup_step: 0, is_reg: False class_tokens: avaluaca caption_extension: .txt
[Dataset 0]
loading image sizes.
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 71/71 [00:00<00:00, 12031.66it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (384, 576), count: 7100
mean ar error (without repeats): 0.014184397163120588
prepare accelerator
Using accelerator 0.15.0 or above.
loading model for process 0/1
load Diffusers pretrained models: runwayml/stable-diffusion-v1-5
Fetching 15 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 88737.04it/s]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
[Dataset 0]
caching latents.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 71/71 [00:02<00:00, 25.86it/s]
prepare optimizer, data loader etc.

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
/media/sinco/keepblank2/software/workin/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/media/sinco/keepblank2/software/workin/kohya_ss/venv/lib/python3.10/site-packages/cv2/../../lib64')}
warn(
/media/sinco/keepblank2/software/workin/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/paths.py:105: UserWarning: /media/sinco/keepblank2/software/workin/kohya_ss/venv/lib/python3.10/site-packages/cv2/../../lib64: did not contain libcudart.so as expected! Searching further paths...
warn(
/media/sinco/keepblank2/software/workin/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/org/freedesktop/DisplayManager/Seat0')}
warn(
/media/sinco/keepblank2/software/workin/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('@/tmp/.ICE-unix/1827,unix/d3'), PosixPath('local/d3')}
warn(
/media/sinco/keepblank2/software/workin/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/org/freedesktop/DisplayManager/Session0')}
warn(
/media/sinco/keepblank2/software/workin/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('0'), PosixPath('1')}
warn(
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
/media/sinco/keepblank2/software/workin/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib64')}
warn(
WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
CUDA SETUP: Loading binary /media/sinco/keepblank2/software/workin/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
/media/sinco/keepblank2/software/workin/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/cextension.py:48: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
warn(
use 8-bit AdamW optimizer | {}
running training / 学習開始
num train images repeats / 学習画像の数×繰り返し回数: 7100
num reg images / 正則化画像の数: 0
num batches per epoch / 1epochのバッチ数: 7100
num epochs / epoch数: 1
batch size per device / バッチサイズ: 1
total train batch size (with parallel & distributed & accumulation) / 総バッチサイズ(並列学習、勾配合計含む): 1
gradient ccumulation steps / 勾配を合計するステップ数 = 1
total optimization steps / 学習ステップ数: 7100
steps: 0%| | 0/7100 [00:00<?, ?it/s]
epoch 1/1
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /media/sinco/keepblank2/software/workin/kohya_ss/train_db.py:482 in │
│ │
│ 479 │ args = parser.parse_args() │
│ 480 │ args = train_util.read_config_from_file(args, parser) │
│ 481 │ │
│ ❱ 482 │ train(args) │
│ 483 │
│ │
│ /media/sinco/keepblank2/software/workin/kohya_ss/train_db.py:346 in train │
│ │
│ 343 │ │ │ │ │ │ params_to_clip = unet.parameters() │
│ 344 │ │ │ │ │ accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm) │
│ 345 │ │ │ │ │
│ ❱ 346 │ │ │ │ optimizer.step() │
│ 347 │ │ │ │ lr_scheduler.step() │
│ 348 │ │ │ │ optimizer.zero_grad(set_to_none=True) │
│ 349 │
│ │
│ /media/sinco/keepblank2/software/workin/kohya_ss/venv/lib/python3.10/site-packages/accelerate/op │
│ timizer.py:134 in step │
│ │
│ 131 │ │ │ │ xm.optimizer_step(self.optimizer, optimizer_args=optimizer_args) │
│ 132 │ │ │ elif self.scaler is not None: │
│ 133 │ │ │ │ scale_before = self.scaler.get_scale() │
│ ❱ 134 │ │ │ │ self.scaler.step(self.optimizer, closure) │
│ 135 │ │ │ │ self.scaler.update() │
│ 136 │ │ │ │ scale_after = self.scaler.get_scale() │
│ 137 │ │ │ │ # If we reduced the loss scale, it means the optimizer step was skipped │
│ │
│ /media/sinco/keepblank2/software/workin/kohya_ss/venv/lib/python3.10/site-packages/torch/cuda/am │
│ p/grad_scaler.py:338 in step │
│ │
│ 335 │ │ │
│ 336 │ │ assert len(optimizer_state["found_inf_per_device"]) > 0, "No inf checks were rec │
│ 337 │ │ │
│ ❱ 338 │ │ retval = self._maybe_opt_step(optimizer, optimizer_state, args, kwargs) │
│ 339 │ │ │
│ 340 │ │ optimizer_state["stage"] = OptState.STEPPED │
│ 341 │
│ │
│ /media/sinco/keepblank2/software/workin/kohya_ss/venv/lib/python3.10/site-packages/torch/cuda/am │
│ p/grad_scaler.py:285 in _maybe_opt_step │
│ │
│ 282 │ def _maybe_opt_step(self, optimizer, optimizer_state, *args, **kwargs): │
│ 283 │ │ retval = None │
│ 284 │ │ if not sum(v.item() for v in optimizer_state["found_inf_per_device"].values()): │
│ ❱ 285 │ │ │ retval = optimizer.step(*args, **kwargs) │
│ 286 │ │ return retval │
│ 287 │ │
│ 288 │ def step(self, optimizer, *args, **kwargs): │
│ │
│ /media/sinco/keepblank2/software/workin/kohya_ss/venv/lib/python3.10/site-packages/torch/optim/l │
│ r_scheduler.py:65 in wrapper │
│ │
│ 62 │ │ │ │ instance = instance_ref() │
│ 63 │ │ │ │ instance._step_count += 1 │
│ 64 │ │ │ │ wrapped = func.__get__(instance, cls) │
│ ❱ 65 │ │ │ │ return wrapped(*args, **kwargs) │
│ 66 │ │ │ │
│ 67 │ │ │ # Note that the returned function here is no longer a bound method, │
│ 68 │ │ │ # so attributes like `__func__` and `__self__` no longer exist. │
│ │
│ /media/sinco/keepblank2/software/workin/kohya_ss/venv/lib/python3.10/site-packages/torch/optim/o │
│ ptimizer.py:113 in wrapper │
│ │
│ 110 │ │ │ │ obj, *_ = args │
│ 111 │ │ │ │ profile_name = "Optimizer.step#{}.step".format(obj.__class__.__name__) │
│ 112 │ │ │ │ with torch.autograd.profiler.record_function(profile_name): │
│ ❱ 113 │ │ │ │ │ return func(*args, **kwargs) │
│ 114 │ │ │ return wrapper │
│ 115 │ │ │
│ 116 │ │ hooked = getattr(self.__class__.step, "hooked", None) │
│ │
│ /media/sinco/keepblank2/software/workin/kohya_ss/venv/lib/python3.10/site-packages/torch/autogra │
│ d/grad_mode.py:27 in decorate_context │
│ │
│ 24 │ │ @functools.wraps(func) │
│ 25 │ │ def decorate_context(*args, **kwargs): │
│ 26 │ │ │ with self.clone(): │
│ ❱ 27 │ │ │ │ return func(*args, **kwargs) │
│ 28 │ │ return cast(F, decorate_context) │
│ 29 │ │
│ 30 │ def _wrap_generator(self, func): │
│ │
│ /media/sinco/keepblank2/software/workin/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/ │
│ optim/optimizer.py:265 in step │
│ │
│ 262 │ │ │ │ if len(state) == 0: │
│ 263 │ │ │ │ │ self.init_state(group, p, gindex, pindex) │
│ 264 │ │ │ │ │
│ ❱ 265 │ │ │ │ self.update_step(group, p, gindex, pindex) │
│ 266 │ │ │
│ 267 │ │ return loss │
│ 268 │
│ │
│ /media/sinco/keepblank2/software/workin/kohya_ss/venv/lib/python3.10/site-packages/torch/autogra │
│ d/grad_mode.py:27 in decorate_context │
│ │
│ 24 │ │ @functools.wraps(func) │
│ 25 │ │ def decorate_context(*args, **kwargs): │
│ 26 │ │ │ with self.clone(): │
│ ❱ 27 │ │ │ │ return func(*args, **kwargs) │
│ 28 │ │ return cast(F, decorate_context) │
│ 29 │ │
│ 30 │ def _wrap_generator(self, func): │
│ │
│ /media/sinco/keepblank2/software/workin/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/ │
│ optim/optimizer.py:506 in update_step │
│ │
│ 503 │ │ │ state["max1"], state["new_max1"] = state["new_max1"], state["max1"] │
│ 504 │ │ │ state["max2"], state["new_max2"] = state["new_max2"], state["max2"] │
│ 505 │ │ elif state["state1"].dtype == torch.uint8 and config["block_wise"]: │
│ ❱ 506 │ │ │ F.optimizer_update_8bit_blockwise( │
│ 507 │ │ │ │ self.optimizer_name, │
│ 508 │ │ │ │ grad, │
│ 509 │ │ │ │ p, │
│ │
│ /media/sinco/keepblank2/software/workin/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/ │
│ functional.py:858 in optimizer_update_8bit_blockwise │
│ │
│ 855 ) -> None: │
│ 856 │ │
│ 857 │ if g.dtype == torch.float32 and state1.dtype == torch.uint8: │
│ ❱ 858 │ │ str2optimizer8bit_blockwise[optimizer_name][0]( │
│ 859 │ │ │ get_ptr(p), │
│ 860 │ │ │ get_ptr(g), │
│ 861 │ │ │ get_ptr(state1), │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
NameError: name 'str2optimizer8bit_blockwise' is not defined
steps: 0%| | 0/7100 [00:00<?, ?it/s]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /media/sinco/keepblank2/software/workin/kohya_ss/venv/bin/accelerate:8 in
│ │
│ 5 from accelerate.commands.accelerate_cli import main │
│ 6 if __name__ == '__main__': │
│ 7 │ sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0]) │
│ ❱ 8 │ sys.exit(main()) │
│ 9 │
│ │
│ /media/sinco/keepblank2/software/workin/kohya_ss/venv/lib/python3.10/site-packages/accelerate/co │
│ mmands/accelerate_cli.py:45 in main │
│ │
│ 42 │ │ exit(1) │
│ 43 │ │
│ 44 │ # Run │
│ ❱ 45 │ args.func(args) │
│ 46 │
│ 47 │
│ 48 if __name__ == "__main__": │
│ │
│ /media/sinco/keepblank2/software/workin/kohya_ss/venv/lib/python3.10/site-packages/accelerate/co │
│ mmands/launch.py:1104 in launch_command │
│ │
│ 1101 │ elif defaults is not None and defaults.compute_environment == ComputeEnvironment.AMA │
│ 1102 │ │ sagemaker_launcher(defaults, args) │
│ 1103 │ else: │
│ ❱ 1104 │ │ simple_launcher(args) │
│ 1105 │
│ 1106 │
│ 1107 def main(): │
│ │
│ /media/sinco/keepblank2/software/workin/kohya_ss/venv/lib/python3.10/site-packages/accelerate/co │
│ mmands/launch.py:567 in simple_launcher │
│ │
│ 564 │ process = subprocess.Popen(cmd, env=current_env) │
│ 565 │ process.wait() │
│ 566 │ if process.returncode != 0: │
│ ❱ 567 │ │ raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) │
│ 568 │
│ 569 │
│ 570 def multi_gpu_launcher(args): │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
CalledProcessError: Command '['/media/sinco/keepblank2/software/workin/kohya_ss/venv/bin/python', 'train_db.py', '--enable_bucket', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5',
'--train_data_dir=/media/sinco/keepblank2/software/workin/stable-diffusion-webui/zzzzz/avaluaca/avaluaca_lora/image', '--resolution=512,512', '--output_dir=/media/sinco/keepblank2/software/workin/stable-diffusion-webui/zzzzz/avaluaca/avaluaca_lora/model',
'--logging_dir=/media/sinco/keepblank2/software/workin/stable-diffusion-webui/zzzzz/avaluaca/avaluaca_lora/log', '--save_model_as=safetensors', '--output_name=avaluaca', '--max_data_loader_n_workers=1', '--learning_rate=0.0001', '--lr_scheduler=constant',
'--train_batch_size=1', '--max_train_steps=7100', '--save_every_n_epochs=1', '--mixed_precision=bf16', '--save_precision=bf16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_data_loader_n_workers=1', '--clip_skip=2',
'--bucket_reso_steps=64', '--bucket_no_upscale']' returned non-zero exit status 1.
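The decisive lines in this last log are "No libcudart.so found!" and "The installed version of bitsandbytes was compiled without GPU support": bitsandbytes loaded libbitsandbytes_cpu.so, and the NameError for str2optimizer8bit_blockwise is consistent with that CPU-only fallback, since the 8-bit optimizer path needs the CUDA build. Torch itself reported CUDA 11.6 at GUI startup, so the problem appears to be bitsandbytes' library lookup rather than the GPU. A minimal Python diagnostic sketch, assuming the search locations the log itself mentions (LD_LIBRARY_PATH entries plus /usr/local/cuda/lib64), checks whether a libcudart.so file exists in any of them; candidate_dirs and find_libcudart are hypothetical helpers, not part of bitsandbytes, and they do not replicate its full search logic:

```python
import os
from pathlib import Path


def candidate_dirs(env, extra_dirs=("/usr/local/cuda/lib64",)):
    """Directories to inspect: LD_LIBRARY_PATH entries, then extra_dirs.

    The default extra directory is the fallback path named in the log.
    """
    dirs = [d for d in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if d]
    return dirs + list(extra_dirs)


def find_libcudart(dirs):
    """Return every libcudart.so* file present in the given directories."""
    hits = []
    for d in dirs:
        path = Path(d)
        if path.is_dir():  # skip non-existent entries, like the ones warned about above
            hits.extend(sorted(path.glob("libcudart.so*")))
    return hits


if __name__ == "__main__":
    found = find_libcudart(candidate_dirs(os.environ))
    if found:
        for lib in found:
            print("found", lib)
    else:
        print("no libcudart.so found in LD_LIBRARY_PATH or /usr/local/cuda/lib64")
```

If nothing is found, the bitsandbytes message itself points toward installing CUDA or the cudatoolkit package; switching to plain AdamW, as suggested earlier in the thread, avoids the 8-bit code path altogether.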