bmaltais / kohya_ss

Apache License 2.0

Error when trying to train lora. #1854

Closed (wile1005 closed this issue 4 months ago)

wile1005 commented 8 months ago

I'm trying to train a new LoRA, but every time I start the training I get an error. I have tried reinstalling kohya_ss, but I still get the same error. I have no idea how to fix this, and any help would be greatly appreciated.

19:08:55-779608 INFO     nVidia toolkit detected
19:08:57-040892 INFO     Torch 2.0.1+cu118
19:08:57-081899 INFO     Torch backend: nVidia CUDA 11.8 cuDNN 8700
19:08:57-083900 INFO     Torch detected GPU: NVIDIA GeForce RTX 2080 SUPER VRAM 8192 Arch (7, 5) Cores 48
19:08:57-085901 INFO     Verifying modules installation status from requirements_windows_torch2.txt...
19:08:57-087901 INFO     Verifying modules installation status from requirements.txt...
19:09:02-708488 INFO     headless: False
19:09:02-711489 INFO     Load CSS...
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
19:10:15-031784 INFO     Start training LoRA Standard ...
19:10:15-033785 INFO     Checking for duplicate image filenames in training data directory...
19:10:15-035785 INFO     Valid image folder names found in: U:/Program Files (x86)/Algodoo/java/LORA/supersatanson/image
19:10:15-036785 INFO     Folder 100_Supersatanson: 39 images found
19:10:15-037786 INFO     Folder 100_Supersatanson: 3900 steps
19:10:15-038786 INFO     Total steps: 3900
19:10:15-039786 INFO     Train batch size: 1
19:10:15-040785 INFO     Gradient accumulation steps: 1
19:10:15-040785 INFO     Epoch: 1
19:10:15-041787 INFO     Regulatization factor: 1
19:10:15-042787 INFO     max_train_steps (3900 / 1 / 1 * 1 * 1) = 3900
19:10:15-043787 INFO     stop_text_encoder_training = 0
19:10:15-045788 INFO     lr_warmup_steps = 390
19:10:15-046788 INFO     Saving training config to U:/Program Files
                         (x86)/Algodoo/java/LORA/supersatanson/model\last_20240108-191015.json...
19:10:15-048788 INFO     accelerate launch --num_cpu_threads_per_process=2 "./train_network.py" --enable_bucket
                         --min_bucket_reso=256 --max_bucket_reso=2048
                         --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" --train_data_dir="U:/Program
                         Files (x86)/Algodoo/java/LORA/supersatanson/image" --resolution="512,512"
                         --output_dir="U:/Program Files (x86)/Algodoo/java/LORA/supersatanson/model"
                         --logging_dir="U:/Program Files (x86)/Algodoo/java/LORA/supersatanson/log" --network_alpha="1"
                         --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-05
                         --unet_lr=0.0001 --network_dim=8 --output_name="last" --lr_scheduler_num_cycles="1"
                         --no_half_vae --learning_rate="0.0001" --lr_scheduler="cosine" --lr_warmup_steps="390"
                         --train_batch_size="1" --max_train_steps="3900" --save_every_n_epochs="1"
                         --mixed_precision="fp16" --save_precision="fp16" --cache_latents --optimizer_type="AdamW8bit"
                         --max_grad_norm="1" --max_data_loader_n_workers="0" --bucket_reso_steps=64 --xformers
                         --bucket_no_upscale --noise_offset=0.0
prepare tokenizer
Using DreamBooth method.
prepare images.
found directory U:\Program Files (x86)\Algodoo\java\LORA\supersatanson\image\100_Supersatanson contains 39 image files
No caption file found for 39 images. Training will continue without captions for these images. If class token exists, it will be used. / 39枚の画像にキャプションファイルが見つかりませんでした。これらの画像についてはキャプションなしで学習を 続行します。class tokenが存在する場合はそれを使います。
U:\Program Files (x86)\Algodoo\java\LORA\supersatanson\image\100_Supersatanson\img (1).jpg
U:\Program Files (x86)\Algodoo\java\LORA\supersatanson\image\100_Supersatanson\img (10).jpg
U:\Program Files (x86)\Algodoo\java\LORA\supersatanson\image\100_Supersatanson\img (11).jpg
U:\Program Files (x86)\Algodoo\java\LORA\supersatanson\image\100_Supersatanson\img (12).jpg
U:\Program Files (x86)\Algodoo\java\LORA\supersatanson\image\100_Supersatanson\img (13).jpg
U:\Program Files (x86)\Algodoo\java\LORA\supersatanson\image\100_Supersatanson\img (14).jpg... and 34 more
3900 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 1
  resolution: (512, 512)
  enable_bucket: True
  min_bucket_reso: 256
  max_bucket_reso: 2048
  bucket_reso_steps: 64
  bucket_no_upscale: True

  [Subset 0 of Dataset 0]
    image_dir: "U:\Program Files (x86)\Algodoo\java\LORA\supersatanson\image\100_Supersatanson"
    image_count: 39
    num_repeats: 100
    shuffle_caption: False
    keep_tokens: 0
    keep_tokens_separator:
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    caption_prefix: None
    caption_suffix: None
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    is_reg: False
    class_tokens: Supersatanson
    caption_extension: .caption

[Dataset 0]
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████████| 39/39 [00:00<00:00, 2784.92it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (384, 512), count: 600
bucket 1: resolution (384, 576), count: 900
bucket 2: resolution (448, 512), count: 400
bucket 3: resolution (448, 576), count: 1100
bucket 4: resolution (512, 512), count: 900
mean ar error (without repeats): 0.015424380664355928
preparing accelerator
loading model for process 0/1
load Diffusers pretrained models: runwayml/stable-diffusion-v1-5
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 5/5 [00:00<00:00,  8.05it/s]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
UNet2DConditionModel: 64, 8, 768, False, False
U-Net converted to original U-Net
Enable xformers for U-Net
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
import network module: networks.lora
[Dataset 0]
caching latents.
checking cache validity...
100%|██████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:00<?, ?it/s]
caching latents...
100%|██████████████████████████████████████████████████████████████████████████████████| 39/39 [00:07<00:00,  5.36it/s]
create LoRA network. base dim (rank): 8, alpha: 1.0
neuron dropout: p=None, rank dropout: p=None, module dropout: p=None
create LoRA for Text Encoder:
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.
Traceback (most recent call last):
  File "U:\WindowsTMP\kohya_ss\library\train_util.py", line 3480, in get_optimizer
    import bitsandbytes as bnb
  File "U:\WindowsTMP\kohya_ss\venv\lib\site-packages\bitsandbytes\__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "U:\WindowsTMP\kohya_ss\venv\lib\site-packages\bitsandbytes\research\__init__.py", line 1, in <module>
    from . import nn
  File "U:\WindowsTMP\kohya_ss\venv\lib\site-packages\bitsandbytes\research\nn\__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "U:\WindowsTMP\kohya_ss\venv\lib\site-packages\bitsandbytes\research\nn\modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "U:\WindowsTMP\kohya_ss\venv\lib\site-packages\bitsandbytes\optim\__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "U:\WindowsTMP\kohya_ss\venv\lib\site-packages\bitsandbytes\cextension.py", line 5, in <module>
    from .cuda_setup.main import evaluate_cuda_setup
  File "U:\WindowsTMP\kohya_ss\venv\lib\site-packages\bitsandbytes\cuda_setup\main.py", line 21, in <module>
    from .paths import determine_cuda_runtime_lib_path
ModuleNotFoundError: No module named 'bitsandbytes.cuda_setup.paths'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "U:\WindowsTMP\kohya_ss\train_network.py", line 996, in <module>
    trainer.train(args)
  File "U:\WindowsTMP\kohya_ss\train_network.py", line 348, in train
    optimizer_name, optimizer_args, optimizer = train_util.get_optimizer(args, trainable_params)
  File "U:\WindowsTMP\kohya_ss\library\train_util.py", line 3482, in get_optimizer
    raise ImportError("No bitsandbytes / bitsandbytesがインストールされていないようです")
ImportError: No bitsandbytes / bitsandbytesがインストールされていないようです
Traceback (most recent call last):
  File "C:\Users\viking\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\viking\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "U:\WindowsTMP\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "U:\WindowsTMP\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "U:\WindowsTMP\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
    simple_launcher(args)
  File "U:\WindowsTMP\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['U:\\WindowsTMP\\kohya_ss\\venv\\Scripts\\python.exe', './train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=U:/Program Files (x86)/Algodoo/java/LORA/supersatanson/image', '--resolution=512,512', '--output_dir=U:/Program Files (x86)/Algodoo/java/LORA/supersatanson/model', '--logging_dir=U:/Program Files (x86)/Algodoo/java/LORA/supersatanson/log', '--network_alpha=1', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-05', '--unet_lr=0.0001', '--network_dim=8', '--output_name=last', '--lr_scheduler_num_cycles=1', '--no_half_vae', '--learning_rate=0.0001', '--lr_scheduler=cosine', '--lr_warmup_steps=390', '--train_batch_size=1', '--max_train_steps=3900', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_grad_norm=1', '--max_data_loader_n_workers=0', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale', '--noise_offset=0.0']' returned non-zero exit status 1.
TheZaind commented 8 months ago

If you're on Windows and want to use AdamW8bit, open "requirements_windows_torch2.txt" and change the line "bitsandbytes==0.41.1 # no_verify" to "bitsandbytes-windows". Worked for me! You may also need to delete the bitsandbytes folder from the venv.
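
For anyone who would rather script that edit than do it by hand, here is a minimal sketch of the line swap described above. It assumes the pinned line starts with `bitsandbytes==`, as in the quoted requirements entry. The helper names `swap_bitsandbytes` and `patch_file` are made up for illustration and are not part of kohya_ss.

```python
from pathlib import Path

# Assumption: the pinned entry starts with this prefix, as in the quoted
# line "bitsandbytes==0.41.1 # no_verify".
OLD_PREFIX = "bitsandbytes=="
NEW_LINE = "bitsandbytes-windows"


def swap_bitsandbytes(requirements_text: str) -> str:
    """Return the requirements text with any pinned bitsandbytes line replaced."""
    out = []
    for line in requirements_text.splitlines():
        if line.strip().startswith(OLD_PREFIX):
            # Replace the whole line, dropping the version pin and its comment.
            out.append(NEW_LINE)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


def patch_file(path: str) -> None:
    """Hypothetical helper: rewrite a requirements file in place."""
    p = Path(path)
    p.write_text(swap_bitsandbytes(p.read_text(encoding="utf-8")), encoding="utf-8")
```

Calling `patch_file("requirements_windows_torch2.txt")` from the kohya_ss folder would apply the change. Keep a backup of the file first. As noted above, the old bitsandbytes folder under the venv's site-packages may still need to be deleted before reinstalling.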

Sorkan72 commented 8 months ago

lifesaver