bmaltais / kohya_ss

Apache License 2.0

Problem training LoRA #1795

Closed (fubarac closed 8 months ago)

fubarac commented 10 months ago

Something goes wrong when I try to train a LoRA, but I don't know where exactly. I've tried different models and it still doesn't work. I'd be glad if someone could find the problem.

`13:58:18-783361 INFO     accelerate launch --num_cpu_threads_per_process=2 "./train_network.py" --enable_bucket
                         --min_bucket_reso=256 --max_bucket_reso=2048
                         --pretrained_model_name_or_path="C:/Users/j/Documents/a1111/stable-diffusion-webui/models/S
                         table-diffusion/realisticVisionV60B1_v60B1VAE.safetensors"
                         --train_data_dir="C:\Users\j\Documents\a1111\lora traning data\test\image"
                         --resolution="768,768" --output_dir="C:\Users\j\Documents\a1111\lora traning
                         data\test\model" --logging_dir="C:\Users\j\Documents\a1111\lora traning data\test\log"
                         --network_alpha="1" --save_model_as=safetensors --network_module=networks.lora
                         --text_encoder_lr=5e-05 --unet_lr=0.0001 --network_dim=8 --output_name="last"
                         --lr_scheduler_num_cycles="1" --no_half_vae --learning_rate="0.0001" --lr_scheduler="cosine"
                         --lr_warmup_steps="140" --train_batch_size="2" --max_train_steps="1400"
                         --save_every_n_epochs="1" --mixed_precision="fp16" --save_precision="fp16" --cache_latents
                         --optimizer_type="AdamW8bit" --max_grad_norm="1" --max_data_loader_n_workers="0"
                         --bucket_reso_steps=64 --xformers --bucket_no_upscale --noise_offset=0.0
prepare tokenizer
Using DreamBooth method.
prepare images.
found directory C:\Users\j\Documents\a1111\lora traning data\test\image\100_test contains 28 image files
No caption file found for 28 images. Training will continue without captions for these images. If class token exists, it will be used. / 28枚の画像にキャプションファイルが見つかりませんでした。これらの画像についてはキャプションなしで学習を 続行します。class tokenが存在する場合はそれを使います。
C:\Users\j\Documents\a1111\lora traning data\test\image\100_test\357899421_822294102667661_2397216796649422802_n.jpg
C:\Users\j\Documents\a1111\lora traning data\test\image\100_test\358045145_1087556515553872_7328103429567946199_n.jpg
C:\Users\j\Documents\a1111\lora traning data\test\image\100_test\358053195_780436100540601_4810672643826577373_n.jpg
C:\Users\j\Documents\a1111\lora traning data\test\image\100_test\358961137_152298244537935_690518984658140002_n.jpg
C:\Users\j\Documents\a1111\lora traning data\test\image\100_test\359555336_1303571573857210_4374378250726296964_n.jpg
C:\Users\j\Documents\a1111\lora traning data\test\image\100_test\363510473_3509950062656315_7905028743974878263_n.jpg... and 23 more
2800 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 2
  resolution: (768, 768)
  enable_bucket: True
  min_bucket_reso: 256
  max_bucket_reso: 2048
  bucket_reso_steps: 64
  bucket_no_upscale: True

  [Subset 0 of Dataset 0]
    image_dir: "C:\Users\j\Documents\a1111\lora traning data\test\image\100_test"
    image_count: 28
    num_repeats: 100
    shuffle_caption: False
    keep_tokens: 0
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    caption_prefix: None
    caption_suffix: None
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    is_reg: False
    class_tokens: test
    caption_extension: .caption

[Dataset 0]
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████████| 28/28 [00:00<00:00, 2410.87it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (128, 128), count: 100
bucket 1: resolution (192, 192), count: 200
bucket 2: resolution (192, 256), count: 200
bucket 3: resolution (256, 256), count: 100
bucket 4: resolution (320, 320), count: 100
bucket 5: resolution (384, 384), count: 100
bucket 6: resolution (448, 448), count: 100
bucket 7: resolution (640, 768), count: 500
bucket 8: resolution (640, 832), count: 800
bucket 9: resolution (704, 768), count: 400
bucket 10: resolution (768, 768), count: 100
bucket 11: resolution (832, 640), count: 100
mean ar error (without repeats): 0.03822719710816732
preparing accelerator
loading model for process 0/1
load StableDiffusion checkpoint: C:/Users/j/Documents/a1111/stable-diffusion-webui/models/Stable-diffusion/realisticVisionV60B1_v60B1VAE.safetensors
UNet2DConditionModel: 64, 8, 768, False, False
loading u-net: <All keys matched successfully>
loading vae: <All keys matched successfully>
loading text encoder: <All keys matched successfully>
Enable xformers for U-Net
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
import network module: networks.lora
[Dataset 0]
caching latents.
checking cache validity...
100%|██████████████████████████████████████████████████████████████████████████████████████████| 28/28 [00:00<?, ?it/s]
caching latents...
100%|██████████████████████████████████████████████████████████████████████████████████| 28/28 [00:05<00:00,  5.00it/s]
create LoRA network. base dim (rank): 8, alpha: 1.0
neuron dropout: p=None, rank dropout: p=None, module dropout: p=None
create LoRA for Text Encoder:
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.
Traceback (most recent call last):
  File "C:\Users\j\Documents\a1111\kohya_ss\library\train_util.py", line 3444, in get_optimizer
    import bitsandbytes as bnb
  File "C:\Users\j\Documents\a1111\kohya_ss\venv\lib\site-packages\bitsandbytes\__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "C:\Users\j\Documents\a1111\kohya_ss\venv\lib\site-packages\bitsandbytes\research\__init__.py", line 1, in <module>
    from . import nn
  File "C:\Users\j\Documents\a1111\kohya_ss\venv\lib\site-packages\bitsandbytes\research\nn\__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "C:\Users\j\Documents\a1111\kohya_ss\venv\lib\site-packages\bitsandbytes\research\nn\modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "C:\Users\j\Documents\a1111\kohya_ss\venv\lib\site-packages\bitsandbytes\optim\__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "C:\Users\j\Documents\a1111\kohya_ss\venv\lib\site-packages\bitsandbytes\cextension.py", line 5, in <module>
    from .cuda_setup.main import evaluate_cuda_setup
  File "C:\Users\j\Documents\a1111\kohya_ss\venv\lib\site-packages\bitsandbytes\cuda_setup\main.py", line 21, in <module>
    from .paths import determine_cuda_runtime_lib_path
ModuleNotFoundError: No module named 'bitsandbytes.cuda_setup.paths'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\j\Documents\a1111\kohya_ss\train_network.py", line 1012, in <module>
    trainer.train(args)
  File "C:\Users\j\Documents\a1111\kohya_ss\train_network.py", line 342, in train
    optimizer_name, optimizer_args, optimizer = train_util.get_optimizer(args, trainable_params)
  File "C:\Users\j\Documents\a1111\kohya_ss\library\train_util.py", line 3446, in get_optimizer
    raise ImportError("No bitsandbytes / bitsandbytesがインストールされていないようです")
ImportError: No bitsandbytes / bitsandbytesがインストールされていないようです
Traceback (most recent call last):
  File "C:\Users\ja\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\j\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\j\Documents\a1111\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "C:\Users\j\Documents\a1111\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "C:\Users\j\Documents\a1111\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
    simple_launcher(args)
  File "C:\Users\j\Documents\a1111\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\Users\\j\\Documents\\a1111\\kohya_ss\\venv\\Scripts\\python.exe', './train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=C:/Users/j/Documents/a1111/stable-diffusion-webui/models/Stable-diffusion/realisticVisionV60B1_v60B1VAE.safetensors', '--train_data_dir=C:\\Users\\j\\Documents\\a1111\\lora traning data\\test\\image', '--resolution=768,768', '--output_dir=C:\\Users\\j\\Documents\\a1111\\lora traning data\\test\\model', '--logging_dir=C:\\Users\\j\\Documents\\a1111\\lora traning data\\test\\log', '--network_alpha=1', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-05', '--unet_lr=0.0001', '--network_dim=8', '--output_name=last', '--lr_scheduler_num_cycles=1', '--no_half_vae', '--learning_rate=0.0001', '--lr_scheduler=cosine', '--lr_warmup_steps=140', '--train_batch_size=2', '--max_train_steps=1400', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_grad_norm=1', '--max_data_loader_n_workers=0', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale', '--noise_offset=0.0']' returned non-zero exit status 1.`

thanks.

mertayd0 commented 10 months ago

any update?

FiveSaix commented 9 months ago

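You're using AdamW8bit, which needs bitsandbytes, and your traceback shows bitsandbytes failing to import (`No module named 'bitsandbytes.cuda_setup.paths'`).

You need to install an older version of bitsandbytes to fix it.

Simply run the setup again. Don't worry, this takes less than 30 seconds.

Choose option 3, "(optional) Install specific bitsandbytes versions".

Then select either 0.35.0 or 0.40.1.

After that you'll be able to use AdamW8bit.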
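
If it helps, here is a minimal sketch for checking whether the bitsandbytes install in the kohya_ss venv is actually importable, before and after rerunning the setup. It is not part of kohya_ss; the helper names `installed_version` and `try_import` are made up for illustration, and it assumes you run it with the venv's own `python.exe` (the one shown in the traceback above).

```python
import importlib
from importlib import metadata
from typing import Optional, Tuple


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution version, or None if it is not installed.

    This reads package metadata only, so it works even when the import itself is broken.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def try_import(module_name: str) -> Tuple[bool, str]:
    """Attempt the import and report the failure reason instead of raising."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # a broken install can raise more than ImportError
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"


if __name__ == "__main__":
    # Hypothetical usage: run with the venv's interpreter.
    print("bitsandbytes version:", installed_version("bitsandbytes"))
    print("import result:", try_import("bitsandbytes"))
```

If the version prints but the import result is a failure like the `ModuleNotFoundError` in the log, the package is installed but broken, which matches the case where reinstalling one of the versions listed above should help.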