bmaltais / kohya_ss

Apache License 2.0

training error #1683

Closed jxhxgt closed 7 months ago

jxhxgt commented 10 months ago

If I choose an 8-bit optimizer, I get:

Traceback (most recent call last):
  File "E:\kohya_ss\library\train_util.py", line 3419, in get_optimizer
    import bitsandbytes as bnb
  File "E:\kohya_ss\venv\lib\site-packages\bitsandbytes\__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "E:\kohya_ss\venv\lib\site-packages\bitsandbytes\research\__init__.py", line 1, in <module>
    from . import nn
  File "E:\kohya_ss\venv\lib\site-packages\bitsandbytes\research\nn\__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "E:\kohya_ss\venv\lib\site-packages\bitsandbytes\research\nn\modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "E:\kohya_ss\venv\lib\site-packages\bitsandbytes\optim\__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "E:\kohya_ss\venv\lib\site-packages\bitsandbytes\cextension.py", line 5, in <module>
    from .cuda_setup.main import evaluate_cuda_setup
  File "E:\kohya_ss\venv\lib\site-packages\bitsandbytes\cuda_setup\main.py", line 21, in <module>
    from .paths import determine_cuda_runtime_lib_path
ModuleNotFoundError: No module named 'bitsandbytes.cuda_setup.paths'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\kohya_ss\train_db.py", line 488, in <module>
    train(args)
  File "E:\kohya_ss\train_db.py", line 171, in train
    _, _, optimizer = train_util.get_optimizer(args, trainable_params)
  File "E:\kohya_ss\library\train_util.py", line 3421, in get_optimizer
    raise ImportError("No bitsandbytes / bitsandbytesがインストールされていないようです")
ImportError: No bitsandbytes / bitsandbytesがインストールされていないようです
Traceback (most recent call last):
  File "C:\Users\hjx43\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\hjx43\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "E:\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "E:\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "E:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
    simple_launcher(args)
  File "E:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['E:\kohya_ss\venv\Scripts\python.exe', './train_db.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--no_token_padding', '--weighted_captions', '--pretrained_model_name_or_path=E:/sd-webui-aki/sd-webui-aki-v4.2/models/Stable-diffusion/ACertainThing-half.ckpt', '--train_data_dir=E:/xlzl/tupian/jpg', '--reg_data_dir=E:/DAIXUNLIAN1/reg', '--resolution=384,384', '--output_dir=E:/sd-webui-aki/sd-webui-aki-v4.2/models/Stable-diffusion', '--logging_dir=E:/kohya_ss/logs', '--stop_text_encoder_training=16673', '--save_model_as=safetensors', '--full_bf16', '--vae=E:/sd-webui-aki/sd-webui-aki-v4.2/models/VAE/vae-ft-mse-840000-ema-pruned.safetensors', '--output_name=shiyan', '--lr_scheduler_num_cycles=2', '--max_token_length=225', '--max_data_loader_n_workers=10', '--learning_rate=3e-06', '--lr_scheduler=constant_with_warmup', '--lr_warmup_steps=1667', '--train_batch_size=8', '--max_train_steps=33345', '--save_every_n_epochs=1', '--mixed_precision=bf16', '--save_precision=bf16', '--cache_latents', '--cache_latents_to_disk', '--optimizer_type=PagedAdamW8bit', '--lr_scheduler_args', 'Ir_end=1e-6', '--max_data_loader_n_workers=10', '--max_token_length=225', '--clip_skip=2', '--keep_tokens=3', '--caption_dropout_every_n_epochs=2', '--caption_dropout_rate=0.1', '--vae_batch_size=2', '--bucket_reso_steps=64', '--v_pred_like_loss=0.5', '--save_every_n_steps=2', '--save_last_n_steps=3', '--save_last_n_steps_state=1', '--min_snr_gamma=5', '--save_state', '--mem_eff_attn', '--flip_aug', '--gradient_checkpointing', '--xformers', '--persistent_data_loader_workers', '--bucket_no_upscale', '--noise_offset=0.11', '--adaptive_noise_scale=0.099']' returned non-zero exit status 1.
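
The "No bitsandbytes" ImportError above is raised while the original ModuleNotFoundError is being handled, so the generic message can hide the real cause. A minimal sketch for surfacing the underlying import failure outside the trainer, assuming it is run with the venv's Python interpreter; `import_error_report` is a hypothetical helper name and not part of kohya_ss:

```python
import importlib
import traceback


def import_error_report(module_name):
    """Try to import module_name; return None on success, else the full traceback text."""
    try:
        importlib.import_module(module_name)
    except Exception:  # report any failure raised during import, not only ImportError
        return traceback.format_exc()
    return None


if __name__ == "__main__":
    report = import_error_report("bitsandbytes")
    print(report if report else "bitsandbytes imported without errors")
```

Run this way, the last line of the printed traceback names the module that is actually missing instead of the wrapper message.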

If I don't select an 8-bit optimizer, training starts but then fails with:

Traceback (most recent call last):
  File "E:\kohya_ss\train_db.py", line 488, in <module>
    train(args)
  File "E:\kohya_ss\train_db.py", line 279, in train
    for step, batch in enumerate(train_dataloader):
  File "E:\kohya_ss\venv\lib\site-packages\accelerate\data_loader.py", line 384, in __iter__
    current_batch = next(dataloader_iter)
  File "E:\kohya_ss\venv\lib\site-packages\torch\utils\data\dataloader.py", line 633, in __next__
    data = self._next_data()
  File "E:\kohya_ss\venv\lib\site-packages\torch\utils\data\dataloader.py", line 1345, in _next_data
    return self._process_data(data)
  File "E:\kohya_ss\venv\lib\site-packages\torch\utils\data\dataloader.py", line 1371, in _process_data
    data.reraise()
  File "E:\kohya_ss\venv\lib\site-packages\torch\_utils.py", line 644, in reraise
    raise exception
AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "E:\kohya_ss\venv\lib\site-packages\torch\utils\data\_utils\worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "E:\kohya_ss\venv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "E:\kohya_ss\venv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "E:\kohya_ss\venv\lib\site-packages\torch\utils\data\dataset.py", line 243, in __getitem__
    return self.datasets[dataset_idx][sample_idx]
  File "E:\kohya_ss\library\train_util.py", line 1201, in __getitem__
    example["input_ids"] = self.tokenizer[0](captions, padding=True, truncation=True, return_tensors="pt").input_ids
AttributeError: 'DreamBoothDataset' object has no attribute 'tokenizer'

noahjgreer commented 10 months ago

Yes! I am also receiving this same error. I have tried reinstalling kohya_ss and updating/reinstalling the scipy package, but neither solved the problem. I have also checked twice that the bitsandbytes package is installed. Not sure what's going on. I am running Windows.

Microsoft Windows [Version 10.0.20348.2031]
(c) Microsoft Corporation. All rights reserved.

F:\StableDiffusionProjects\kohya_ss>gui.bat
=============================================================
Modules installed outside the virtual environment were found.
This can cause issues. Please review the installed modules.

You can uninstall all local modules with:

deactivate
pip freeze > uninstall.txt
pip uninstall -y -r uninstall.txt
=============================================================

11:09:20-452466 INFO     Version: v22.1.1

11:09:20-460473 INFO     nVidia toolkit detected
11:09:22-381283 INFO     Torch 2.0.1+cu118
11:09:22-407306 INFO     Torch backend: nVidia CUDA 11.8 cuDNN 8700
11:09:22-410308 INFO     Torch detected GPU: NVIDIA GeForce RTX 2060 SUPER VRAM 8192 Arch (7, 5) Cores 34
11:09:22-411309 INFO     Verifying modules installation status from requirements_windows_torch2.txt...
11:09:22-414312 INFO     Verifying modules installation status from requirements.txt...
11:09:25-379018 INFO     headless: False
11:09:25-383022 INFO     Load CSS...
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
11:10:00-604283 INFO     Loading config...
11:10:01-791092 INFO     Loading config...
11:10:06-471806 INFO     Start training LoRA Standard ...
11:10:06-472806 INFO     Checking for duplicate image filenames in training data directory...
11:10:06-474808 INFO     Valid image folder names found in: F:/StableDiffusionProjects/LoRA/Character/Finni-v1.1-NAI/img
11:10:06-477811 INFO     Valid image folder names found in: F:/StableDiffusionProjects/LoRA/Character/Finni-v1.1-NAI/reg
11:10:06-479814 INFO     Folder 25_f1nni fox: 7 images found
11:10:06-481815 INFO     Folder 25_f1nni fox: 175 steps
11:10:06-483817 WARNING  Regularisation images are used... Will double the number of steps required...
11:10:06-484818 INFO     Total steps: 175
11:10:06-485819 INFO     Train batch size: 1
11:10:06-486820 INFO     Gradient accumulation steps: 1
11:10:06-488822 INFO     Epoch: 12
11:10:06-491825 INFO     Regulatization factor: 2
11:10:06-497830 INFO     max_train_steps (175 / 1 / 1 * 12 * 2) = 4200
11:10:06-502836 INFO     stop_text_encoder_training = 0
11:10:06-507839 INFO     lr_warmup_steps = 0
11:10:06-509842 INFO     Saving training config to
                         F:/StableDiffusionProjects/LoRA/Character/Finni-v1.1-NAI/model\lora_Finni-v1.1-NoAI_SD15_20231113-111006.json...
11:10:06-514846 INFO     accelerate launch --num_cpu_threads_per_process=2 "./train_network.py" --enable_bucket
                         --min_bucket_reso=256 --max_bucket_reso=2048
                         --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5"
                         --train_data_dir="F:/StableDiffusionProjects/LoRA/Character/Finni-v1.1-NAI/img"
                         --reg_data_dir="F:/StableDiffusionProjects/LoRA/Character/Finni-v1.1-NAI/reg"
                         --resolution="512,512"
                         --output_dir="F:/StableDiffusionProjects/LoRA/Character/Finni-v1.1-NAI/model"
                         --logging_dir="F:/StableDiffusionProjects/LoRA/Character/Finni-v1.1-NAI/log"
                         --network_alpha="1" --save_model_as=safetensors --network_module=networks.lora --network_dim=8
                         --output_name="lora_Finni-v1.1-NoAI_SD15" --lr_scheduler_num_cycles="12" --no_half_vae
                         --learning_rate="0.0001" --lr_scheduler="constant" --train_batch_size="1"
                         --max_train_steps="4200" --save_every_n_epochs="1" --mixed_precision="fp16"
                         --save_precision="fp16" --seed="58008" --cache_latents --optimizer_type="AdamW8bit"
                         --max_token_length=150 --clip_skip=2 --bucket_reso_steps=64 --shuffle_caption --xformers
                         --bucket_no_upscale --noise_offset=0.0
prepare tokenizer
update token length: 150
Using DreamBooth method.
prepare images.
found directory F:\StableDiffusionProjects\LoRA\Character\Finni-v1.1-NAI\img\25_f1nni fox contains 7 image files
No caption file found for 7 images. Training will continue without captions for these images. If class token exists, it will be used. / 7枚の画像にキャプションファイルが見つかりませんでした。これらの画像についてはキャプションなしで学習を続行します。class tokenが存在する場合はそれを使います。
F:\StableDiffusionProjects\LoRA\Character\Finni-v1.1-NAI\img\25_f1nni fox\00000-0-image-000.png
F:\StableDiffusionProjects\LoRA\Character\Finni-v1.1-NAI\img\25_f1nni fox\00001-0-image-001.png
F:\StableDiffusionProjects\LoRA\Character\Finni-v1.1-NAI\img\25_f1nni fox\00002-0-image-002.png
F:\StableDiffusionProjects\LoRA\Character\Finni-v1.1-NAI\img\25_f1nni fox\00003-0-image-003.png
F:\StableDiffusionProjects\LoRA\Character\Finni-v1.1-NAI\img\25_f1nni fox\00004-0-image-004.png
F:\StableDiffusionProjects\LoRA\Character\Finni-v1.1-NAI\img\25_f1nni fox\00005-0-image-005.png... and 2 more
found directory F:\StableDiffusionProjects\LoRA\Character\Finni-v1.1-NAI\reg\1_fox contains 200 image files
No caption file found for 200 images. Training will continue without captions for these images. If class token exists, it will be used. / 200枚の画像にキャプションファイルが見つかりませんでした。これらの画像についてはキャプションなしで学習を続行します。class tokenが存在する場合はそれを使います。
F:\StableDiffusionProjects\LoRA\Character\Finni-v1.1-NAI\reg\1_fox\d05351a88695810ce59ddca92080cd37.jpg
F:\StableDiffusionProjects\LoRA\Character\Finni-v1.1-NAI\reg\1_fox\d131cb2277d8e370ab5a04183ded0bb6.jpg
F:\StableDiffusionProjects\LoRA\Character\Finni-v1.1-NAI\reg\1_fox\d14dfc8b13ed925db973047c3828de90.jpg
F:\StableDiffusionProjects\LoRA\Character\Finni-v1.1-NAI\reg\1_fox\d15ac65a8720ebc664039d2681266782.png
F:\StableDiffusionProjects\LoRA\Character\Finni-v1.1-NAI\reg\1_fox\d15ba6e15ff454ba64b3d51af21e5de8.jpg
F:\StableDiffusionProjects\LoRA\Character\Finni-v1.1-NAI\reg\1_fox\d16ec70b2151c66fa5c87162937cda93.png... and 195 more
175 train images with repeating.
200 reg images.
some of reg images are not used / 正則化画像の数が多いので、一部使用されない正則化画像があります
[Dataset 0]
  batch_size: 1
  resolution: (512, 512)
  enable_bucket: True
  min_bucket_reso: 256
  max_bucket_reso: 2048
  bucket_reso_steps: 64
  bucket_no_upscale: True

  [Subset 0 of Dataset 0]
    image_dir: "F:\StableDiffusionProjects\LoRA\Character\Finni-v1.1-NAI\img\25_f1nni fox"
    image_count: 7
    num_repeats: 25
    shuffle_caption: True
    keep_tokens: 0
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    caption_prefix: None
    caption_suffix: None
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    is_reg: False
    class_tokens: f1nni fox
    caption_extension: .caption

  [Subset 1 of Dataset 0]
    image_dir: "F:\StableDiffusionProjects\LoRA\Character\Finni-v1.1-NAI\reg\1_fox"
    image_count: 200
    num_repeats: 1
    shuffle_caption: True
    keep_tokens: 0
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    caption_prefix: None
    caption_suffix: None
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    is_reg: True
    class_tokens: fox
    caption_extension: .caption

[Dataset 0]
loading image sizes.
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 182/182 [00:00<00:00, 5865.62it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (320, 320), count: 1
bucket 1: resolution (448, 448), count: 5
bucket 2: resolution (512, 512), count: 319
bucket 3: resolution (576, 384), count: 25
mean ar error (without repeats): 0.0
preparing accelerator
loading model for process 0/1
load Diffusers pretrained models: runwayml/stable-diffusion-v1-5
text_encoder\model.safetensors not found
Loading pipeline components...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:02<00:00,  2.26it/s]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
UNet2DConditionModel: 64, 8, 768, False, False
U-Net converted to original U-Net
Enable xformers for U-Net
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
import network module: networks.lora
[Dataset 0]
caching latents.
checking cache validity...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 182/182 [00:00<00:00, 181926.44it/s]
caching latents...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 182/182 [00:51<00:00,  3.52it/s]
create LoRA network. base dim (rank): 8, alpha: 1.0
neuron dropout: p=None, rank dropout: p=None, module dropout: p=None
create LoRA for Text Encoder:
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.
Traceback (most recent call last):
  File "F:\StableDiffusionProjects\kohya_ss\library\train_util.py", line 3419, in get_optimizer
    import bitsandbytes as bnb
  File "F:\StableDiffusionProjects\kohya_ss\venv\lib\site-packages\bitsandbytes\__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "F:\StableDiffusionProjects\kohya_ss\venv\lib\site-packages\bitsandbytes\research\__init__.py", line 1, in <module>
    from . import nn
  File "F:\StableDiffusionProjects\kohya_ss\venv\lib\site-packages\bitsandbytes\research\nn\__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "F:\StableDiffusionProjects\kohya_ss\venv\lib\site-packages\bitsandbytes\research\nn\modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "F:\StableDiffusionProjects\kohya_ss\venv\lib\site-packages\bitsandbytes\optim\__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "F:\StableDiffusionProjects\kohya_ss\venv\lib\site-packages\bitsandbytes\cextension.py", line 5, in <module>
    from .cuda_setup.main import evaluate_cuda_setup
  File "F:\StableDiffusionProjects\kohya_ss\venv\lib\site-packages\bitsandbytes\cuda_setup\main.py", line 21, in <module>
    from .paths import determine_cuda_runtime_lib_path
ModuleNotFoundError: No module named 'bitsandbytes.cuda_setup.paths'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "F:\StableDiffusionProjects\kohya_ss\train_network.py", line 1009, in <module>
    trainer.train(args)
  File "F:\StableDiffusionProjects\kohya_ss\train_network.py", line 338, in train
    optimizer_name, optimizer_args, optimizer = train_util.get_optimizer(args, trainable_params)
  File "F:\StableDiffusionProjects\kohya_ss\library\train_util.py", line 3421, in get_optimizer
    raise ImportError("No bitsandbytes / bitsandbytesがインストールされていないようです")
ImportError: No bitsandbytes / bitsandbytesがインストールされていないようです
Traceback (most recent call last):
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "F:\StableDiffusionProjects\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "F:\StableDiffusionProjects\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "F:\StableDiffusionProjects\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
    simple_launcher(args)
  File "F:\StableDiffusionProjects\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['F:\\StableDiffusionProjects\\kohya_ss\\venv\\Scripts\\python.exe', './train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=F:/StableDiffusionProjects/LoRA/Character/Finni-v1.1-NAI/img', '--reg_data_dir=F:/StableDiffusionProjects/LoRA/Character/Finni-v1.1-NAI/reg', '--resolution=512,512', '--output_dir=F:/StableDiffusionProjects/LoRA/Character/Finni-v1.1-NAI/model', '--logging_dir=F:/StableDiffusionProjects/LoRA/Character/Finni-v1.1-NAI/log', '--network_alpha=1', '--save_model_as=safetensors', '--network_module=networks.lora', '--network_dim=8', '--output_name=lora_Finni-v1.1-NoAI_SD15', '--lr_scheduler_num_cycles=12', '--no_half_vae', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=1', '--max_train_steps=4200', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=58008', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_token_length=150', '--clip_skip=2', '--bucket_reso_steps=64', '--shuffle_caption', '--xformers', '--bucket_no_upscale', '--noise_offset=0.0']' returned non-zero exit status 1.
Keyboard interruption in main thread... closing server.
Terminate batch job (Y/N)? y

(venv) F:\StableDiffusionProjects\kohya_ss>pip list
Package                      Version      Editable project location
---------------------------- ------------ -----------------------------------
absl-py                      2.0.0
accelerate                   0.23.0
aiofiles                     23.2.1
aiohttp                      3.8.6
aiosignal                    1.3.1
altair                       4.2.2
annotated-types              0.6.0
anyio                        3.7.1
appdirs                      1.4.4
astunparse                   1.6.3
async-timeout                4.0.3
attrs                        23.1.0
bitsandbytes                 0.41.1
cachetools                   5.3.2
certifi                      2022.12.7
charset-normalizer           2.1.1
click                        8.1.7
colorama                     0.4.6
coloredlogs                  15.0.1
contourpy                    1.2.0
cycler                       0.12.1
dadaptation                  3.1
diffusers                    0.21.4
docker-pycreds               0.4.0
easygui                      0.98.3
einops                       0.6.0
entrypoints                  0.4
exceptiongroup               1.1.3
fairscale                    0.4.13
fastapi                      0.104.1
ffmpy                        0.3.1
filelock                     3.9.0
flatbuffers                  23.5.26
fonttools                    4.44.0
frozenlist                   1.4.0
fsspec                       2023.10.0
ftfy                         6.1.1
gast                         0.5.4
gitdb                        4.0.11
GitPython                    3.1.40
google-auth                  2.23.4
google-auth-oauthlib         1.0.0
google-pasta                 0.2.0
gradio                       3.36.1
gradio_client                0.7.0
grpcio                       1.59.2
h11                          0.14.0
h5py                         3.10.0
httpcore                     1.0.2
httpx                        0.25.1
huggingface-hub              0.15.1
humanfriendly                10.0
idna                         3.4
importlib-metadata           6.8.0
invisible-watermark          0.2.0
Jinja2                       3.1.2
jsonschema                   4.19.2
jsonschema-specifications    2023.7.1
keras                        2.14.0
kiwisolver                   1.4.5
libclang                     16.0.6
library                      1.0.3        F:\StableDiffusionProjects\kohya_ss
lightning-utilities          0.9.0
linkify-it-py                2.0.2
lion-pytorch                 0.0.6
lycoris-lora                 1.9.0
Markdown                     3.5.1
markdown-it-py               2.2.0
MarkupSafe                   2.1.3
matplotlib                   3.8.1
mdit-py-plugins              0.3.3
mdurl                        0.1.2
ml-dtypes                    0.2.0
mpmath                       1.3.0
multidict                    6.0.4
networkx                     3.0
numpy                        1.24.1
oauthlib                     3.2.2
onnx                         1.14.1
onnxruntime-gpu              1.16.0
open-clip-torch              2.20.0
opencv-python                4.7.0.68
opt-einsum                   3.3.0
orjson                       3.9.10
packaging                    23.2
pandas                       2.1.3
pathtools                    0.1.2
Pillow                       9.3.0
pip                          22.2.1
prodigyopt                   1.0
protobuf                     3.20.3
psutil                       5.9.6
pyasn1                       0.5.0
pyasn1-modules               0.3.0
pydantic                     2.5.0
pydantic_core                2.14.1
pydub                        0.25.1
Pygments                     2.16.1
pyparsing                    3.1.1
pyreadline3                  3.4.1
python-dateutil              2.8.2
python-multipart             0.0.6
pytorch-lightning            1.9.0
pytz                         2023.3.post1
PyWavelets                   1.4.1
PyYAML                       6.0.1
referencing                  0.30.2
regex                        2023.10.3
requests                     2.28.1
requests-oauthlib            1.3.1
rich                         13.4.1
rpds-py                      0.12.0
rsa                          4.9
safetensors                  0.3.1
scipy                        1.11.3
semantic-version             2.10.0
sentencepiece                0.1.99
sentry-sdk                   1.35.0
setproctitle                 1.3.3
setuptools                   63.2.0
six                          1.16.0
smmap                        5.0.1
sniffio                      1.3.0
starlette                    0.27.0
sympy                        1.12
tensorboard                  2.14.1
tensorboard-data-server      0.7.2
tensorflow                   2.14.0
tensorflow-estimator         2.14.0
tensorflow-intel             2.14.0
tensorflow-io-gcs-filesystem 0.31.0
termcolor                    2.3.0
timm                         0.6.12
tk                           0.1.0
tokenizers                   0.13.3
toml                         0.10.2
toolz                        0.12.0
torch                        2.0.1+cu118
torchmetrics                 1.2.0
torchvision                  0.15.2+cu118
tqdm                         4.66.1
transformers                 4.30.2
typing_extensions            4.8.0
tzdata                       2023.3
uc-micro-py                  1.0.2
urllib3                      1.26.13
uvicorn                      0.24.0.post1
voluptuous                   0.13.1
wandb                        0.15.11
wcwidth                      0.2.9
websockets                   11.0.3
Werkzeug                     3.0.1
wheel                        0.41.3
wrapt                        1.14.1
xformers                     0.0.21
yarl                         1.9.2
zipp                         3.17.0

[notice] A new release of pip available: 22.2.1 -> 23.3.1
[notice] To update, run: python.exe -m pip install --upgrade pip

(venv) F:\StableDiffusionProjects\kohya_ss>
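
The pip list output above reports bitsandbytes 0.41.1 as installed, yet the import fails on a missing submodule. One way to check whether the installation on disk is complete is to compare the file list recorded at install time with what actually exists. This is a rough sketch using the standard library's `importlib.metadata`; `missing_recorded_files` is a hypothetical helper name, and the check would not detect files that were overwritten or replaced after installation:

```python
from importlib import metadata


def missing_recorded_files(dist_name):
    """Return files listed in the distribution's RECORD that are absent on disk.

    Returns None when the distribution is not installed or has no file list.
    """
    try:
        files = metadata.files(dist_name)
    except metadata.PackageNotFoundError:
        return None
    if files is None:
        return None
    # locate() resolves each recorded entry to an absolute path on disk
    return [str(f) for f in files if not f.locate().exists()]


if __name__ == "__main__":
    name = "bitsandbytes"
    missing = missing_recorded_files(name)
    if missing is None:
        print(f"{name}: not installed, or no recorded file list available")
    else:
        print(f"{name} {metadata.version(name)}: {len(missing)} recorded file(s) missing")
        for path in missing:
            print("  ", path)
```

An empty result only means that every recorded file is present; it does not prove the files match the installed version.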

asbjbo commented 10 months ago

Same here, on Windows 10 with Python 3.10.9. I've tried installing, uninstalling, and reinstalling various components in different orders, but cannot get past the same point shown above: after the message "prepare optimizer, data loader etc.", I get the error ModuleNotFoundError: No module named 'bitsandbytes.cuda_setup.paths'.
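
Since the error names a specific submodule, it may help to check whether the corresponding file exists inside the installed package without importing the package at all. A minimal sketch, assuming the submodule would live at `cuda_setup/paths.py` inside the bitsandbytes package directory (inferred from the module name in the error, not confirmed by the thread); `submodule_file_exists` is a hypothetical helper name:

```python
import importlib.util
import os


def submodule_file_exists(package, relative_path):
    """Check whether relative_path exists inside an installed package, without importing it.

    Returns None if the package itself cannot be located.
    """
    # find_spec on a top-level name locates the package but does not execute it
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    return any(
        os.path.isfile(os.path.join(location, relative_path))
        for location in spec.submodule_search_locations
    )


if __name__ == "__main__":
    found = submodule_file_exists("bitsandbytes", os.path.join("cuda_setup", "paths.py"))
    messages = {None: "bitsandbytes not found", True: "paths.py present", False: "paths.py missing"}
    print(messages[found])
```

If this reports the file as missing while `main.py` in the same folder imports it, the package directory likely contains files from mismatched sources, though the thread does not confirm the cause.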

From my installation script: Kohya_ss GUI setup menu:

  1. Install kohya_ss gui
  2. (Optional) Install cudann files (avoid unless you really need it)
  3. (Optional) Install specific bitsandbytes versions
  4. (Optional) Manually configure accelerate
  5. (Optional) Start Kohya_ss GUI in browser
  6. Quit

Enter your choice: 1

23:29:26-920219 INFO Version: v22.1.1

23:29:26-927242 INFO Python 3.10.9 on Windows
23:29:26-935234 INFO Installing modules from requirements_windows_torch2.txt...
23:29:26-938227 WARNING Package wrong version: torch 2.1.0+cu118 required 2.0.1+cu118
23:29:26-941198 INFO Installing package: torch==2.0.1+cu118 torchvision==0.15.2+cu118 --index-url https://download.pytorch.org/whl/cu118
23:30:58-542710 WARNING Package wrong version: xformers 0.0.22.post7+cu118 required 0.0.21
08:10:55-243242 INFO Installing package: xformers==0.0.21
08:11:10-870276 INFO Installing package: bitsandbytes==0.41.1
08:11:13-625550 INFO Installing modules from requirements.txt...
08:11:13-646769 INFO Installing package: -e .
08:11:26-072388 INFO Copying bitsandbytes files...
08:11:26-078382 INFO Configuring accelerate...

From the runtime script:

=============================================================
Modules installed outside the virtual environment were found.
This can cause issues. Please review the installed modules.

You can uninstall all local modules with:

deactivate
pip freeze > uninstall.txt
pip uninstall -y -r uninstall.txt
=============================================================

08:19:32-320080 INFO Version: v22.1.1

08:19:32-329498 INFO nVidia toolkit detected
08:19:34-588640 INFO Torch 2.0.1+cu118
08:19:34-617613 INFO Torch backend: nVidia CUDA 11.8 cuDNN 8700
08:19:34-621578 INFO Torch detected GPU: NVIDIA GeForce GTX 1080 Ti VRAM 11264 Arch (6, 1) Cores 28
08:19:34-624573 INFO Verifying modules installation status from requirements_windows_torch2.txt...
08:19:34-634594 INFO Verifying modules installation status from requirements.txt...
08:19:39-838387 INFO headless: False
08:19:39-848382 INFO Load CSS...
Running on local URL: http://127.0.0.1:7861

To create a public link, set share=True in launch().

08:24:28-617621 INFO Removing existing directory C:/Users/asbjo/StableDiffusion/AMB\img/100_AMB man...
08:24:28-664564 INFO Copy C:/Users/asbjo/StableDiffusion/AMB/100_AMB to C:/Users/asbjo/StableDiffusion/AMB\img/100_AMB man...
08:24:28-805425 INFO Removing existing directory C:/Users/asbjo/StableDiffusion/AMB\reg/1_man...
08:24:28-807420 INFO Copy C:/Users/asbjo/StableDiffusion/AMB/reg to C:/Users/asbjo/StableDiffusion/AMB\reg/1_man...
08:24:28-810420 INFO Done creating kohya_ss training folder structure at C:/Users/asbjo/StableDiffusion/AMB...
08:25:05-528852 INFO Start training LoRA Standard ...
08:25:05-530879 INFO Checking for duplicate image filenames in training data directory...
08:25:05-536874 INFO Valid image folder names found in: C:/Users/asbjo/StableDiffusion/AMB\img
08:25:05-538840 INFO Valid image folder names found in: C:/Users/asbjo/StableDiffusion/AMB\reg
08:25:05-540838 INFO Folder 100_AMB man: 36 images found
08:25:05-544835 INFO Folder 100_AMB man: 3600 steps
08:25:05-546850 INFO Folder 40_AMB man: 36 images found
08:25:05-548831 INFO Folder 40_AMB man: 1440 steps
08:25:05-550828 WARNING Regularisation images are used... Will double the number of steps required...
08:25:05-552826 INFO Total steps: 5040
08:25:05-553839 INFO Train batch size: 1
08:25:05-555823 INFO Gradient accumulation steps: 1
08:25:05-557851 INFO Epoch: 1
08:25:05-562815 INFO Regulatization factor: 2
08:25:05-564844 INFO max_train_steps (5040 / 1 / 1 * 1 * 2) = 10080
08:25:05-567811 INFO stop_text_encoder_training = 0
08:25:05-568840 INFO lr_warmup_steps = 1008
08:25:05-570838 INFO Saving training config to C:/Users/asbjo/StableDiffusion/AMB\model\amb v.0.1_20231114-082505.json...
08:25:05-573841 INFO accelerate launch --num_cpu_threads_per_process=2 "./train_network.py" --enable_bucket --min_bucket_reso=256 --max_bucket_reso=2048 --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" --train_data_dir="C:/Users/asbjo/StableDiffusion/AMB\img" --reg_data_dir="C:/Users/asbjo/StableDiffusion/AMB\reg" --resolution="512,512" --output_dir="C:/Users/asbjo/StableDiffusion/AMB\model" --logging_dir="C:/Users/asbjo/StableDiffusion/AMB\log" --network_alpha="1" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-05 --unet_lr=0.0001 --network_dim=8 --output_name="amb v.0.1" --lr_scheduler_num_cycles="1" --no_half_vae --learning_rate="0.0001" --lr_scheduler="cosine" --lr_warmup_steps="1008" --train_batch_size="1" --max_train_steps="10080" --save_every_n_epochs="1" --mixed_precision="fp16" --save_precision="fp16" --cache_latents --optimizer_type="AdamW8bit" --max_data_loader_n_workers="0" --bucket_reso_steps=64 --xformers --bucket_no_upscale --noise_offset=0.0
prepare tokenizer
Using DreamBooth method.
prepare images.
found directory C:\Users\asbjo\StableDiffusion\AMB\img\100_AMB man contains 36 image files
No caption file found for 36 images. Training will continue without captions for these images. If class token exists, it will be used. / 36枚の画像にキャプションファイルが見つかりませんでした。これらの画像についてはキャプションなしで学習を続行します。class tokenが存在する場合はそれを使います。
C:\Users\asbjo\StableDiffusion\AMB\img\100_AMB man\DCP_3917.JPG
C:\Users\asbjo\StableDiffusion\AMB\img\100_AMB man\IMG_0179.JPG
C:\Users\asbjo\StableDiffusion\AMB\img\100_AMB man\IMG_0286.JPG
C:\Users\asbjo\StableDiffusion\AMB\img\100_AMB man\IMG_1072.JPG
C:\Users\asbjo\StableDiffusion\AMB\img\100_AMB man\IMG_1087.JPG
C:\Users\asbjo\StableDiffusion\AMB\img\100_AMB man\IMG_1913.JPG... and 31 more
found directory C:\Users\asbjo\StableDiffusion\AMB\img\40_AMB man contains 36 image files
No caption file found for 36 images. Training will continue without captions for these images. If class token exists, it will be used. / 36枚の画像にキャプションファイルが見つかりませんでした。これらの画像についてはキャプションなしで学習を続行します。class tokenが存在する場合はそれを使います。
C:\Users\asbjo\StableDiffusion\AMB\img\40_AMB man\DCP_3917.JPG
C:\Users\asbjo\StableDiffusion\AMB\img\40_AMB man\IMG_0179.JPG
C:\Users\asbjo\StableDiffusion\AMB\img\40_AMB man\IMG_0286.JPG
C:\Users\asbjo\StableDiffusion\AMB\img\40_AMB man\IMG_1072.JPG
C:\Users\asbjo\StableDiffusion\AMB\img\40_AMB man\IMG_1087.JPG
C:\Users\asbjo\StableDiffusion\AMB\img\40_AMB man\IMG_1913.JPG... and 31 more
found directory C:\Users\asbjo\StableDiffusion\AMB\reg\1_man contains 0 image files
ignore subset with image_dir='C:\Users\asbjo\StableDiffusion\AMB\reg\1_man': no images found / 画像が見つからないためサブセットを無視します
5040 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 1
  resolution: (512, 512)
  enable_bucket: True
  min_bucket_reso: 256
  max_bucket_reso: 2048
  bucket_reso_steps: 64
  bucket_no_upscale: True

[Subset 0 of Dataset 0] image_dir: "C:\Users\asbjo\StableDiffusion\AMB\img\100_AMB man" image_count: 36 num_repeats: 100 shuffle_caption: False keep_tokens: 0 caption_dropout_rate: 0.0 caption_dropout_every_n_epoches: 0 caption_tag_dropout_rate: 0.0 caption_prefix: None caption_suffix: None color_aug: False flip_aug: False face_crop_aug_range: None random_crop: False token_warmup_min: 1, token_warmup_step: 0, is_reg: False class_tokens: AMB man caption_extension: .caption

[Subset 1 of Dataset 0] image_dir: "C:\Users\asbjo\StableDiffusion\AMB\img\40_AMB man" image_count: 36 num_repeats: 40 shuffle_caption: False keep_tokens: 0 caption_dropout_rate: 0.0 caption_dropout_every_n_epoches: 0 caption_tag_dropout_rate: 0.0 caption_prefix: None caption_suffix: None color_aug: False flip_aug: False face_crop_aug_range: None random_crop: False token_warmup_min: 1, token_warmup_step: 0, is_reg: False class_tokens: AMB man caption_extension: .caption

[Dataset 0] loading image sizes. 100%|█████████████████████████████████████████████████████████████████████████████████| 72/72 [00:00<00:00, 242.78it/s] make buckets min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む) bucket 0: resolution (256, 640), count: 280 bucket 1: resolution (256, 704), count: 140 bucket 2: resolution (256, 960), count: 140 bucket 3: resolution (320, 640), count: 280 bucket 4: resolution (320, 704), count: 840 bucket 5: resolution (320, 768), count: 140 bucket 6: resolution (384, 512), count: 280 bucket 7: resolution (384, 576), count: 1120 bucket 8: resolution (384, 640), count: 560 bucket 9: resolution (448, 448), count: 140 bucket 10: resolution (448, 512), count: 700 bucket 11: resolution (512, 384), count: 140 bucket 12: resolution (512, 448), count: 280 mean ar error (without repeats): 0.018508847976521484 preparing accelerator loading model for process 0/1 load Diffusers pretrained models: runwayml/stable-diffusion-v1-5 vae\diffusion_pytorch_model.safetensors not found Loading pipeline components...: 100%|████████████████████████████████████████████████████| 5/5 [00:04<00:00, 1.09it/s] You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing safety_checker=None. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. 
For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 . UNet2DConditionModel: 64, 8, 768, False, False U-Net converted to original U-Net Enable xformers for U-Net A matching Triton is not available, some optimizations will not be enabled. Error caught was: No module named 'triton' import network module: networks.lora [Dataset 0] caching latents. checking cache validity... 100%|██████████████████████████████████████████████████████████████████████████████████████████| 72/72 [00:00<?, ?it/s] caching latents... 100%|██████████████████████████████████████████████████████████████████████████████████| 72/72 [00:13<00:00, 5.46it/s] create LoRA network. base dim (rank): 8, alpha: 1.0 neuron dropout: p=None, rank dropout: p=None, module dropout: p=None create LoRA for Text Encoder: create LoRA for Text Encoder: 72 modules. create LoRA for U-Net: 192 modules. enable LoRA for text encoder enable LoRA for U-Net prepare optimizer, data loader etc. Traceback (most recent call last): File "C:\Users\asbjo\StableDiffusion\kohya_ss\library\train_util.py", line 3419, in get_optimizer import bitsandbytes as bnb File "C:\Users\asbjo\StableDiffusion\kohya_ss\venv\lib\site-packages\bitsandbytes__init.py", line 6, in from . import cuda_setup, utils, research File "C:\Users\asbjo\StableDiffusion\kohya_ss\venv\lib\site-packages\bitsandbytes\research__init__.py", line 1, in from . 
import nn File "C:\Users\asbjo\StableDiffusion\kohya_ss\venv\lib\site-packages\bitsandbytes\research\nn\init.py", line 1, in from .modules import LinearFP8Mixed, LinearFP8Global File "C:\Users\asbjo\StableDiffusion\kohya_ss\venv\lib\site-packages\bitsandbytes\research\nn\modules.py", line 8, in from bitsandbytes.optim import GlobalOptimManager File "C:\Users\asbjo\StableDiffusion\kohya_ss\venv\lib\site-packages\bitsandbytes\optim\init__.py", line 6, in from bitsandbytes.cextension import COMPILED_WITH_CUDA File "C:\Users\asbjo\StableDiffusion\kohya_ss\venv\lib\site-packages\bitsandbytes\cextension.py", line 5, in from .cuda_setup.main import evaluate_cuda_setup File "C:\Users\asbjo\StableDiffusion\kohya_ss\venv\lib\site-packages\bitsandbytes\cuda_setup\main.py", line 21, in from .paths import determine_cuda_runtime_lib_path ModuleNotFoundError: No module named 'bitsandbytes.cuda_setup.paths'

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "C:\Users\asbjo\StableDiffusion\kohya_ss\train_network.py", line 1009, in trainer.train(args) File "C:\Users\asbjo\StableDiffusion\kohya_ss\train_network.py", line 338, in train optimizer_name, optimizer_args, optimizer = train_util.get_optimizer(args, trainable_params) File "C:\Users\asbjo\StableDiffusion\kohya_ss\library\train_util.py", line 3421, in get_optimizer raise ImportError("No bitsandbytes / bitsandbytesがインストールされていないようです") ImportError: No bitsandbytes / bitsandbytesがインストールされていないようです Traceback (most recent call last): File "C:\Users\asbjo\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\Users\asbjo\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code exec(code, run_globals) File "C:\Users\asbjo\StableDiffusion\kohya_ss\venv\Scripts\accelerate.exe__main__.py", line 7, in File "C:\Users\asbjo\StableDiffusion\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main args.func(args) File "C:\Users\asbjo\StableDiffusion\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command simple_launcher(args) File "C:\Users\asbjo\StableDiffusion\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) subprocess.CalledProcessError: Command '['C:\Users\asbjo\StableDiffusion\kohya_ss\venv\Scripts\python.exe', './train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=C:/Users/asbjo/StableDiffusion/AMB\img', '--reg_data_dir=C:/Users/asbjo/StableDiffusion/AMB\reg', '--resolution=512,512', '--output_dir=C:/Users/asbjo/StableDiffusion/AMB\model', '--logging_dir=C:/Users/asbjo/StableDiffusion/AMB\log', '--network_alpha=1', 
'--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-05', '--unet_lr=0.0001', '--network_dim=8', '--output_name=amb v.0.1', '--lr_scheduler_num_cycles=1', '--no_half_vae', '--learning_rate=0.0001', '--lr_scheduler=cosine', '--lr_warmup_steps=1008', '--train_batch_size=1', '--max_train_steps=10080', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_data_loader_n_workers=0', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale', '--noise_offset=0.0']' returned non-zero exit status 1.

asbjbo commented 10 months ago

Update: By changing the optimizer type from AdamW8bit to Adafactor, I was able to get it to run. It is still running, and I have very little idea whether the results will be usable, since I could not load the corresponding parameter presets without triggering another bug, #1677.

noahjgreer commented 10 months ago

Thanks for the advice! I appreciate it! I will give that a try as well.

yeswecan commented 10 months ago

@asbjbo switching to Adafactor helped me as well. I'm still curious what the source of the bug might be; something to do with memory allocation, I suppose. Maybe it's worth trying to remove Accelerate to see if that fixes it. I can't think of other possible sources of the issue.

Fiendstar77 commented 9 months ago

For me, I had to change to AdamW ... I have been troubleshooting that baby all damn day lol, but it seems I might finally have it working.
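The workaround reported above (switching from an 8-bit optimizer to AdamW or Adafactor when bitsandbytes fails to import) can be sketched as a small pre-flight check. This is only an illustration, not kohya_ss code: the helper name `pick_optimizer_type` and its `module_name` parameter are hypothetical, and the choice of "AdamW" as the fallback is an assumption based on the comments in this thread.

```python
import importlib


def pick_optimizer_type(requested: str, module_name: str = "bitsandbytes") -> str:
    """Return `requested`, or "AdamW" if an 8-bit optimizer was requested
    but the backing module cannot be imported.

    Hypothetical helper for illustration; not part of kohya_ss.
    """
    # Non-8-bit optimizers (AdamW, Adafactor, ...) do not need bitsandbytes.
    if "8bit" not in requested.lower():
        return requested
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # broken installs can fail with more than ImportError
        print(f"{module_name} is not usable ({type(exc).__name__}: {exc}); "
              f"falling back from {requested} to AdamW")
        return "AdamW"
    return requested
```

Calling `pick_optimizer_type("AdamW8bit")` inside the kohya_ss venv would show whether the 8-bit optimizer can be used before a long training run is launched.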

IvanDart1001 commented 9 months ago

For me, I hit a similar error, "ModuleNotFoundError: No module named 'bitsandbytes'", along with another error, "ModuleNotFoundError: No module named 'scipy'".

First of all, I added the CUDA library paths like this:

export PATH=/usr/local/cuda-11.8/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Then I manually installed bitsandbytes (I'm using Python 3.9):

git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=118 make cuda11x
python3 setup.py install

Then I added scipy to 'requirements.txt' like this:

accelerate==0.23.0
scipy
# albumentations==1.3.0
aiofiles==23.2.1
altair==4.2.2
dadaptation==3.1
...

Finally, I reinstalled kohya_ss by running setup.sh, so that the Python venv has scipy installed. AdamW8bit now works normally.
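After steps like the ones above, it can help to confirm that both modules actually import from inside the venv before starting a training run. The following is a minimal sketch of such a check, assuming it is run with the venv's Python interpreter; the function name `check_imports` is made up for this example.

```python
import importlib


def check_imports(names):
    """Try to import each module and return {name: "ok" or an error summary}."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = "ok"
        except Exception as exc:  # report any failure instead of stopping
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    # The two modules that were reported missing in this thread.
    for name, status in check_imports(["bitsandbytes", "scipy"]).items():
        print(f"{name}: {status}")
```

If either line reports an error rather than "ok", the 8-bit optimizers will most likely still fail in the same way as in the tracebacks above.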

ZikViM commented 9 months ago

#1704