bmaltais / kohya_ss

Major training issues #1692

Closed: davidfunk13 closed this issue 8 months ago

davidfunk13 commented 11 months ago

This has been posted further down the chain but got no traction or responses.

I have been getting a lot of this when attempting to train.

prepare optimizer, data loader etc.
Traceback (most recent call last):
  File "C:\Program Files\kohya_ss\library\train_util.py", line 3433, in get_optimizer
    import bitsandbytes as bnb
  File "C:\Program Files\kohya_ss\venv\lib\site-packages\bitsandbytes\__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "C:\Program Files\kohya_ss\venv\lib\site-packages\bitsandbytes\research\__init__.py", line 1, in <module>
    from . import nn
  File "C:\Program Files\kohya_ss\venv\lib\site-packages\bitsandbytes\research\nn\__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "C:\Program Files\kohya_ss\venv\lib\site-packages\bitsandbytes\research\nn\modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "C:\Program Files\kohya_ss\venv\lib\site-packages\bitsandbytes\optim\__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "C:\Program Files\kohya_ss\venv\lib\site-packages\bitsandbytes\cextension.py", line 5, in <module>
    from .cuda_setup.main import evaluate_cuda_setup
  File "C:\Program Files\kohya_ss\venv\lib\site-packages\bitsandbytes\cuda_setup\main.py", line 21, in <module>
    from .paths import determine_cuda_runtime_lib_path
ModuleNotFoundError: No module named 'bitsandbytes.cuda_setup.paths'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files\kohya_ss\train_network.py", line 1012, in <module>
    trainer.train(args)
  File "C:\Program Files\kohya_ss\train_network.py", line 342, in train
    optimizer_name, optimizer_args, optimizer = train_util.get_optimizer(args, trainable_params)
  File "C:\Program Files\kohya_ss\library\train_util.py", line 3435, in get_optimizer
    raise ImportError("No bitsandbytes / bitsandbytesがインストールされていないようです")
ImportError: No bitsandbytes / bitsandbytesがインストールされていないようです
Traceback (most recent call last):
  File "C:\Users\Dave\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Dave\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Program Files\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "C:\Program Files\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "C:\Program Files\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
    simple_launcher(args)
  File "C:\Program Files\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)

I can get it to run with Adafactor and 0 LR warmup steps, but I can't really get it going in any other configuration.
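
For reference, the failure can be reproduced in isolation by activating the venv and attempting the same import that get_optimizer does. A minimal command-prompt sketch, assuming the install path shown in the traceback above:

:: open a cmd prompt and activate the venv that kohya_ss created
cd "C:\Program Files\kohya_ss"
call venv\Scripts\activate.bat
:: show which bitsandbytes build pip thinks is installed
pip show bitsandbytes
:: this reproduces the import chain that fails inside get_optimizer
python -c "import bitsandbytes as bnb"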

TeKett commented 11 months ago

Can confirm, exact same issue. I downloaded v22.2.1 as a completely new setup today, since I wanted to test Lion, which was not working (due to an older bitsandbytes version) on my current setup. I can't confirm whether it's just the GUI or not, nor do I have the time to, since I don't use the non-GUI version, so maybe we should cross-post this issue to the main repo (https://github.com/kohya-ss/sd-scripts).

07:29:40-427896 INFO     Start Finetuning...
07:29:40-522670 INFO     image_num = 16680
07:29:40-523640 INFO     repeats = 16680
07:29:40-525635 INFO     max_train_steps = 333600
07:29:40-526661 INFO     lr_warmup_steps = 0
07:29:40-527629 INFO     Saving training config to C:/Users/user/stable-diffusion
                         /Train/checkpoint/model\checkpoint_20231117-072940.json...
07:29:40-529624 INFO     accelerate launch --num_cpu_threads_per_process=1 "./fine_tune.py" --train_text_encoder
                         --learning_rate_te="1e-05"
                         --pretrained_model_name_or_path="C:/Users/user/stable-diffusion-webui/models/Stable-diffusion
                         /#New folder/checkpoint-000018.safetensors" --in_json="C:/Users/user/stable-diffusion
                         /Train/checkpoint/config/meta_lat.json" --train_data_dir="C:/Users/user/stable-diffusion
                         /Train/checkpoint/img/checkpoint" --output_dir="C:/Users/user/stable-diffusion
                         /Train/checkpoint/model" --logging_dir="C:/Users/user/stable-diffusion /Train/checkpoint/log"
                         --dataset_repeats=1 --enable_bucket --resolution="512,768" --min_bucket_reso=256
                         --max_bucket_reso=1024 --save_model_as=safetensors --output_name="checkpoint"
                         --max_token_length=225 --learning_rate="1e-05" --lr_scheduler="cosine" --train_batch_size="1"
                         --max_train_steps="333600" --save_every_n_epochs="1" --mixed_precision="bf16"
                         --save_precision="bf16" --caption_extension=".txt" --cache_latents --cache_latents_to_disk
                         --optimizer_type="Lion8bit" --max_data_loader_n_workers="0" --max_token_length=225
                         --bucket_reso_steps=8 --min_timestep=500 --max_timestep=650 --xformers --noise_offset=0.0
prepare tokenizer
update token length: 225
loading existing metadata: C:/Users/user/stable-diffusion /Train/checkpoint/config/meta_lat.json
using bucket info in metadata / メタデータ内のbucket情報を使います
[Dataset 0]
  batch_size: 1
  resolution: (512, 768)
  enable_bucket: True
  min_bucket_reso: None
  max_bucket_reso: None
  bucket_reso_steps: None
  bucket_no_upscale: None

  [Subset 0 of Dataset 0]
    image_dir: "C:/Users/user/stable-diffusion /Train/checkpoint/img/1_name"
    image_count: 16680
    num_repeats: 1
    shuffle_caption: False
    keep_tokens: 0
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    caption_prefix: None
    caption_suffix: None
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    metadata_file: C:/Users/user/stable-diffusion /Train/checkpoint/config/meta_lat.json

[Dataset 0]
loading image sizes.
100%|███████████████████████████████████████████████████████████████████████| 16680/16680 [00:00<00:00, 4156674.63it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (256, 1024), count: 4
bucket 1: resolution (320, 1024), count: 43
bucket 2: resolution (384, 896), count: 205
bucket 3: resolution (384, 960), count: 56
bucket 4: resolution (384, 1024), count: 31
bucket 5: resolution (448, 832), count: 1649
bucket 6: resolution (512, 704), count: 5403
bucket 7: resolution (512, 768), count: 3744
bucket 8: resolution (576, 576), count: 1235
bucket 9: resolution (576, 640), count: 1628
bucket 10: resolution (640, 576), count: 684
bucket 11: resolution (704, 512), count: 844
bucket 12: resolution (768, 512), count: 391
bucket 13: resolution (832, 448), count: 760
bucket 14: resolution (896, 384), count: 3
mean ar error (without repeats): 0.0
prepare accelerator
loading model for process 0/1
load StableDiffusion checkpoint: C:/Users/user/stable-diffusion-webui/models/Stable-diffusion/#New folder/checkpoint-000018.safetensors
UNet2DConditionModel: 64, 8, 768, False, False
loading u-net: <All keys matched successfully>
loading vae: <All keys matched successfully>
loading text encoder: <All keys matched successfully>
Disable Diffusers' xformers
Enable xformers for U-Net
[Dataset 0]
caching latents.
checking cache validity...
100%|███████████████████████████████████████████████████████████████████████| 16680/16680 [00:00<00:00, 2388480.79it/s]
caching latents...
0it [00:00, ?it/s]
enable text encoder training
prepare optimizer, data loader etc.
Traceback (most recent call last):
  File "C:\Users\user\Kohya 2\library\train_util.py", line 3433, in get_optimizer
    import bitsandbytes as bnb
  File "C:\Users\user\Kohya 2\venv\lib\site-packages\bitsandbytes\__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "C:\Users\user\Kohya 2\venv\lib\site-packages\bitsandbytes\research\__init__.py", line 1, in <module>
    from . import nn
  File "C:\Users\user\Kohya 2\venv\lib\site-packages\bitsandbytes\research\nn\__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "C:\Users\user\Kohya 2\venv\lib\site-packages\bitsandbytes\research\nn\modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "C:\Users\user\Kohya 2\venv\lib\site-packages\bitsandbytes\optim\__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "C:\Users\user\Kohya 2\venv\lib\site-packages\bitsandbytes\cextension.py", line 5, in <module>
    from .cuda_setup.main import evaluate_cuda_setup
  File "C:\Users\user\Kohya 2\venv\lib\site-packages\bitsandbytes\cuda_setup\main.py", line 21, in <module>
    from .paths import determine_cuda_runtime_lib_path
ModuleNotFoundError: No module named 'bitsandbytes.cuda_setup.paths'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\user\Kohya 2\fine_tune.py", line 499, in <module>
    train(args)
  File "C:\Users\user\Kohya 2\fine_tune.py", line 212, in train
    _, _, optimizer = train_util.get_optimizer(args, trainable_params=trainable_params)
  File "C:\Users\user\Kohya 2\library\train_util.py", line 3435, in get_optimizer
    raise ImportError("No bitsandbytes / bitsandbytesがインストールされていないようです")
ImportError: No bitsandbytes / bitsandbytesがインストールされていないようです
Traceback (most recent call last):
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\user\Kohya 2\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "C:\Users\user\Kohya 2\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "C:\Users\user\Kohya 2\venv\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
    simple_launcher(args)
  File "C:\Users\user\Kohya 2\venv\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\Users\\user\\Kohya 2\\venv\\Scripts\\python.exe', './fine_tune.py', '--train_text_encoder', '--learning_rate_te=1e-05', '--pretrained_model_name_or_path=C:/Users/user/stable-diffusion-webui/models/Stable-diffusion/#New folder/checkpoint-000018.safetensors', '--in_json=C:/Users/user/stable-diffusion /Train/checkpoint/config/meta_lat.json', '--train_data_dir=C:/Users/user/stable-diffusion /Train/checkpoint/img/1_name', '--output_dir=C:/Users/user/stable-diffusion /Train/checkpoint/model', '--logging_dir=C:/Users/user/stable-diffusion /Train/checkpoint/log', '--dataset_repeats=1', '--enable_bucket', '--resolution=512,768', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--save_model_as=safetensors', '--output_name=checkpoint', '--max_token_length=225', '--learning_rate=1e-05', '--lr_scheduler=cosine', '--train_batch_size=1', '--max_train_steps=333600', '--save_every_n_epochs=1', '--mixed_precision=bf16', '--save_precision=bf16', '--caption_extension=.txt', '--cache_latents', '--cache_latents_to_disk', '--optimizer_type=Lion8bit', '--max_data_loader_n_workers=0', '--max_token_length=225', '--bucket_reso_steps=8', '--min_timestep=500', '--max_timestep=650', '--xformers', '--noise_offset=0.0']' returned non-zero exit status 1.
TheAnay commented 11 months ago

I have the same issue too.

02:33:37-199614 INFO     Start training LoRA Standard ...
02:33:37-201621 INFO     Checking for duplicate image filenames in training data directory...
02:33:37-204632 INFO     Valid image folder names found in: D:/ai/train/images
02:33:37-205634 INFO     Folder 15_train: 34 images found
02:33:37-207641 INFO     Folder 15_train: 510 steps
02:33:37-208644 INFO     Total steps: 510
02:33:37-210651 INFO     Train batch size: 1
02:33:37-211660 INFO     Gradient accumulation steps: 1
02:33:37-213662 INFO     Epoch: 10
02:33:37-215667 INFO     Regulatization factor: 1
02:33:37-216681 INFO     max_train_steps (510 / 1 / 1 * 10 * 1) = 5100
02:33:37-218677 INFO     stop_text_encoder_training = 0
02:33:37-219680 INFO     lr_warmup_steps = 510
02:33:37-220684 INFO     Saving training config to D:/ai/train/model\trained_20231122-023337.json...
02:33:37-223695 INFO     accelerate launch --num_cpu_threads_per_process=2 "./train_network.py" --pretrained_model_name_or_path="C:/Users/TheAnay/Downloads/v1-5-pruned.safetensors" --train_data_dir="D:/ai/train/images" --resolution="768,768" --output_dir="D:/ai/train/model" --logging_dir="D:/ai/train/log" --network_alpha="1" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-05 --unet_lr=0.0001 --network_dim=8 --output_name="trained" --lr_scheduler_num_cycles="10" --no_half_vae --learning_rate="0.0001" --lr_scheduler="cosine" --lr_warmup_steps="510" --train_batch_size="1" --max_train_steps="5100" --save_every_n_epochs="2" --mixed_precision="fp16" --save_precision="fp16" --cache_latents --optimizer_type="AdamW8bit" --max_data_loader_n_workers="0" --bucket_reso_steps=64 --xformers --bucket_no_upscale --noise_offset=0.0
The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_processes` was set to a value of `1`
        `--num_machines` was set to a value of `1`
        `--mixed_precision` was set to a value of `'no'`
        `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
prepare tokenizer
Using DreamBooth method.
prepare images.
found directory D:\ai\train\images\15_train contains 34 image files
No caption file found for 34 images. Training will continue without captions for these images. If class token exists, it will be used. / 34枚の画像にキャプションファイルが見つかりませんでした。これらの画像についてはキャプションなしで学習を続行します。class tokenが存在する場合はそれを使います。
D:\ai\train\images\15_train\train (1).jpg
D:\ai\train\images\15_train\train (10).jpg
D:\ai\train\images\15_train\train (11).jpg
D:\ai\train\images\15_train\train (12).jpg
D:\ai\train\images\15_train\train (13).jpg
D:\ai\train\images\15_train\train (14).jpg... and 29 more
510 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 1
  resolution: (768, 768)
  enable_bucket: False

  [Subset 0 of Dataset 0]
    image_dir: "D:\ai\train\images\15_train"
    image_count: 34
    num_repeats: 15
    shuffle_caption: False
    keep_tokens: 0
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    caption_prefix: None
    caption_suffix: None
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    is_reg: False
    class_tokens: train
    caption_extension: .caption

[Dataset 0]
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████████| 34/34 [00:00<00:00, 2823.89it/s]
prepare dataset
preparing accelerator
loading model for process 0/1
load StableDiffusion checkpoint: C:/Users/TheAnay/Downloads/v1-5-pruned.safetensors
UNet2DConditionModel: 64, 8, 768, False, False
loading u-net: <All keys matched successfully>
loading vae: <All keys matched successfully>
loading text encoder: <All keys matched successfully>
Enable xformers for U-Net
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
import network module: networks.lora
[Dataset 0]
caching latents.
checking cache validity...
100%|██████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:00<?, ?it/s]
caching latents...
100%|██████████████████████████████████████████████████████████████████████████████████| 34/34 [00:14<00:00, 2.34it/s]
create LoRA network. base dim (rank): 8, alpha: 1.0
neuron dropout: p=None, rank dropout: p=None, module dropout: p=None
create LoRA for Text Encoder:
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.
Traceback (most recent call last):
  File "D:\ai\kohya_ss\library\train_util.py", line 3433, in get_optimizer
    import bitsandbytes as bnb
  File "D:\ai\kohya_ss\venv\lib\site-packages\bitsandbytes\__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "D:\ai\kohya_ss\venv\lib\site-packages\bitsandbytes\research\__init__.py", line 1, in <module>
    from . import nn
  File "D:\ai\kohya_ss\venv\lib\site-packages\bitsandbytes\research\nn\__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "D:\ai\kohya_ss\venv\lib\site-packages\bitsandbytes\research\nn\modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "D:\ai\kohya_ss\venv\lib\site-packages\bitsandbytes\optim\__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "D:\ai\kohya_ss\venv\lib\site-packages\bitsandbytes\cextension.py", line 5, in <module>
    from .cuda_setup.main import evaluate_cuda_setup
  File "D:\ai\kohya_ss\venv\lib\site-packages\bitsandbytes\cuda_setup\main.py", line 21, in <module>
    from .paths import determine_cuda_runtime_lib_path
ModuleNotFoundError: No module named 'bitsandbytes.cuda_setup.paths'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\ai\kohya_ss\train_network.py", line 1012, in <module>
    trainer.train(args)
  File "D:\ai\kohya_ss\train_network.py", line 342, in train
    optimizer_name, optimizer_args, optimizer = train_util.get_optimizer(args, trainable_params)
  File "D:\ai\kohya_ss\library\train_util.py", line 3435, in get_optimizer
    raise ImportError("No bitsandbytes / bitsandbytesがインストールされていないようです")
ImportError: No bitsandbytes / bitsandbytesがインストールされていないようです
Traceback (most recent call last):
  File "C:\Users\TheAnay\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\TheAnay\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\ai\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "D:\ai\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "D:\ai\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
    simple_launcher(args)
  File "D:\ai\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\\ai\\kohya_ss\\venv\\Scripts\\python.exe', './train_network.py', '--pretrained_model_name_or_path=C:/Users/TheAnay/Downloads/v1-5-pruned.safetensors', '--train_data_dir=D:/ai/train/images', '--resolution=768,768', '--output_dir=D:/ai/train/model', '--logging_dir=D:/ai/train/log', '--network_alpha=1', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-05', '--unet_lr=0.0001', '--network_dim=8', '--output_name=trained', '--lr_scheduler_num_cycles=10', '--no_half_vae', '--learning_rate=0.0001', '--lr_scheduler=cosine', '--lr_warmup_steps=510', '--train_batch_size=1', '--max_train_steps=5100', '--save_every_n_epochs=2', '--mixed_precision=fp16', '--save_precision=fp16', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_data_loader_n_workers=0', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale', '--noise_offset=0.0']' returned non-zero exit status 1.

Please help if you find a solution.

Aamir3d commented 10 months ago

Try installing https://github.com/jllllll/bitsandbytes-windows-webui

TeKett commented 10 months ago

Never mind what I said, if you saw it, I'm just dumb. But how is that any different from letting the script install it?

Aamir3d commented 10 months ago

Not 100% sure this is the solution, but there are two versions of bitsandbytes floating around. One is the Tim Dettmers one, which doesn't work with a lot of installations. The Jllllll one is a newer build and has solved bitsandbytes problems for many people (in other applications).
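
One rough way to see which build actually ended up in the venv (a sketch, not definitive, since both builds can report similar version numbers): pip show gives the version and install location, and the Windows build ships libbitsandbytes_*.dll files in the package folder, whereas the upstream Linux wheels ship .so files.

:: inside the kohya_ss venv
pip show bitsandbytes
:: list the native libraries bundled with the installed package (DLLs suggest the Windows build)
python -c "import importlib.util, pathlib; pkg = pathlib.Path(importlib.util.find_spec('bitsandbytes').origin).parent; print([f.name for f in pkg.iterdir() if f.suffix in ('.dll', '.so')])"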

TeKett commented 10 months ago

Kohya has already been installing the jllllll version for a few months now. I saw the same issue posted on Reddit by someone who installed Kohya for the first time, so it seems to be an issue with v22.x.x when installed fresh rather than when upgrading an existing instance. I haven't tried to experiment, since my 21.8.8 install is working just fine and I don't have a lot of free time.

Aamir3d commented 10 months ago

Thank you - I didn't realize that. I think some application or other might interfere with the install (this happens when a number of AI applications are being tested or running on the same machine). If I get time today, I'll try to check out the latest Kohya and see if I can replicate this.

TheAnay commented 10 months ago

@Aamir3d @TeKett I tried installing bitsandbytes from the GitHub repo you mentioned. It did not work; I'm encountering the exact same error as before. I don't know if I'm installing it properly, though. Here's what I did: I opened a cmd prompt in the kohya_ss folder and ran the install command provided in the jllllll/bitsandbytes-windows-webui repo linked above.

One of the errors in the cmd prompt when trying to train is: ImportError: No bitsandbytes / bitsandbytes

Is that what is causing the issue? I am not very experienced with this, so I don't really understand what the problem is.

Aamir3d commented 10 months ago

@TeKett @TheAnay Someone commented elsewhere that running Setup can resolve the bitsandbytes issues.

@TheAnay - to install bitsandbytes, you'll first need to activate the venv (a command-line sketch follows the list):

  • Go to your Kohya_ss folder
  • Go to the venv folder
  • Run activate.bat to activate the venv
  • pip uninstall bitsandbytes
  • Install bitsandbytes from the jllllll repo
  • Run kohya_ss using the gui-user.bat file
  • (you might want to run setup.bat first to confirm all requirements are OK)
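
The same steps as a command-prompt sketch (assuming a default kohya_ss folder layout; the wheel URL is the one published on the jllllll releases page and may change, so check that repo for the current build):

:: run these from your kohya_ss folder
call venv\Scripts\activate.bat
pip uninstall -y bitsandbytes
pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.1-py3-none-win_amd64.whl
:: optionally run setup.bat to confirm requirements, then launch the GUI
gui-user.bat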

@TeKett - as an update, I tried running Kohya with the latest updates today and it worked flawlessly for me.

TheAnay commented 10 months ago

Thanks a lot bro @Aamir3d that worked

Wreggor commented 10 months ago

Just made an account to express how grateful I am to you, 4 days of mess, more than 10 reinstallations, thank you very much.

TeKett commented 10 months ago

HOLY SHIT, I FOUND THE ISSUE AND IT'S STUPID AS ALL HELL.

bitsandbytes-windows-webui is missing the "paths" module that exists in other versions of bitsandbytes. Before v22.x.x, Kohya used to install bitsandbytes 0.35 and then overwrite it by installing bitsandbytes-windows-webui. This meant that the paths module, and likely other files, remained. Now, in v22, Kohya no longer installs bitsandbytes 0.35 first, so the "paths" module doesn't exist where it should, and the import throws the error.
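
A quick way to confirm that the installed copy really is missing cuda_setup\paths.py, without triggering the failing import (a command-prompt sketch run inside the venv; it only lists the package folder):

python -c "import importlib.util, pathlib; d = pathlib.Path(importlib.util.find_spec('bitsandbytes').origin).parent / 'cuda_setup'; print(sorted(f.name for f in d.iterdir()))"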

Turns out I'm the stupid one, shame on me, think before speaking. Apparently it goes deeper.

Versions of Kohya before v22 installed bitsandbytes 0.35, which doesn't have this issue. Since v22, Kohya installs bitsandbytes 0.41.1, which causes the issue because the "paths" file is missing.

Simply copying over the paths file from an older version does not fix it and throws another error.

On a side note, Kohya doesn't install bitsandbytes-windows-webui by default, but it does when you use the "install specific bitsandbytes version" option. Still the same issue, though, on every version that isn't 0.35.

I dug some more and found that main.py in both the bitsandbytes-windows-webui and bitsandbytes repos looks completely different from the main.py I get when I install the various bitsandbytes versions via Kohya's installer or pip. The rest of the files seem fine; 0.35 only has cuda118 and 0.41.1 has cuda122, so it is getting some, if not all, of the other files right.

I installed it manually instead by extracting the wheel into the venv, and now it works. So the question is: why are a handful of people, whether upgrading or installing for the first time, unable to install bitsandbytes correctly using pip?
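
For anyone who would rather not extract the wheel by hand, a forced, cache-free reinstall of the same wheel inside the venv is one thing worth trying (a sketch; the wheel URL is the jllllll release mentioned above, so double-check it against the repo):

call venv\Scripts\activate.bat
pip install --force-reinstall --no-cache-dir --no-deps https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.1-py3-none-win_amd64.whl
:: verify the import works afterwards
python -c "import bitsandbytes as bnb; print(bnb.__file__)"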