bmaltais / kohya_ss

Apache License 2.0

Textual Inversion Broken | SDXL not training at all, SD1.5 other error #2545

Closed: rafstahelin closed this issue 5 months ago

rafstahelin commented 5 months ago

Is there any indication that this module has been updated and is in working order? I have tried auto1111 and Forge, and embedding training works in neither. Now, here in Kohya, embedding training does not launch and fails with the following errors:

With SDXL I get this error very early on:

Traceback (most recent call last):
  File "E:\kohya_ss\sd-scripts\sdxl_train_textual_inversion.py", line 138, in <module>
    trainer.train(args)
  File "E:\kohya_ss\sd-scripts\train_textual_inversion.py", line 197, in train
    model_version, text_encoder_or_list, vae, unet = self.load_target_model(args, weight_dtype, accelerator)
  File "E:\kohya_ss\sd-scripts\sdxl_train_textual_inversion.py", line 36, in load_target_model
    ) = sdxl_train_util.load_target_model(args, accelerator, sdxl_model_util.MODEL_VERSION_SDXL_BASE_V1_0, weight_dtype)
  File "E:\kohya_ss\sd-scripts\library\sdxl_train_util.py", line 50, in load_target_model
    args.disable_mmap_load_safetensors,
AttributeError: 'Namespace' object has no attribute 'disable_mmap_load_safetensors'
Traceback (most recent call last):
  File "C:\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "E:\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in <module>
  File "E:\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "E:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "E:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['E:\\kohya_ss\\venv\\Scripts\\python.exe', 'E:/kohya_ss/sd-scripts/sdxl_train_textual_inversion.py', '--config_file', 'E:\\studio Dropbox\\studio\\ai\\data\\styles\\style_baroque\\3training\\training_b4r0que\\models/config_textual_inversion-20240527-124712.toml']' returned non-zero exit status 1.
12:47:27-550978 INFO     Training has ended.
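The second traceback is only a consequence of the first. Judging by the `simple_launcher` frame in accelerate's launch.py (line 637 above), `accelerate launch` runs the training script as a child process and raises `subprocess.CalledProcessError` when that child exits with a non-zero status. A minimal sketch of the same pattern using only the standard library; the child command is a stand-in, not the real training script:

```python
import subprocess
import sys


def launch(script_source: str) -> int:
    """Run Python code in a child process, re-raising on failure like a launcher would."""
    # check=True makes subprocess.run raise CalledProcessError on a non-zero exit status
    completed = subprocess.run(
        [sys.executable, "-c", script_source],
        check=True,
        capture_output=True,
        text=True,
    )
    return completed.returncode


if __name__ == "__main__":
    # Stand-in for the training script failing with the AttributeError shown above
    failing_child = "import argparse; args = argparse.Namespace(); args.disable_mmap_load_safetensors"
    try:
        launch(failing_child)
    except subprocess.CalledProcessError as err:
        # The real cause is in the child's stderr; the parent only sees exit status 1
        print("child exited with", err.returncode)
        print(err.stderr.strip().splitlines()[-1])
```

In other words, the first traceback (the AttributeError) is the one to debug; the CalledProcessError just reports that the child failed.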

With base SD 1.5 I get further, but training stops with: No data found. Please verify arguments

13:01:32-926407 INFO     end of toml config file: E:\studio Dropbox\studio\ai\data\styles\style_baroque\3training\training_b4r0que\models/config_textual_inversion-20240527-130132.toml
13:01:33-506794 INFO     Start training TI...
13:01:33-508299 INFO     Validating lr scheduler arguments...
13:01:33-509305 INFO     Validating optimizer arguments...
13:01:33-509305 INFO     Validating E:\studio Dropbox\studio\ai\data\styles\style_baroque\3training\training_b4r0que\log
                         existence and writability... SUCCESS
13:01:33-510305 INFO     Validating E:\studio
                         Dropbox\studio\ai\data\styles\style_baroque\3training\training_b4r0que\models existence and
                         writability... SUCCESS
13:01:33-511304 INFO     Validating E:\studio Dropbox\studio\ai\libs\SD\1models\1.5\v1-5-pruned-emaonly.safetensors
                         existence... SUCCESS
13:01:33-512304 INFO     Validating E:\studio Dropbox\studio\ai\data\styles\style_baroque\3training\training_b4r0que\10
                         existence... SUCCESS
13:01:33-514303 INFO     Regulatization factor: 1
13:01:33-514303 INFO     Total steps: 0
13:01:33-515304 INFO     Train batch size: 1
13:01:33-516304 INFO     Gradient accumulation steps: 1
13:01:33-516304 INFO     Epoch: 1
13:01:33-517808 INFO     Max train steps: 1000
13:01:33-517808 INFO     stop_text_encoder_training = 0
13:01:33-518815 INFO     lr_warmup_steps = 100
13:01:33-520813 INFO     Saving training config to E:\studio Dropbox\studio\ai\data\styles\style_baroque\3training\training_b4r0que\models\baroque-test_20240527-130133.json...
13:01:33-521814 INFO     Executing command: E:\kohya_ss\venv\Scripts\accelerate.EXE launch --dynamo_backend no
                         --dynamo_mode default --mixed_precision fp16 --num_processes 1 --num_machines 1
                         --num_cpu_threads_per_process 2 E:/kohya_ss/sd-scripts/train_textual_inversion.py --config_file
                         E:\studio Dropbox\studio\ai\data\styles\style_baroque\3training\training_b4r0que\models/config_textual_inversion-20240527-130133.toml
13:01:33-526817 INFO     Command executed.
2024-05-27 13:01:42 INFO     Loading settings from E:\studio Dropbox\studio\ai\data\styles\style_baroque\3training\training_b4r0que\models/config_textual_inversion-20240527-130133.toml...   train_util.py:3791
                    INFO     E:\studio Dropbox\studio\ai\data\styles\style_baroque\3training\training_b4r0que\models/config_textual_inversion-20240527-130133   train_util.py:3810
2024-05-27 13:01:42 INFO     prepare tokenizer                                                        train_util.py:4282
                    INFO     update token length: 75                                                  train_util.py:4299
                    INFO     prepare accelerator                                          train_textual_inversion.py:189
accelerator device: cuda
                    INFO     loading model for process 0/1                                            train_util.py:4440
                    INFO     load StableDiffusion checkpoint: E:\studio                               train_util.py:4396
                             Dropbox\studio\ai\libs\SD\1models\1.5\v1-5-pruned-emaonly.safetensors
                    INFO     UNet2DConditionModel: 64, 8, 768, False, False                        original_unet.py:1387
2024-05-27 13:01:47 INFO     loading u-net: <All keys matched successfully>                           model_util.py:1009
2024-05-27 13:01:48 INFO     loading vae: <All keys matched successfully>                             model_util.py:1017
2024-05-27 13:01:49 INFO     loading text encoder: <All keys matched successfully>                    model_util.py:1074
token length for init words is not same to num_vectors_per_token, init words is repeated or truncated / 初期化単語のトークン長がnum_vectors_per_tokenと合わないため、繰り返しまたは切り捨てが発生します:  tokenizer 1, length 5
tokens are added for tokenizer 1: [49408, 49409]
create embeddings for 2 tokens, for b4r0que
Use DreamBooth method.
2024-05-27 13:01:50 INFO     prepare images.                                                          train_util.py:1572
                    INFO     0 train images with repeating.                                           train_util.py:1613
                    INFO     0 reg images.                                                            train_util.py:1616
                    WARNING  no regularization images / 正則化画像が見つかりませんでした              train_util.py:1621
                    INFO     [Dataset 0]                                                              config_util.py:567
                               batch_size: 1
                               resolution: (1024, 1280)
                               enable_bucket: False
                               network_multiplier: 1.0

                    INFO     [Dataset 0]                                                              config_util.py:573
                    INFO     loading image sizes.                                                      train_util.py:853
0it [00:00, ?it/s]
                    INFO     make buckets                                                              train_util.py:859
                    WARNING  min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is   train_util.py:876
                             set, because bucket reso is defined by image size automatically /
                             bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計
                             算されるため、min_bucket_resoとmax_bucket_resoは無視されます
                    INFO     number of images (including repeats) /                                    train_util.py:905
                             各bucketの画像枚数(繰り返し回数を含む)
E:\kohya_ss\venv\lib\site-packages\numpy\core\fromnumeric.py:3504: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
E:\kohya_ss\venv\lib\site-packages\numpy\core\_methods.py:129: RuntimeWarning: invalid value encountered in scalar divide
  ret = ret.dtype.type(ret / rcount)
                    INFO     mean ar error (without repeats): nan                                      train_util.py:915
No data found. Please verify arguments / 画像がありません。引数指定を確認してください
13:01:52-817354 INFO     Training has ended.
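Two lines in the log above are worth decoding. The new tokens get IDs 49408 and 49409 because the CLIP tokenizer used by SD 1.x has a vocabulary of 49408 entries (IDs 0 to 49407), so added tokens are appended right after it. The "init words is repeated or truncated" warning appears to mean the init word encoded to 5 tokens while 2 vectors per token were requested, so its token list has to be cut down or cycled to fit. A sketch of that mapping, assuming a simple cyclic repeat/truncate rule; the helper names are hypothetical, not sd-scripts functions:

```python
from typing import List

CLIP_VOCAB_SIZE = 49408  # CLIP tokenizer vocabulary size (IDs 0..49407)


def new_token_ids(num_vectors_per_token: int, vocab_size: int = CLIP_VOCAB_SIZE) -> List[int]:
    """IDs for freshly added placeholder tokens, appended after the existing vocabulary."""
    return list(range(vocab_size, vocab_size + num_vectors_per_token))


def fit_init_tokens(init_ids: List[int], num_vectors_per_token: int) -> List[int]:
    """Cycle or truncate init-word token IDs so there is one per new vector (assumed rule)."""
    if not init_ids:
        raise ValueError("init word produced no tokens")
    return [init_ids[i % len(init_ids)] for i in range(num_vectors_per_token)]


if __name__ == "__main__":
    print(new_token_ids(2))  # [49408, 49409], matching "tokens are added for tokenizer 1"
    init_ids = [101, 102, 103, 104, 105]  # hypothetical IDs for a 5-token init word
    print(fit_init_tokens(init_ids, 2))  # [101, 102]: truncated
    print(fit_init_tokens([7], 3))  # [7, 7, 7]: repeated
```

This warning is harmless; the actual failure is the "0 train images" line further down.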

Ideally, I'd like to train in SDXL

baroque-test_20240527-130133.JSON

bmaltais commented 5 months ago

I just tested training a TI with SD 1.5 using my test JSON file ./test/config/TI-AdamW8bit.json, and it ran. Can you test it? It should work for you too.

The Namespace error appears to come from the latest training script in @kohya-ss's sd-scripts repo. I can't fix that one.

b-fission commented 5 months ago

@rafstahelin

AttributeError: 'Namespace' object has no attribute 'disable_mmap_load_safetensors'

I'm only getting this error on the dev branch. Is that the version you're running? Switching to the current master or v24.1.4 seems to start TI training at least.

......

0 train images with repeating.
...
No data found. Please verify arguments / 画像がありません。引数指定を確認してください

That means your image folders aren't set up correctly. Do your subfolders follow the naming scheme "repeats_word class", where "repeats" is a number?

And the path you're using for the image folder, named 10, seems off. E:\\studio Dropbox\\studio\\ai\\data\\styles\\style_baroque\\3training\\training_b4r0que\\10
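Here is a minimal sketch of the "repeats_word class" convention. It assumes that each subfolder of the training image directory is named `<repeats>_<text>` and that folders without an integer prefix are skipped. This is an illustration, not sd-scripts' own parser, and the function names and the accepted-extension set are hypothetical. It shows why pointing the tool at a folder that holds images directly, with no `N_...` subfolders, yields 0 training images, which is consistent with the "Total steps: 0" line in the log.

```python
from pathlib import Path
from typing import Optional, Tuple

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}  # assumed accepted extensions


def parse_folder_name(name: str) -> Optional[Tuple[int, str]]:
    """Split '5_b4r0que architecture' into (5, 'b4r0que architecture'); None without a numeric prefix."""
    prefix, sep, rest = name.partition("_")
    if not sep or not prefix.isdigit():
        return None
    return int(prefix), rest


def count_training_images(train_data_dir: str) -> int:
    """Sum images x repeats over all correctly named subfolders of train_data_dir."""
    total = 0
    for entry in sorted(Path(train_data_dir).iterdir()):
        if not entry.is_dir():
            continue  # loose files directly in train_data_dir are not picked up
        parsed = parse_folder_name(entry.name)
        if parsed is None:
            print(f"skipping folder without repeats prefix: {entry.name}")
            continue
        repeats, _ = parsed
        images = [p for p in entry.iterdir() if p.suffix.lower() in IMAGE_EXTENSIONS]
        total += repeats * len(images)
    return total


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as root:
        # Wrong: images sit directly inside the folder passed as the image directory
        wrong = Path(root, "10")
        wrong.mkdir()
        (wrong / "a.png").touch()
        print(count_training_images(str(wrong)))  # 0
        # Right: a hypothetical "img" directory containing a '<repeats>_<word> <class>' subfolder
        right = Path(root, "img")
        sub = right / "10_b4r0que architecture"
        sub.mkdir(parents=True)
        (sub / "a.png").touch()
        (sub / "a.txt").touch()  # caption file, not counted as an image
        print(count_training_images(str(right)))  # 10
```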

bmaltais commented 5 months ago

The dev branch uses the new dev branch of sd-scripts, so the bug must have been introduced by the new sd-scripts code that was pulled in.

rafstahelin commented 5 months ago

@bmaltais @b-fission OK, working!

I was missing the "repeats_word class" folder structure for the dataset. Thanks for that; I thought it was only for DreamBooth.

Here is one of my captions. Given that my class is architecture and my dataset folder is named 5_b4r0que architecture, is this caption correct with the trigger word? Here's the image:

b4r0que, a photo of architectural interior, grand hallway, classical architecture, intricate floor mosaic, arched doorways, domed ceilings, marble walls, ornate columns, recessed wall panels, decorative friezes, subdued natural lighting, long perspective view, symmetrical composition, historic building, high detail finish, elegance, opulence

baroque-000

I'm just not sure whether mentioning the class twice is a good thing, or whether I should drop the trigger from the caption.

Thanks

rafstahelin commented 5 months ago

@bmaltais @b-fission And yes, it now works on the master branch. Any chance you will merge dev? I was hoping to keep using the wandb logging you recently added there.

b-fission commented 5 months ago

Regarding this error for TI training in the dev branch... AttributeError: 'Namespace' object has no attribute 'disable_mmap_load_safetensors'

Uncommenting this line should fix it: https://github.com/kohya-ss/sd-scripts/blob/bfb352bc433326a77aca3124248331eb60c49e8c/sdxl_train_textual_inversion.py#L126
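Here is why uncommenting a parser line can fix this. An `argparse.Namespace` only has the attributes its parser registered, so code that reads `args.disable_mmap_load_safetensors` fails when nothing added that option. The linked line presumably registers the SDXL-specific options, but its exact contents are not reproduced here. In this minimal sketch, `add_sdxl_options` is a hypothetical stand-in for whatever that line calls:

```python
import argparse


def add_sdxl_options(parser: argparse.ArgumentParser) -> None:
    """Hypothetical stand-in for the commented-out call that registers SDXL-only options."""
    parser.add_argument("--disable_mmap_load_safetensors", action="store_true")


def build_parser(register_sdxl_options: bool) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--config_file", type=str, default=None)
    if register_sdxl_options:  # the equivalent of uncommenting the line
        add_sdxl_options(parser)
    return parser


if __name__ == "__main__":
    args = build_parser(register_sdxl_options=False).parse_args([])
    try:
        args.disable_mmap_load_safetensors
    except AttributeError as err:
        print(err)  # 'Namespace' object has no attribute 'disable_mmap_load_safetensors'
    args = build_parser(register_sdxl_options=True).parse_args([])
    print(args.disable_mmap_load_safetensors)  # False: store_true defaults to False
```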

rafstahelin commented 5 months ago

@b-fission I'll go this way until dev gets merged. Thanks!

blakdeth19 commented 2 months ago

File "E:\kohya_ss\sd-scripts\library\sdxl_train_util.py", line 50, in load_target_model args.disable_mmap_load_safetensors, AttributeError: 'Namespace' object has no attribute 'disable_mmap_load_safetensors'

Deleting this line from sdxl_train_util.py allows the TI training to work.

Regarding this error for TI training in the dev branch... AttributeError: 'Namespace' object has no attribute 'disable_mmap_load_safetensors'

Uncommenting this line should fix it: https://github.com/kohya-ss/sd-scripts/blob/bfb352bc433326a77aca3124248331eb60c49e8c/sdxl_train_textual_inversion.py#L126

This does nothing, as the suggested line is a comment and is not executed.

bmaltais commented 2 months ago

You should report this to kohya-ss in his sd-scripts repo so he can fix it there.

chromesun commented 2 months ago

Thanks @blakdeth19: commenting out line 50 in sdxl_train_util.py worked for me in kohya_ss v24.1.6. TI training is now working for me :-)
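Commenting out line 50 works because it removes the only place where the missing attribute is read. This assumes the function called there declares a default for that parameter, which we have not checked in the sd-scripts source. If it does, the call simply falls back to that default. A less invasive variant of the same workaround is to read the attribute with `getattr` and a fallback value. Both approaches are sketched below with hypothetical names; `load_target_model_stub` is not the real sd-scripts function.

```python
import argparse


def load_target_model_stub(path: str, dtype: str, disable_mmap: bool = False) -> dict:
    """Hypothetical loader; disable_mmap falls back to False when the caller omits it."""
    return {"path": path, "dtype": dtype, "disable_mmap": disable_mmap}


# A Namespace missing disable_mmap_load_safetensors, as in the traceback
args = argparse.Namespace(pretrained_model_name_or_path="model.safetensors")

# Workaround 1 (like commenting out the line): omit the argument and rely on the default
print(load_target_model_stub(args.pretrained_model_name_or_path, "fp16"))

# Workaround 2 (hypothetical): read the attribute defensively instead of deleting the line
disable_mmap = getattr(args, "disable_mmap_load_safetensors", False)
print(load_target_model_stub(args.pretrained_model_name_or_path, "fp16", disable_mmap))
```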

UrzasLegacy commented 2 months ago

Uncommenting this line should fix it: https://github.com/kohya-ss/sd-scripts/blob/bfb352bc433326a77aca3124248331eb60c49e8c/sdxl_train_textual_inversion.py#L126

This does nothing, as the line suggested is a comment line and is not read as script

I was having the same issue, and UNcommenting this line so that it IS read fixed it for me.