bmaltais / kohya_ss

Got a RTX 4070 card, but can't get Lora to run. #773

Closed — Deejay85 closed this issue 9 months ago

Deejay85 commented 1 year ago

To make things easier, I've included a screenshot of my settings. I've tried different settings, but it never runs the way I want it to.

[Screenshot of the training settings]

Additionally, here are the program details:

System Information:
System: Windows, Release: 10, Version: 10.0.19045, Machine: AMD64, Processor: Intel64 Family 6 Model 151 Stepping 2, GenuineIntel

Python Information:
Version: 3.10.6, Implementation: CPython, Compiler: MSC v.1932 64 bit (AMD64)

Virtual Environment Information:
Path: S:\kohya_ss\venv

GPU Information:
Name: NVIDIA GeForce RTX 4070, VRAM: 12282 MiB

Validating that requirements are satisfied.
All requirements satisfied.
headless: False
Load CSS...
Running on local URL:  http://127.0.0.1:7861

To create a public link, set `share=True` in `launch()`.
Loading config...
Loading config...
Folder 100_hugeballs: 26 images found
Folder 100_hugeballs: 2600 steps
max_train_steps = 1300
stop_text_encoder_training = 0
lr_warmup_steps = 0
accelerate launch --num_cpu_threads_per_process=2 "train_network.py" --v2 --v_parameterization --pretrained_model_name_or_path="S:/WaifuDiffusion/models/Stable-diffusion/HD-22-fp32.safetensors" --train_data_dir="S:\kohya_ss\SampleImages\Image" --resolution=768,768 --output_dir="S:\kohya_ss\SampleImages\Model" --logging_dir="S:\kohya_ss\SampleImages\Log" --network_alpha="1" --save_model_as=safetensors --network_module=networks.lora --network_dim=8 --output_name="hugeballs" --lr_scheduler_num_cycles="1" --learning_rate="0.0001" --lr_scheduler="constant" --train_batch_size="2" --max_train_steps="1300" --save_every_n_epochs="1" --mixed_precision="fp16" --save_precision="fp16" --seed="1234" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW" --max_data_loader_n_workers="1" --clip_skip=2 --bucket_reso_steps=64 --xformers --bucket_no_upscale
v2 with clip_skip will be unexpected / v2でclip_skipを使用することは想定されていません
prepare tokenizer
Use DreamBooth method.
prepare images.
found directory S:\kohya_ss\SampleImages\Image\100_hugeballs contains 26 image files
2600 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 2
  resolution: (768, 768)
  enable_bucket: False

  [Subset 0 of Dataset 0]
    image_dir: "S:\kohya_ss\SampleImages\Image\100_hugeballs"
    image_count: 26
    num_repeats: 100
    shuffle_caption: False
    keep_tokens: 0
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    is_reg: False
    class_tokens: hugeballs
    caption_extension: .txt

[Dataset 0]
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████████| 26/26 [00:00<00:00, 5199.63it/s]
prepare dataset
prepare accelerator
S:\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py:249: FutureWarning: `logging_dir` is deprecated and will be removed in version 0.18.0 of 🤗 Accelerate. Use `project_dir` instead.
  warnings.warn(
Using accelerator 0.15.0 or above.
loading model for process 0/1
load StableDiffusion checkpoint
S:\kohya_ss\venv\lib\site-packages\safetensors\torch.py:98: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
S:\kohya_ss\venv\lib\site-packages\torch\_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
S:\kohya_ss\venv\lib\site-packages\torch\storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = cls(wrap_storage=untyped_storage)
Traceback (most recent call last):
  File "S:\kohya_ss\train_network.py", line 773, in <module>
    train(args)
  File "S:\kohya_ss\train_network.py", line 146, in train
    text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype, accelerator)
  File "S:\kohya_ss\library\train_util.py", line 2928, in load_target_model
    text_encoder, vae, unet, load_stable_diffusion_format = _load_target_model(
  File "S:\kohya_ss\library\train_util.py", line 2894, in _load_target_model
    text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(args.v2, name_or_path, device)
  File "S:\kohya_ss\library\model_util.py", line 863, in load_models_from_stable_diffusion_checkpoint
    info = unet.load_state_dict(converted_unet_checkpoint)
  File "S:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UNet2DConditionModel:
        size mismatch for down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
        size mismatch for down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
        size mismatch for down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
        size mismatch for down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
        size mismatch for down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
        size mismatch for down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
        size mismatch for down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
        size mismatch for down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
        size mismatch for down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
        size mismatch for down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
        size mismatch for down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
        size mismatch for down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
        size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
        size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
        size mismatch for up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
        size mismatch for up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
        size mismatch for up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
        size mismatch for up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
        size mismatch for up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
        size mismatch for up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
        size mismatch for up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
        size mismatch for up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
        size mismatch for up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
        size mismatch for up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
        size mismatch for up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
        size mismatch for up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
        size mismatch for mid_block.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
        size mismatch for mid_block.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
Traceback (most recent call last):
  File "C:\Users\<REDACTED>\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\<REDACTED>\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "S:\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "S:\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "S:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 923, in launch_command
    simple_launcher(args)
  File "S:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 579, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['S:\\kohya_ss\\venv\\Scripts\\python.exe', 'train_network.py', '--v2', '--v_parameterization', '--pretrained_model_name_or_path=S:/WaifuDiffusion/models/Stable-diffusion/HD-22-fp32.safetensors', '--train_data_dir=S:\\kohya_ss\\SampleImages\\Image', '--resolution=768,768', '--output_dir=S:\\kohya_ss\\SampleImages\\Model', '--logging_dir=S:\\kohya_ss\\SampleImages\\Log', '--network_alpha=1', '--save_model_as=safetensors', '--network_module=networks.lora', '--network_dim=8', '--output_name=hugeballs', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=2', '--max_train_steps=1300', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale']' returned non-zero exit status 1.

I have the drivers for the card installed, as well as CUDA 12.1.0_531.14 and Visual Studio 2022 for the NVIDIA toolchain. Also, yes, 3.10.6 is the Python version that I am using.
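
For reference, the size mismatches in the traceback above (768 in the checkpoint vs. 1024 in the model) are the classic signature of loading an SD1.x checkpoint with `--v2` set: the v2 UNet expects 1024-dimensional text-encoder embeddings, while v1 checkpoints carry 768. A minimal sketch for checking which family a `.safetensors` checkpoint belongs to, assuming the standard original-LDM key layout (the key name below is an illustration, not something this thread used):

```python
# Read one cross-attention weight from an SD .safetensors checkpoint and
# report its text-encoder context dimension.
# 768 -> SD1.x (train without --v2); 1024 -> SD2.x (train with --v2).
from safetensors import safe_open

# Standard key in the original-LDM checkpoint layout (assumption).
KEY = "model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_k.weight"

def context_dim(path: str) -> int:
    with safe_open(path, framework="pt", device="cpu") as f:
        return f.get_tensor(KEY).shape[1]

dim = context_dim("S:/WaifuDiffusion/models/Stable-diffusion/HD-22-fp32.safetensors")
print(f"context dim {dim}: {'SD2.x' if dim == 1024 else 'SD1.x'} checkpoint")
```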

bmaltais commented 1 year ago

Try training with the SD1.5 checkpoint instead of the custom HD-22-fp32.safetensors checkpoint... if that works, then the issue is with the base model you are trying to use for training...

Deejay85 commented 1 year ago

I tried that, and apparently it didn't work like I thought it would, meaning whatever is wrong is most likely either with the program or some setting I'm not getting right. Since disabling xformers fixed it last time, I tried that again, and no dice. Here is the resulting log when using SD 1.5:


System Information:
System: Windows, Release: 10, Version: 10.0.19045, Machine: AMD64, Processor: Intel64 Family 6 Model 151 Stepping 2, GenuineIntel

Python Information:
Version: 3.10.6, Implementation: CPython, Compiler: MSC v.1932 64 bit (AMD64)

Virtual Environment Information:
Path: S:\kohya_ss\venv

GPU Information:
Name: NVIDIA GeForce RTX 4070, VRAM: 12282 MiB

Validating that requirements are satisfied.
All requirements satisfied.
headless: False
Load CSS...
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Loading config...
Folder 100_hugeballs: 26 images found
Folder 100_hugeballs: 2600 steps
max_train_steps = 1300
stop_text_encoder_training = 0
lr_warmup_steps = 0
accelerate launch --num_cpu_threads_per_process=2 "train_network.py" --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" --train_data_dir="S:\kohya_ss\SampleImages\Image" --resolution=768,768 --output_dir="S:\kohya_ss\SampleImages\Model" --logging_dir="S:\kohya_ss\SampleImages\Log" --network_alpha="1" --save_model_as=safetensors --network_module=networks.lora --network_dim=8 --output_name="hugeballs" --lr_scheduler_num_cycles="1" --learning_rate="0.0001" --lr_scheduler="constant" --train_batch_size="2" --max_train_steps="1300" --save_every_n_epochs="1" --mixed_precision="fp16" --save_precision="fp16" --seed="1234" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW" --max_data_loader_n_workers="1" --clip_skip=2 --bucket_reso_steps=64 --bucket_no_upscale
prepare tokenizer
Use DreamBooth method.
prepare images.
found directory S:\kohya_ss\SampleImages\Image\100_hugeballs contains 26 image files
2600 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 2
  resolution: (768, 768)
  enable_bucket: False

  [Subset 0 of Dataset 0]
    image_dir: "S:\kohya_ss\SampleImages\Image\100_hugeballs"
    image_count: 26
    num_repeats: 100
    shuffle_caption: False
    keep_tokens: 0
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    is_reg: False
    class_tokens: hugeballs
    caption_extension: .txt

[Dataset 0]
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████████| 26/26 [00:00<00:00, 5777.28it/s]
prepare dataset
prepare accelerator
S:\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py:249: FutureWarning: `logging_dir` is deprecated and will be removed in version 0.18.0 of 🤗 Accelerate. Use `project_dir` instead.
  warnings.warn(
Using accelerator 0.15.0 or above.
loading model for process 0/1
load Diffusers pretrained models
text_encoder\model.safetensors not found
Fetching 19 files: 100%|███████████████████████████████████████████████████████████████████████| 19/19 [00:00<?, ?it/s]
S:\kohya_ss\venv\lib\site-packages\transformers\modeling_utils.py:402: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(checkpoint_file, framework="pt") as f:
S:\kohya_ss\venv\lib\site-packages\torch\_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
S:\kohya_ss\venv\lib\site-packages\torch\storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = cls(wrap_storage=untyped_storage)
S:\kohya_ss\venv\lib\site-packages\safetensors\torch.py:98: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
Traceback (most recent call last):
  File "S:\kohya_ss\train_network.py", line 773, in <module>
    train(args)
  File "S:\kohya_ss\train_network.py", line 146, in train
    text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype, accelerator)
  File "S:\kohya_ss\library\train_util.py", line 2928, in load_target_model
    text_encoder, vae, unet, load_stable_diffusion_format = _load_target_model(
  File "S:\kohya_ss\library\train_util.py", line 2899, in _load_target_model
    pipe = StableDiffusionPipeline.from_pretrained(name_or_path, tokenizer=None, safety_checker=None)
  File "S:\kohya_ss\venv\lib\site-packages\diffusers\pipeline_utils.py", line 709, in from_pretrained
    loaded_sub_model = load_method(os.path.join(cached_folder, name), **loading_kwargs)
  File "S:\kohya_ss\venv\lib\site-packages\transformers\modeling_utils.py", line 2301, in from_pretrained
    state_dict = load_state_dict(resolved_archive_file)
  File "S:\kohya_ss\venv\lib\site-packages\transformers\modeling_utils.py", line 413, in load_state_dict
    return safe_load_file(checkpoint_file)
  File "S:\kohya_ss\venv\lib\site-packages\safetensors\torch.py", line 100, in load_file
    result[k] = f.get_tensor(k)
RuntimeError: self.size(-1) must be divisible by 4 to view Byte as Float (different element sizes), but got 129203566
Traceback (most recent call last):
  File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "S:\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "S:\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "S:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 923, in launch_command
    simple_launcher(args)
  File "S:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 579, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['S:\\kohya_ss\\venv\\Scripts\\python.exe', 'train_network.py', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=S:\\kohya_ss\\SampleImages\\Image', '--resolution=768,768', '--output_dir=S:\\kohya_ss\\SampleImages\\Model', '--logging_dir=S:\\kohya_ss\\SampleImages\\Log', '--network_alpha=1', '--save_model_as=safetensors', '--network_module=networks.lora', '--network_dim=8', '--output_name=hugeballs', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=2', '--max_train_steps=1300', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--bucket_no_upscale']' returned non-zero exit status 1.
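
This second failure is different from the first: the "must be divisible by 4 to view Byte as Float" error typically indicates a truncated or corrupted `.safetensors` file in the Hugging Face cache rather than a v1/v2 mismatch. One hedged way to recover, assuming `huggingface_hub` is installed in the same venv, is to force a clean re-download of the snapshot:

```python
# Re-fetch the SD1.5 snapshot, ignoring the (possibly corrupted) cached copy.
# Sketch only; manually deleting the cached folder under ~/.cache/huggingface
# would accomplish the same thing.
from huggingface_hub import snapshot_download

path = snapshot_download(
    repo_id="runwayml/stable-diffusion-v1-5",
    force_download=True,  # bypass cached files and download everything again
)
print("fresh snapshot at", path)
```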

bmaltais commented 1 year ago

You may want to go back a few releases and see if it helps. Best is to delete the kohya_ss folder, git clone the project repo, then

git checkout <release-tag-you-want>

Then run setup.bat again.

Deejay85 commented 1 year ago

Surprisingly, after a power outage I rebooted the computer, tried it again, and it worked this time. Of course, maybe deleting and reinstalling SD 1.5 did something as well...I don't know. Also, since I'm training for an anime style, would SD 1.5 be a good base to test things out on? Hentai Diffusion v22 doesn't seem to want to work at all, and the reason I'm using it is that WaifuDiffusion isn't that great of a finished product for some reason. I do have some LoRA files installed, but none of them are activated or being referenced, so maybe it's just the checkpoint? All I know is that setting the sampler to DPM++ SDE is the best I can do for now. /sigh

I downloaded SD 2.1 as a test, and it ran just fine. If that works... why won't Hentai Diffusion? I'm sure that was trained on 2.1, and I think WD was trained on at least 2.0 (I could be wrong).

bmaltais commented 1 year ago

When you train a LoRA, simply use the base model that the model you intend to apply it to was trained from... So if the Waifu model is based on SD1.5, then just train against the SD1.5 base model... then apply the LoRA to the Waifu model. All will be fine.
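
As a hedged illustration of that workflow at inference time (the diffusers loader and the paths below are assumptions, not taken from this thread): train the LoRA against the SD1.5 base, then load it on top of whichever SD1.5-family checkpoint you actually generate with:

```python
# Sketch: apply a kohya-trained LoRA to an SD1.5-family pipeline.
# Assumes a diffusers version whose load_lora_weights() understands
# kohya-format .safetensors files; all paths are placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # same family the LoRA was trained on
    torch_dtype=torch.float16,
).to("cuda")
pipe.load_lora_weights("S:/kohya_ss/SampleImages/Model/hugeballs.safetensors")
image = pipe("an example prompt").images[0]
image.save("sample.png")
```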

Deejay85 commented 1 year ago

Surprisingly enough, once I installed SD 2.1 via the application, everything started to work. It seems to me that, in order for training to work, some prerequisite files need to be installed first, which is why installing version 2.1 may have been what fixed it. Sounds annoying, but it makes sense, in one of those "it all becomes easy once you understand what you're doing" ways.

Speaking of which, I did manage to train something, but I do have a very important question to ask…when I am tagging everything, I am aware that there are times I should, and times I should not, tag literally everything. For example, with a drawing style you tag everything in the picture, but that leaves the question: when are you NOT supposed to tag something? Since I am trying to tag a very specific fetish, this is obviously kind of important to me right now, as I think I'm going to have to redo the first two epochs of what I previously trained.