bmaltais / kohya_ss

Apache License 2.0

I can't start a training #815

Closed. XSilverHostX closed this issue 8 months ago.

XSilverHostX commented 1 year ago

I've been having this problem for days, can anyone help me please?

System Information: System: Windows, Release: 10, Version: 10.0.22621, Machine: AMD64, Processor: Intel64 Family 6 Model 165 Stepping 3, GenuineIntel

Python Information: Version: 3.10.6, Implementation: CPython, Compiler: MSC v.1932 64 bit (AMD64)

Virtual Environment Information: Path: C:\Users\lucas\Desktop\Kohya\kohya_ss\venv

GPU Information: Name: NVIDIA GeForce RTX 3060 Ti, VRAM: 8192 MiB

Validating that requirements are satisfied.
All requirements satisfied.
headless: False
Load CSS...
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
Loading config...
Folder 100_test: 11 images found
Folder 100_test: 1100 steps
max_train_steps = 550
stop_text_encoder_training = 0
lr_warmup_steps = 55
accelerate launch --num_cpu_threads_per_process=2 "train_network.py" --v_parameterization --enable_bucket --pretrained_model_name_or_path="C:/Users/lucas/Desktop/Stable/stable-diffusion-webui/models/Stable-diffusion/realisticVisionV20_v20.safetensors" --train_data_dir="C:\Users\lucas\OneDrive\Documents\Lora Training Data\Test\Image" --resolution=512,512 --output_dir="C:\Users\lucas\OneDrive\Documents\Lora Training Data\Test\Model" --logging_dir="C:\Users\lucas\OneDrive\Documents\Lora Training Data\Test\Log" --network_alpha="1" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-05 --unet_lr=0.0001 --network_dim=8 --output_name="last" --lr_scheduler_num_cycles="1" --learning_rate="0.0001" --lr_scheduler="cosine" --lr_warmup_steps="55" --train_batch_size="2" --max_train_steps="550" --save_every_n_epochs="1" --mixed_precision="fp16" --save_precision="fp16" --cache_latents --optimizer_type="AdamW8bit" --max_data_loader_n_workers="0" --bucket_reso_steps=64 --mem_eff_attn --gradient_checkpointing --xformers --bucket_no_upscale
v_parameterization should be with v2 / using v_parameterization with v1 is not expected
prepare tokenizer
Use DreamBooth method.
prepare images.
found directory C:\Users\lucas\OneDrive\Documents\Lora Training Data\Test\Image\100_test contains 11 image files
1100 train images with repeating.
0 reg images.
no regularization images / no regularization images were found
[Dataset 0]
  batch_size: 2
  resolution: (512, 512)
  enable_bucket: True
  min_bucket_reso: 256
  max_bucket_reso: 1024
  bucket_reso_steps: 64
  bucket_no_upscale: True

[Subset 0 of Dataset 0]
  image_dir: "C:\Users\lucas\OneDrive\Documents\Lora Training Data\Test\Image\100_test"
  image_count: 11
  num_repeats: 100
  shuffle_caption: False
  keep_tokens: 0
  caption_dropout_rate: 0.0
  caption_dropout_every_n_epoches: 0
  caption_tag_dropout_rate: 0.0
  color_aug: False
  flip_aug: False
  face_crop_aug_range: None
  random_crop: False
  token_warmup_min: 1, token_warmup_step: 0
  is_reg: False
  class_tokens: test
  caption_extension: .caption

[Dataset 0]
loading image sizes.
100%|█████████████████████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 572.04it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / when bucket_no_upscale is specified, bucket resolutions are computed automatically from the image sizes, so min_bucket_reso and max_bucket_reso are ignored
number of images (including repeats) / number of images per bucket (including repeats)
bucket 0: resolution (64, 128), count: 200
bucket 1: resolution (192, 128), count: 100
bucket 2: resolution (192, 384), count: 100
bucket 3: resolution (256, 256), count: 100
bucket 4: resolution (256, 384), count: 100
bucket 5: resolution (256, 448), count: 100
bucket 6: resolution (256, 512), count: 100
bucket 7: resolution (320, 384), count: 100
bucket 8: resolution (320, 512), count: 100
bucket 9: resolution (512, 512), count: 100
mean ar error (without repeats): 0.10139020800974884
prepare accelerator
C:\Users\lucas\Desktop\Kohya\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py:249: FutureWarning: logging_dir is deprecated and will be removed in version 0.18.0 of 🤗 Accelerate. Use project_dir instead.
  warnings.warn(
Using accelerator 0.15.0 or above.
loading model for process 0/1
load StableDiffusion checkpoint: C:/Users/lucas/Desktop/Stable/stable-diffusion-webui/models/Stable-diffusion/realisticVisionV20_v20.safetensors
loading u-net:
loading vae:
loading text encoder:
Replace CrossAttention.forward to use FlashAttention (not xformers)
[Dataset 0]
caching latents.
  0%|                                                                                           | 0/11 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\lucas\Desktop\Kohya\kohya_ss\train_network.py", line 783, in <module>
    train(args)
  File "C:\Users\lucas\Desktop\Kohya\kohya_ss\train_network.py", line 157, in train
    train_dataset_group.cache_latents(vae, args.vae_batch_size, args.cache_latents_to_disk, accelerator.is_main_process)
  File "C:\Users\lucas\Desktop\Kohya\kohya_ss\library\train_util.py", line 1399, in cache_latents
    dataset.cache_latents(vae, vae_batch_size, cache_to_disk, is_main_process)
  File "C:\Users\lucas\Desktop\Kohya\kohya_ss\library\train_util.py", line 812, in cache_latents
    latents = vae.encode(img_tensors).latent_dist.sample().to("cpu")
  File "C:\Users\lucas\Desktop\Kohya\kohya_ss\venv\lib\site-packages\diffusers\models\vae.py", line 566, in encode
    h = self.encoder(x)
  File "C:\Users\lucas\Desktop\Kohya\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\lucas\Desktop\Kohya\kohya_ss\venv\lib\site-packages\diffusers\models\vae.py", line 130, in forward
    sample = self.conv_in(sample)
  File "C:\Users\lucas\Desktop\Kohya\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\lucas\Desktop\Kohya\kohya_ss\venv\lib\site-packages\torch\nn\modules\conv.py", line 457, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "C:\Users\lucas\Desktop\Kohya\kohya_ss\venv\lib\site-packages\torch\nn\modules\conv.py", line 453, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'
Traceback (most recent call last):
  File "C:\Users\lucas\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\lucas\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\lucas\Desktop\Kohya\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "C:\Users\lucas\Desktop\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "C:\Users\lucas\Desktop\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 923, in launch_command
    simple_launcher(args)
  File "C:\Users\lucas\Desktop\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 579, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\lucas\Desktop\Kohya\kohya_ss\venv\Scripts\python.exe', 'train_network.py', '--v_parameterization', '--enable_bucket', '--pretrained_model_name_or_path=C:/Users/lucas/Desktop/Stable/stable-diffusion-webui/models/Stable-diffusion/realisticVisionV20_v20.safetensors', '--train_data_dir=C:\Users\lucas\OneDrive\Documents\Lora Training Data\Test\Image', '--resolution=512,512', '--output_dir=C:\Users\lucas\OneDrive\Documents\Lora Training Data\Test\Model', '--logging_dir=C:\Users\lucas\OneDrive\Documents\Lora Training Data\Test\Log', '--network_alpha=1', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-05', '--unet_lr=0.0001', '--network_dim=8', '--output_name=last', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=cosine', '--lr_warmup_steps=55', '--train_batch_size=2', '--max_train_steps=550', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_data_loader_n_workers=0', '--bucket_reso_steps=64', '--mem_eff_attn', '--gradient_checkpointing', '--xformers', '--bucket_no_upscale']' returned non-zero exit status 1.

rushuna86 commented 1 year ago

Did you happen to tick the v2 and v_parameterization options? I'm asking because of this line in your console: "v_parameterization should be with v2 / using v_parameterization with v1 is not expected".

RealisticVision is an SD 1.5 model; you don't use those two parameters with 1.5. They're for SD 2.0 and 768x768.
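Editor's note: those two GUI options correspond to the --v2 and --v_parameterization flags in the generated accelerate launch command (only --v_parameterization appears in the log above). One rough way to confirm what family a checkpoint belongs to before training is to inspect its text-encoder key prefixes: SD 1.x checkpoints store CLIP weights under cond_stage_model.transformer., while SD 2.x checkpoints use OpenCLIP keys under cond_stage_model.model.. A sketch of that heuristic, assuming the safetensors package is installed and treating the key prefixes as a common convention rather than something this repo guarantees:

```python
from safetensors import safe_open

def guess_sd_version(path: str) -> str:
    """Rough heuristic: look at text-encoder key prefixes in a .safetensors checkpoint."""
    with safe_open(path, framework="pt", device="cpu") as f:
        keys = list(f.keys())
    if any(k.startswith("cond_stage_model.model.") for k in keys):
        return "SD 2.x (consider --v2; --v_parameterization only for 768-v models)"
    if any(k.startswith("cond_stage_model.transformer.") for k in keys):
        return "SD 1.x (do not pass --v2 / --v_parameterization)"
    return "unknown layout"

# The checkpoint from the command above.
print(guess_sd_version(
    "C:/Users/lucas/Desktop/Stable/stable-diffusion-webui/models/Stable-diffusion/realisticVisionV20_v20.safetensors"
))
```

For the RealisticVision v2.0 file used here this should report SD 1.x, matching rushuna86's point.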

XSilverHostX commented 1 year ago

Thanks for the reply. I unchecked the parameters, but I keep getting the same errors.