bmaltais / kohya_ss

Apache License 2.0

Error when trying to train in the new version #634

Closed sashaok123 closed 8 months ago

sashaok123 commented 1 year ago

```
To create a public link, set share=True in launch().
Loading config...
Folder 50_BetterCallSaul: 54 images found
Folder 50_BetterCallSaul: 2700 steps
max_train_steps = 2700
stop_text_encoder_training = 0
lr_warmup_steps = 0
accelerate launch --num_cpu_threads_per_process=2 "train_network.py" --v2 --v_parameterization --enable_bucket --pretrained_model_name_or_path="D:/AI/stable-diffusion-webui/models/Stable-diffusion/2.1_SD2.1_768.safetensors" --train_data_dir="D:/AI/LoRA/works/BetterCallSaul/img" --resolution=768,768 --output_dir="D:/AI/LoRA/works/BetterCallSaul/model" --logging_dir="D:/AI/LoRA/works/BetterCallSaul/log" --network_alpha="128" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-5 --unet_lr=0.0001 --network_dim=128 --output_name="21BCS" --lr_scheduler_num_cycles="1" --learning_rate="0.0001" --lr_scheduler="linear" --train_batch_size="1" --max_train_steps="2700" --save_every_n_epochs="1" --mixed_precision="fp16" --save_precision="fp16" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW8bit" --clip_skip=2 --bucket_reso_steps=64 --xformers --bucket_no_upscale
v2 with clip_skip will be unexpected / v2でclip_skipを使用することは想定されていません
prepare tokenizer
Use DreamBooth method.
prepare images.
found directory D:\AI\LoRA\works\BetterCallSaul\img\50_BetterCallSaul contains 54 image files
2700 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 1
  resolution: (768, 768)
  enable_bucket: True
  min_bucket_reso: 256
  max_bucket_reso: 1024
  bucket_reso_steps: 64
  bucket_no_upscale: True

  [Subset 0 of Dataset 0]
    image_dir: "D:\AI\LoRA\works\BetterCallSaul\img\50_BetterCallSaul"
    image_count: 54
    num_repeats: 50
    shuffle_caption: False
    keep_tokens: 0
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    is_reg: False
    class_tokens: BetterCallSaul
    caption_extension: .txt

[Dataset 0]
loading image sizes.
  0%| | 0/54 [00:00<?, ?it/s]
Traceback (most recent call last):
D:\AI\kohya_ss\train_network.py:773 in <module>
  ❱ 773   train(args)
D:\AI\kohya_ss\train_network.py:117 in train
  ❱ 117   train_dataset_group = config_util.generate_dataset_group_by_blueprint(blueprint.data
D:\AI\kohya_ss\library\config_util.py:436 in generate_dataset_group_by_blueprint
  ❱ 436   dataset.make_buckets()
D:\AI\kohya_ss\library\train_util.py:597 in make_buckets
  ❱ 597   info.image_size = self.get_image_size(info.absolute_path)
D:\AI\kohya_ss\library\train_util.py:823 in get_image_size
  ❱ 823   image = Image.open(image_path)
C:\Users\sasha\miniconda3\lib\site-packages\PIL\Image.py:3298 in open
  ❱ 3298  raise UnidentifiedImageError(msg)
UnidentifiedImageError: cannot identify image file 'D:\AI\LoRA\works\BetterCallSaul\img\50_BetterCallSaul\BetterCallSaul (1).jpg'
Traceback (most recent call last):
C:\Users\sasha\miniconda3\lib\runpy.py:196 in _run_module_as_main
  ❱ 196   return _run_code(code, main_globals, None,
C:\Users\sasha\miniconda3\lib\runpy.py:86 in _run_code
  ❱ 86    exec(code, run_globals)
in <module>:7
  ❱ 7     sys.exit(main())
C:\Users\sasha\miniconda3\lib\site-packages\accelerate\commands\accelerate_cli.py:45 in main
  ❱ 45    args.func(args)
C:\Users\sasha\miniconda3\lib\site-packages\accelerate\commands\launch.py:1104 in launch_command
  ❱ 1104  simple_launcher(args)
C:\Users\sasha\miniconda3\lib\site-packages\accelerate\commands\launch.py:567 in simple_launcher
  ❱ 567   raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
CalledProcessError: Command '['C:\Users\sasha\miniconda3\python.exe', 'train_network.py', '--v2', '--v_parameterization', '--enable_bucket', '--pretrained_model_name_or_path=D:/AI/stable-diffusion-webui/models/Stable-diffusion/2.1_SD2.1_768.safetensors', '--train_data_dir=D:/AI/LoRA/works/BetterCallSaul/img', '--resolution=768,768', '--output_dir=D:/AI/LoRA/works/BetterCallSaul/model', '--logging_dir=D:/AI/LoRA/works/BetterCallSaul/log', '--network_alpha=128', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-5', '--unet_lr=0.0001', '--network_dim=128', '--output_name=21BCS', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=linear', '--train_batch_size=1', '--max_train_steps=2700', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW8bit', '--clip_skip=2', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale']' returned non-zero exit status 1.
```

File config: 2.1_LoRA.txt

bmaltais commented 1 year ago

Hmm... It looks like Kohya's latest code update might have an issue. Revert to the previous release until this is fixed. Sorry about that.

bmaltais commented 1 year ago

I tested the latest release and have no problem training with it. I don't understand what is happening with your version. Have you tried reverting to the previous release to see if training works again?

martianunlimited commented 1 year ago

The error seems to be raised by Pillow. If I were a betting man, I would say that `D:\AI\LoRA\works\BetterCallSaul\img\50_BetterCallSaul\BetterCallSaul (1).jpg` is corrupted.
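
One way to test this theory before training is to scan the image folder for files whose contents do not look like an image at all (for example a zero-byte download or an HTML page saved as `.jpg`). The sketch below is a stdlib-only heuristic that checks only the leading "magic" bytes; it is not the check Pillow performs, and the function names and the extension list are assumptions. Opening each file with Pillow's `Image.open()` and calling `verify()` is the more thorough test.

```python
from pathlib import Path
from typing import List, Optional

# Extensions to scan (an assumption; adjust to whatever your dataset uses).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def sniff_format(header: bytes) -> Optional[str]:
    """Guess an image format from the first bytes of a file, or return None."""
    if header.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "webp"
    if header.startswith(b"BM"):
        return "bmp"
    return None


def find_unrecognized_images(folder: str) -> List[str]:
    """Return names of image-extension files whose bytes match no known signature."""
    suspects = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # skip captions (.txt), subfolders, etc.
        with path.open("rb") as f:
            header = f.read(12)
        if sniff_format(header) is None:
            suspects.append(path.name)
    return suspects


if __name__ == "__main__":
    import sys

    for name in find_unrecognized_images(sys.argv[1]):
        print("suspect image:", name)
```

Run it as `python find_bad_images.py <image folder>` (the script name is only a placeholder). A file it flags is a good candidate for the one Pillow refuses to open; a file it does not flag may still be truncated, which only a full decode can detect.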

rushuna86 commented 1 year ago

Yeah, I updated and I'm getting great performance. Training at 512,640 with all the default learning rates as a test, I get 5-6 it/s even with a batch size of 1. The only annoying thing is the Triton error every time it saves after each epoch, but I can bear with that when it's 2x the speed. I did the upgrade and replaced cuDNN with the 8.9 libs. I'm on a 4090.

sashaok123 commented 1 year ago

Yes, the problem was in the images. I changed the folder, but now I get another error:

sashaok123 commented 1 year ago

```
CUDA SETUP: Loading binary D:\AI\kohya_ss\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cuda116.dll...
use 8-bit AdamW optimizer | {}
running training / 学習開始
num train images * repeats / 学習画像の数×繰り返し回数: 2810
num reg images / 正則化画像の数: 0
num batches per epoch / 1epochのバッチ数: 2810
num epochs / epoch数: 1
batch size per device / バッチサイズ: 1
gradient accumulation steps / 勾配を合計するステップ数 = 1
total optimization steps / 学習ステップ数: 2810
steps:   0%| | 0/2810 [00:00<?, ?it/s]
epoch 1/1
A matching Triton is not available, some optimizations will not be enabled. Error caught was: No module named 'triton'
A matching Triton is not available, some optimizations will not be enabled. Error caught was: No module named 'triton'
A matching Triton is not available, some optimizations will not be enabled. Error caught was: No module named 'triton'
A matching Triton is not available, some optimizations will not be enabled. Error caught was: No module named 'triton'
A matching Triton is not available, some optimizations will not be enabled. Error caught was: No module named 'triton'
A matching Triton is not available, some optimizations will not be enabled. Error caught was: No module named 'triton'
```

sashaok123 commented 1 year ago

In general, problems arise when training SD 1.5, but there is no such error on SD 2.1.

```
Traceback (most recent call last):
  File "D:\AI\kohya_ss\train_network.py", line 773, in <module>
    train(args)
  File "D:\AI\kohya_ss\train_network.py", line 152, in train
    text_encoder, vae, unet, _ = train_util.load_target_model(
  File "D:\AI\kohya_ss\library\train_util.py", line 2799, in load_target_model
    text_encoder = pipe.text_encoder
UnboundLocalError: local variable 'pipe' referenced before assignment
Traceback (most recent call last):
  File "C:\Users\sasha\miniconda3\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\sasha\miniconda3\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\AI\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "D:\AI\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "D:\AI\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "D:\AI\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\AI\kohya_ss\venv\Scripts\python.exe', 'train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=D:/AI/stable-diffusion-webui-Torch2/models/Stable-diffusion/1.5_SD1.5_Base.safetensors', '--train_data_dir=D:/AI/kohya_ss/works/DeathStranding/img', '--resolution=512,512', '--output_dir=D:/AI/kohya_ss/works/DeathStranding/model', '--logging_dir=D:/AI/kohya_ss/works/DeathStranding/log', '--network_alpha=256', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-5', '--unet_lr=0.0001', '--network_dim=256', '--output_name=15DS', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=linear', '--train_batch_size=2', '--max_train_steps=1405', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW8bit', '--clip_skip=2', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale']' returned non-zero exit status 1.
```

sashaok123 commented 1 year ago

The error you are encountering is an UnboundLocalError, which means that a local variable is being referenced before it has been assigned a value. In your case, the error occurs in the 'train_util.py' file at line 2799:

`text_encoder = pipe.text_encoder`

The variable `pipe` is being referenced before it has been assigned a value. To fix this error, you need to ensure that `pipe` is defined before it is used in the code. You may need to look through the `train_util.py` file and make sure the `pipe` variable is correctly defined and assigned a value before line 2799.

Additionally, it seems that the 'train_network.py' script is being called by an 'accelerate.exe' script, and the subprocess is returning a non-zero exit status. This indicates that there is an error occurring in the 'train_network.py' script execution. Fixing the UnboundLocalError might resolve this issue, but if it persists, you'll need to debug the 'train_network.py' script further.
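
The traceback is consistent with a common Python pattern: a `try`/`except` that prints a message but does not stop execution, so the variable assigned inside `try` is never bound. The sketch below reproduces that pattern in isolation. It is a simplified, hypothetical reconstruction (the function names and the `from_pretrained` callable are placeholders), not the actual `train_util.py` code. The usual fix is to re-raise from the `except` block so the real cause, here the model not being found, is what the user sees.

```python
from typing import Callable


def load_pipeline_buggy(name_or_path: str, from_pretrained: Callable[[str], object]) -> object:
    try:
        pipe = from_pretrained(name_or_path)
    except OSError:
        # The error is reported but swallowed, so `pipe` is never assigned...
        print(f"model is not found as a file or in Hugging Face: {name_or_path}")
    return pipe  # ...and reading it here raises UnboundLocalError


def load_pipeline_fixed(name_or_path: str, from_pretrained: Callable[[str], object]) -> object:
    try:
        pipe = from_pretrained(name_or_path)
    except OSError as ex:
        # Stop here with the real cause instead of failing later on `pipe`.
        raise FileNotFoundError(
            f"model is not found as a file or in Hugging Face: {name_or_path}"
        ) from ex
    return pipe
```

With the fixed version, a wrong model path produces a `FileNotFoundError` naming the path instead of the misleading `UnboundLocalError`.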

rushuna86 commented 1 year ago

Weird, I haven't come across any errors yet except for the missing Triton, which can be ignored. I'm doing all my training on 1.5 models.
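
To check whether Triton is actually installed in the environment that runs training, you can ask Python's import system without importing the package. This is a minimal sketch with a hypothetical helper name; the warning text itself appears to come from xformers, which skips its Triton-based optimizations when the module is missing.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be imported, without actually importing it."""
    # find_spec returns None when no top-level module with this name is found.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_available("triton"):
        print("triton is installed")
    else:
        print("triton is not installed; the 'A matching Triton is not available' warning is expected")
```

Run it with the venv's interpreter (e.g. `D:\AI\kohya_ss\venv\Scripts\python.exe`), since that is the environment `accelerate launch` uses.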

sashaok123 commented 1 year ago

> Weird, I haven't come across any errors yet except for the missing Triton, which can be ignored. I'm doing all my training on 1.5 models.

And I train on two models at once, because people on Civitai prefer SD 1.5 and I prefer SD 2.1.

sashaok123 commented 1 year ago

```
Validating that requirements are satisfied.
All requirements satisfied.
Load CSS...
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
Loading config...
Folder 10_DeathStranding: 281 images found
Folder 10_DeathStranding: 2810 steps
max_train_steps = 1405
stop_text_encoder_training = 0
lr_warmup_steps = 0
accelerate launch --num_cpu_threads_per_process=2 "train_network.py" --enable_bucket --pretrained_model_name_or_path="D:/AI/stable-diffusion-webui-Torch2/models/Stable-diffusion/1.5_SD1.5_Base.safetensors" --train_data_dir="D:/AI/kohya_ss/works/DeathStranding/img" --resolution=512,512 --output_dir="D:/AI/kohya_ss/works/DeathStranding/model" --logging_dir="D:/AI/kohya_ss/works/DeathStranding/log" --network_alpha="256" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-5 --unet_lr=0.0001 --network_dim=256 --output_name="15DS" --lr_scheduler_num_cycles="1" --learning_rate="0.0001" --lr_scheduler="linear" --train_batch_size="2" --max_train_steps="1405" --save_every_n_epochs="1" --mixed_precision="fp16" --save_precision="fp16" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW8bit" --clip_skip=2 --bucket_reso_steps=64 --xformers --bucket_no_upscale
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
prepare tokenizer
Downloading (…)olve/main/vocab.json: 100%|██████████████████████████████████████████| 961k/961k [00:00<00:00, 1.40MB/s]
Downloading (…)olve/main/merges.txt: 100%|██████████████████████████████████████████| 525k/525k [00:00<00:00, 1.01MB/s]
Downloading (…)cial_tokens_map.json: 100%|████████████████████████████████████████████████████| 389/389 [00:00<?, ?B/s]
Downloading (…)okenizer_config.json: 100%|█████████████████████████████████████████████| 905/905 [00:00<00:00, 904kB/s]
Use DreamBooth method.
prepare images.
found directory D:\AI\kohya_ss\works\DeathStranding\img\10_DeathStranding contains 281 image files
2810 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 2
  resolution: (512, 512)
  enable_bucket: True
  min_bucket_reso: 256
  max_bucket_reso: 1024
  bucket_reso_steps: 64
  bucket_no_upscale: True

  [Subset 0 of Dataset 0]
    image_dir: "D:\AI\kohya_ss\works\DeathStranding\img\10_DeathStranding"
    image_count: 281
    num_repeats: 10
    shuffle_caption: False
    keep_tokens: 0
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    is_reg: False
    class_tokens: DeathStranding
    caption_extension: .txt

[Dataset 0]
loading image sizes.
100%|██████████████████████████████████████████████████████████████████████████████| 281/281 [00:00<00:00, 9062.80it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (384, 576), count: 20
bucket 1: resolution (512, 320), count: 20
bucket 2: resolution (512, 448), count: 10
bucket 3: resolution (512, 512), count: 10
bucket 4: resolution (576, 320), count: 10
bucket 5: resolution (576, 384), count: 40
bucket 6: resolution (640, 384), count: 2680
bucket 7: resolution (704, 320), count: 10
bucket 8: resolution (768, 320), count: 10
mean ar error (without repeats): 0.10572933173203725
prepare accelerator
Using accelerator 0.15.0 or above.
loading model for process 0/1
load Diffusers pretrained models
model is not found as a file or in Hugging Face, perhaps file name is wrong? / 指定したモデル名のファイル、またはHugging Faceのモデルが見つかりません。ファイル名が誤っているかもしれません: D:/AI/stable-diffusion-webui-Torch2/models/Stable-diffusion/1.5_SD1.5_Base.safetensors
Traceback (most recent call last):
  File "D:\AI\kohya_ss\train_network.py", line 773, in <module>
    train(args)
  File "D:\AI\kohya_ss\train_network.py", line 152, in train
    text_encoder, vae, unet, _ = train_util.load_target_model(
  File "D:\AI\kohya_ss\library\train_util.py", line 2799, in load_target_model
    text_encoder = pipe.text_encoder
UnboundLocalError: local variable 'pipe' referenced before assignment
Traceback (most recent call last):
  File "C:\Users\sasha\miniconda3\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\sasha\miniconda3\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\AI\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "D:\AI\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "D:\AI\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "D:\AI\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\AI\kohya_ss\venv\Scripts\python.exe', 'train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=D:/AI/stable-diffusion-webui-Torch2/models/Stable-diffusion/1.5_SD1.5_Base.safetensors', '--train_data_dir=D:/AI/kohya_ss/works/DeathStranding/img', '--resolution=512,512', '--output_dir=D:/AI/kohya_ss/works/DeathStranding/model', '--logging_dir=D:/AI/kohya_ss/works/DeathStranding/log', '--network_alpha=256', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-5', '--unet_lr=0.0001', '--network_dim=256', '--output_name=15DS', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=linear', '--train_batch_size=2', '--max_train_steps=1405', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW8bit', '--clip_skip=2', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale']' returned non-zero exit status 1.
```

sashaok123 commented 1 year ago

accidentally closed

bmaltais commented 1 year ago

You could try installing torch 1.12.1 again and see if it makes a difference. Kohya does not support torch 2, and the torch 2 support I added at installation time is experimental.
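
Before and after reinstalling, it helps to confirm which torch build the kohya_ss virtual environment actually loads. A minimal sketch, meant to be run with the venv's Python; the helper name is hypothetical, and it only parses version strings such as `1.12.1+cu116` or `2.0.0+cu118`.

```python
def torch_major_version(version: str) -> int:
    """Parse the major version from a torch version string such as '1.12.1+cu116'."""
    # Drop the local build tag ("+cu116") and keep the first dotted component.
    return int(version.split("+")[0].split(".")[0])


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        major = torch_major_version(torch.__version__)
        print(f"torch {torch.__version__} (major version {major}), CUDA build: {torch.version.cuda}")
        if major >= 2:
            print("torch 2 detected: support in kohya_ss is experimental")
```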

sashaok123 commented 1 year ago

> You could try installing torch 1.13.1 again and see if it makes a difference. Kohya does not support torch 2, and the torch 2 support I added at installation time is experimental.

Yes, it really is. But training SD 2.1 on Torch 2 is much faster!

DKnight54 commented 1 year ago

@sashaok123, sorry to pop in on a different topic, but it seems like you've had success training LoRAs on SD 2.1 using FP16. I (and several others) have been getting loss=NaN during LoRA training when using FP16.

Can you please guide us how to do it?

sashaok123 commented 1 year ago

Turn off 8-bit Adam.

On Wed, 19 Apr 2023, 19:49, DKnight54 @.***> wrote:

> @sashaok123 https://github.com/sashaok123, sorry to pop in on a different topic, but it seems like you've had successes with training loras on SD2.1 while using FP16. I (and several others) have been getting Loss=nan during Lora training when using FP16 https://github.com/kohya-ss/sd-scripts/issues/385.
>
> Can you please guide us how to do it?

DKnight54 commented 1 year ago

Looking at the error, it seems like it could not find the base model file. Is the path correct?
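
One way to answer that question before launching training is to check the path the GUI passes as `--pretrained_model_name_or_path`. This sketch (hypothetical helper name, standard library only) reports whether the folder and file exist and, if the file is missing, lists the `.safetensors` files that are actually in that folder so a typo stands out.

```python
from pathlib import Path
from typing import List


def check_model_path(path_str: str) -> List[str]:
    """Return a list of problems with a local checkpoint path (empty list means it looks OK)."""
    path = Path(path_str)
    if path.is_file():
        return []
    if not path.parent.is_dir():
        return [f"folder does not exist: {path.parent}"]
    if path.exists():
        return [f"not a file (maybe a Diffusers folder?): {path}"]
    # The folder exists but the file does not: show what is there to spot typos.
    present = sorted(p.name for p in path.parent.glob("*.safetensors"))
    return [f"file not found: {path.name}", f".safetensors files in folder: {present}"]


if __name__ == "__main__":
    import sys

    problems = check_model_path(sys.argv[1])
    print("\n".join(problems) if problems else "model file found")
```

For the log above, you would pass `D:/AI/stable-diffusion-webui-Torch2/models/Stable-diffusion/1.5_SD1.5_Base.safetensors`.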

megane42 commented 1 year ago

WORKAROUND: Stop choosing the pretrained model from the dropdown in the UI and enter its file path explicitly.

The problem occurs at this line, which tries to define `pipe` but fails: https://github.com/bmaltais/kohya_ss/blob/63657088f4c35a376dd8a936f53e9b9a3b4b1168/library/train_util.py#L2794

It seems that this line downloads 19 required files, but it somehow fails before the downloads complete.

```
loading model for process 0/1
load Diffusers pretrained models
text_encoder\model.safetensors not found
Fetching 19 files:  74%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████                                        | 14/19 [01:10<00:25,  5.05s/it]
model is not found as a file or in Hugging Face, perhaps file name is wrong? / 指定したモデル名のファイル、またはHugging Faceのモデルが見つかりません。ファイル名が誤っているかもしれません: runwayml/stable-diffusion-v1-5
Downloading (…)on_pytorch_model.bin:  50%|████████████████████████████████████████████████████████████████▉                                                                | 1.73G/3.44G [01:09<01:09, 24.7MB/s]
Traceback (most recent call last):
  File "C:\kohya_ss\train_network.py", line 773, in <module>
    train(args)
  File "C:\kohya_ss\train_network.py", line 152, in train
    text_encoder, vae, unet, _ = train_util.load_target_model(
  File "C:\kohya_ss\library\train_util.py", line 2799, in load_target_model
    text_encoder = pipe.text_encoder
UnboundLocalError: local variable 'pipe' referenced before assignment
```

Specifying the pretrained model path explicitly bypasses this line because of an if statement here: https://github.com/bmaltais/kohya_ss/blob/63657088f4c35a376dd8a936f53e9b9a3b4b1168/library/train_util.py#L2786-L2787
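
The effect of that check can be sketched as follows. This assumes the condition at L2786-L2787 boils down to "is this an existing file?" (for example via `os.path.isfile`); the function below is a hypothetical simplification, not the repository's code. An existing `.safetensors`/`.ckpt` file takes the local-checkpoint branch, while anything else, such as a Hugging Face repo id like `runwayml/stable-diffusion-v1-5` or a path that does not exist, falls through to the Diffusers branch, which is where `pipe` failed to load.

```python
import os


def choose_load_branch(name_or_path: str) -> str:
    """Return which loading branch a --pretrained_model_name_or_path value would take."""
    if os.path.isfile(name_or_path):
        # A real checkpoint file on disk is loaded directly; nothing is downloaded.
        return "checkpoint"
    # Everything else is treated as a Diffusers folder or Hugging Face repo id,
    # which may trigger a download (the branch that failed in this issue).
    return "diffusers"
```

This also explains the SD 1.5 log earlier in the thread: the printed `load Diffusers pretrained models` line suggests the given `.safetensors` path was not found as a file, so it was handed to the download branch instead.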