bmaltais / kohya_ss

doesn't start training #2334

Open dron28 opened 5 months ago

dron28 commented 5 months ago

Loading settings from ./outputs/tmpfilelora.toml...    train_util.py:3744
Traceback (most recent call last):
  File "X:\Kohya_ss-GUI-LoRA-Portable-main\sd-scripts\sdxl_train_network.py", line 182, in <module>
    args = train_util.read_config_from_file(args, parser)
  File "X:\Kohya_ss-GUI-LoRA-Portable-main\sd-scripts\library\train_util.py", line 3746, in read_config_from_file
    config_dict = toml.load(f)
  File "X:\Kohya_ss-GUI-LoRA-Portable-main\venv\lib\site-packages\toml\decoder.py", line 156, in load
    return loads(f.read(), _dict, decoder)
  File "X:\Kohya_ss-GUI-LoRA-Portable-main\python\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 1036: invalid continuation byte
Traceback (most recent call last):
  File "X:\Kohya_ss-GUI-LoRA-Portable-main\python\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "X:\Kohya_ss-GUI-LoRA-Portable-main\python\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "X:\Kohya_ss-GUI-LoRA-Portable-main\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "X:\Kohya_ss-GUI-LoRA-Portable-main\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "X:\Kohya_ss-GUI-LoRA-Portable-main\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "X:\Kohya_ss-GUI-LoRA-Portable-main\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['X:\\Kohya_ss-GUI-LoRA-Portable-main\\venv\\Scripts\\python.exe', 'X:/Kohya_ss-GUI-LoRA-Portable-main/sd-scripts/sdxl_train_network.py', '--config_file', './outputs/tmpfilelora.toml']' returned non-zero exit status 1.
02:24:50-685883 INFO Training has ended.

bmaltais commented 5 months ago

Something in the toml that is outputted in the outputs folder is causing the issue... can you share the toml?

lovezyz commented 5 months ago

I also have this problem. tmpfilelora.zip

lovezyz commented 5 months ago

image

bmaltais commented 5 months ago

image

That file path is causing the issue... Not sure how to fix this... I think it is a bug in sd-scripts... nothing I can do... or maybe this is an encoding issue? I will need to do some research...

lovezyz commented 5 months ago

This problem should be caused by the encoding of the generated intermediate file tmpfilelora.toml which is not utf-8. Can you look at the logic of outputting the tmpfilelora.toml file?
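For anyone hitting the same traceback, here is a minimal sketch of how this kind of error can arise. It assumes the config was written with a legacy Windows code page (cp1251 is used here purely as an illustration; the actual code page on the affected machines is not known from this thread) and then read back as UTF-8. The path `X:/Работа` is a made-up example, not taken from the attached file.

```python
# Hypothetical config line containing a non-ASCII path (illustration only).
text = 'train_data_dir = "X:/Работа"\n'

# Assumption: the writer used a legacy code page (cp1251 here) instead of UTF-8.
raw = text.encode("cp1251")

try:
    # The reader (like toml.load on a file opened as UTF-8) decodes as UTF-8.
    raw.decode("utf-8")
    decode_failed = False
    bad_byte = None
    reason = None
except UnicodeDecodeError as exc:
    decode_failed = True
    bad_byte = exc.object[exc.start]  # the byte the decoder choked on
    reason = exc.reason

print(decode_failed, hex(bad_byte) if bad_byte is not None else None, reason)
```

In this sketch the first Cyrillic letter is stored as the single byte 0xd0, which UTF-8 treats as the start of a two-byte sequence; the byte that follows is not a valid continuation byte, so decoding fails in the same way as in the report.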

bmaltais commented 5 months ago

I pushed an updated version that writes the toml as utf-8... hope this helps:

git fetch origin
git checkout dev
git pull

Let me know how it goes.
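As a rough illustration of the idea behind writing the file as UTF-8 (not the actual change in the repository), the sketch below passes an explicit `encoding="utf-8"` when writing, so the result does not depend on the system's default code page. The helper name `write_config_utf8` and the sample path are hypothetical.

```python
import os
import tempfile


def write_config_utf8(path: str, toml_text: str) -> None:
    """Hypothetical helper: write config text with an explicit UTF-8 encoding."""
    # Without encoding=..., open() uses the locale's preferred encoding,
    # which on Windows is often a legacy code page rather than UTF-8.
    with open(path, "w", encoding="utf-8") as f:
        f.write(toml_text)


# Made-up config line with a non-ASCII path, for illustration only.
toml_text = 'train_data_dir = "X:/Работа"\n'

with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "tmpfilelora.toml")
    write_config_utf8(config_path, toml_text)
    # Reading back as UTF-8 (as the training script does) now succeeds.
    with open(config_path, "r", encoding="utf-8") as f:
        round_tripped = f.read()

print(round_tripped == toml_text)
```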

lovezyz commented 5 months ago

Training now runs normally, thank you.

dron28 commented 5 months ago

I changed the link for downloading/uploading images/models; now at least it has started caching images. (Default settings work with sd1.5.)

  File "X:\Kohya_ss-GUI-LoRA-Portable-main\sd-scripts\sdxl_train_network.py", line 185, in <module>
    trainer.train(args)
  File "X:\Kohya_ss-GUI-LoRA-Portable-main\sd-scripts\train_network.py", line 293, in train
    network, _ = network_module.create_network_from_weights(1, args.network_weights, vae, text_encoder, unet, **net_kwargs)
  File "X:\Kohya_ss-GUI-LoRA-Portable-main\sd-scripts\networks\lora.py", line 703, in create_network_from_weights
    if os.path.splitext(file)[1] == ".safetensors":
  File "X:\Kohya_ss-GUI-LoRA-Portable-main\python\lib\ntpath.py", line 230, in splitext
    p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not NoneType
Traceback (most recent call last):
  File "X:\Kohya_ss-GUI-LoRA-Portable-main\python\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "X:\Kohya_ss-GUI-LoRA-Portable-main\python\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "X:\Kohya_ss-GUI-LoRA-Portable-main\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "X:\Kohya_ss-GUI-LoRA-Portable-main\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "X:\Kohya_ss-GUI-LoRA-Portable-main\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "X:\Kohya_ss-GUI-LoRA-Portable-main\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['X:\\Kohya_ss-GUI-LoRA-Portable-main\\venv\\Scripts\\python.exe', 'X:/Kohya_ss-GUI-LoRA-Portable-main/sd-scripts/sdxl_train_network.py', '--config_file', './outputs/tmpfilelora.toml']' returned non-zero exit status 1.

bucket_no_upscale = true
bucket_reso_steps = 64
cache_latents = true
caption_extension = ".txt"
clip_skip = 2
dim_from_weights = true
dynamo_backend = "no"
enable_bucket = true
epoch = 1
gradient_accumulation_steps = 1
gradient_checkpointing = true
huber_c = 0.1
huber_schedule = "snr"
learning_rate = 0.0001
loss_type = "l2"
lr_scheduler = "constant_with_warmup"
lr_scheduler_args = []
lr_scheduler_num_cycles = 1
lr_scheduler_power = 1
lr_warmup_steps = 251
max_bucket_reso = 4096
max_data_loader_n_workers = 1
max_grad_norm = 1
max_timestep = 1000
max_token_length = 225
max_train_steps = 2508
min_bucket_reso = 64
mixed_precision = "bf16"
multires_noise_discount = 0.8
multires_noise_iterations = 6
network_alpha = 128
network_args = []
network_dim = 128
network_module = "networks.lora"
no_half_vae = true
noise_offset_type = "Original"
optimizer_args = []
optimizer_type = "AdamW8bit"
output_dir = "X:/"
output_name = "akane(sdxl)"
pretrained_model_name_or_path = "X:/forge/webui/models/Stable-diffusion/animagineXLV31_v31.safetensors"
prior_loss_weight = 1
resolution = "1024,1024"
sample_prompts = "X:/prompt.txt"
sample_sampler = "euler_a"
save_every_n_epochs = 1
save_model_as = "safetensors"
save_precision = "bf16"
seed = 1
text_encoder_lr = 5e-5
train_batch_size = 1
train_data_dir = "X:/train"
training_comment = "akane"
unet_lr = 0.0001
vae = "X:/forge/webui/models/VAE/sdxl_vae.safetensors"
wandb_api_key = "False"
xformers = true

dron28 commented 5 months ago

Disabling "DIM from weights" helped
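The second traceback is consistent with this: the posted config sets `dim_from_weights = true` but has no `network_weights` entry, so `args.network_weights` is `None` when it reaches `os.path.splitext`. A small pre-flight check along these lines could catch the combination before launching. This is only a sketch; the function name `check_dim_from_weights` is hypothetical and not part of kohya_ss or sd-scripts.

```python
def check_dim_from_weights(config: dict) -> list:
    """Hypothetical pre-flight check on an already-parsed config dict."""
    problems = []
    # dim_from_weights reads the network dimension from an existing weights
    # file, so it needs network_weights to point at one.
    if config.get("dim_from_weights") and not config.get("network_weights"):
        problems.append(
            "dim_from_weights is enabled but network_weights is not set"
        )
    return problems


# The relevant keys from the config posted above (no network_weights entry).
posted = {"dim_from_weights": True, "network_module": "networks.lora"}
print(check_dim_from_weights(posted))

# With the option disabled, as in the workaround, the check passes.
fixed = {"dim_from_weights": False, "network_module": "networks.lora"}
print(check_dim_from_weights(fixed))
```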