bmaltais / kohya_ss

Apache License 2.0

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb8 in position 1036: invalid start byte #2331

Open gnilix opened 3 months ago

gnilix commented 3 months ago

2024-04-18 20:46:05 INFO Loading settings from ./outputs/tmpfilelora.toml... train_util.py:3744
Traceback (most recent call last):
  File "D:\111lora\kohya_ss\sd-scripts\sdxl_train_network.py", line 182, in <module>
    args = train_util.read_config_from_file(args, parser)
  File "D:\111lora\kohya_ss\sd-scripts\library\train_util.py", line 3746, in read_config_from_file
    config_dict = toml.load(f)
  File "D:\111lora\kohya_ss\venv\lib\site-packages\toml\decoder.py", line 156, in load
    return loads(f.read(), _dict, decoder)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64qbz5n2kfra8p0\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb8 in position 1036: invalid start byte
Traceback (most recent call last):
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64qbz5n2kfra8p0\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64qbz5n2kfra8p0\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\111lora\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "D:\111lora\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "D:\111lora\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "D:\111lora\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\111lora\kohya_ss\venv\Scripts\python.exe', 'D:/111lora/kohya_ss/sd-scripts/sdxl_train_network.py', '--config_file', './outputs/tmpfilelora.toml', '--network_train_unet_only']' returned non-zero exit status 1.
20:46:07-309425 INFO Training has ended.
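
For readers who want to see how this class of error arises, here is a minimal sketch. It assumes the config file was written in a legacy Windows codepage such as GBK (cp936) and then read as UTF-8; the thread does not confirm which encoding was actually used. The name `知更鸟` comes from the config shared later in this thread. In GBK the second character appears to start with byte 0xb8, which would be consistent with the byte named in the error.

```python
# Hypothetical reproduction. The GBK (cp936) encoding is an assumption,
# not something confirmed in this thread.
name = "知更鸟"  # appears in output_name and train_data_dir

# Bytes as a writer using a GBK locale would store them.
raw = name.encode("gbk")

# A reader that expects UTF-8 (as in the traceback) fails on these bytes.
try:
    raw.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(error)

# Decoding with the encoding used for writing recovers the text.
recovered = raw.decode("gbk")
print(recovered)
```

The point of the sketch is only that writer and reader must agree on the encoding; it does not show how kohya_ss itself writes the file.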

bmaltais commented 3 months ago

Can you share the TOML file from the outputs folder? I think there is a character in there that kohya's sd-scripts does not like.

gnilix commented 3 months ago

The TOML file is generated by the script after importing my previous JSON configuration file. The same parameters train without error in version 23.1.5…

——————————————————————————————————————
bucket_reso_steps = 32
cache_latents = true
cache_latents_to_disk = true
caption_dropout_every_n_epochs = 0
caption_dropout_rate = 0
caption_extension = ".txt"
clip_skip = 1
dynamo_backend = "no"
enable_bucket = true
epoch = 10
gradient_accumulation_steps = 1
huber_c = 0.1
huber_schedule = "snr"
keep_tokens = 2
learning_rate = 0.0001
logging_dir = "D:/111lora/kohya_ss/logs"
loss_type = "smooth_l1"
lr_scheduler = "cosine"
lr_scheduler_args = []
lr_scheduler_num_cycles = 1
lr_scheduler_power = 1
lr_warmup_steps = 511
max_bucket_reso = 2048
max_data_loader_n_workers = 0
max_grad_norm = 1
max_timestep = 1000
max_token_length = 75
max_train_epochs = 1600
max_train_steps = 5110
min_bucket_reso = 256
mixed_precision = "fp16"
multires_noise_discount = 0
network_alpha = 1
network_args = []
network_dim = 8
network_dropout = 0
network_module = "networks.lora"
noise_offset_type = "Original"
optimizer_type = "Lion"
optimizer_args = []
output_dir = "D:\111lora\1"
output_name = "知更鸟koh"
persistent_data_loader_workers = true
pretrained_model_name_or_path = "D:/1/models/Stable-diffusion/xl/kohakuXLDelta_rev1.safetensors"
prior_loss_weight = 0.25
resolution = "512,512"
sample_every_n_epochs = 1
sample_prompts = "D:\111lora\1\prompt.txt"
sample_sampler = "euler_a"
save_every_n_epochs = 1
save_model_as = "safetensors"
save_precision = "fp16"
scale_weight_norms = 0
seed = 123
shuffle_caption = true
text_encoder_lr = 2e-6
train_batch_size = 1
train_data_dir = "D:\data\星铁\知更鸟"
unet_lr = 1e-5
xformers = true

bmaltais commented 3 months ago

OK... this is causing the issue:

output_name = "知更鸟koh"
train_data_dir = "D:\data\星铁\知更鸟"

Try with the latest dev code... I implemented a UTF-8 output format that should help with this issue:

git fetch origin
git checkout dev
git pull
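
As an illustration of what writing the config as UTF-8 can look like, here is a minimal sketch. It is not the actual change in the dev branch; the file name mirrors the one in the log, the two settings come from the config above (with forward slashes substituted in the path for simplicity), and everything else is an assumption made for the example.

```python
import os
import tempfile

# Two settings from the shared config. The forward slashes in the path
# are a simplification for this example.
lines = [
    'output_name = "知更鸟koh"',
    'train_data_dir = "D:/data/星铁/知更鸟"',
]
text = "\n".join(lines) + "\n"

# Hypothetical location; a temporary directory keeps the sketch self-contained.
path = os.path.join(tempfile.mkdtemp(), "tmpfilelora.toml")

# Without encoding=..., open() falls back to the locale's preferred
# encoding, which on Windows is often a legacy codepage, not UTF-8.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# The reader uses the same explicit encoding as the writer.
with open(path, "r", encoding="utf-8") as f:
    round_tripped = f.read()

print(round_tripped == text)
```

Passing `encoding="utf-8"` on both sides removes the dependency on the machine's locale, which is why the same config can work on one system and fail on another.
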

5KilosOfCheese commented 2 months ago

See if this helps https://github.com/bmaltais/kohya_ss/discussions/1744

This happens to many people. The safe bet is to remove ALL symbols and letters outside the standard English set from your paths and file names. Otherwise the system may fail to find a folder or file, or throw errors. It can be really frustrating to deal with, but anyone who speaks a language other than English still runs into this, to this day, in many programs and even on websites.
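
If you want to check a config for such characters before training, a rough sketch follows. The function name `find_non_ascii` and the settings dictionary are made up for illustration (the values are taken from the config posted above); this is not part of kohya_ss.

```python
def find_non_ascii(settings):
    """Return the string settings that contain non-ASCII characters."""
    return {
        key: value
        for key, value in settings.items()
        if isinstance(value, str) and not value.isascii()
    }


# Hypothetical subset of the config shared in this thread.
settings = {
    "output_name": "知更鸟koh",
    "train_data_dir": "D:/data/星铁/知更鸟",
    "logging_dir": "D:/111lora/kohya_ss/logs",
    "network_dim": 8,
}

flagged = find_non_ascii(settings)
for key, value in flagged.items():
    print(f"{key} contains non-ASCII characters: {value!r}")
```

`str.isascii()` needs Python 3.7 or newer. Renaming the flagged folders and output names to plain ASCII is the workaround described above; it sidesteps the encoding problem rather than fixing it.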