aigc-apps / sd-webui-EasyPhoto

📷 EasyPhoto | Your Smart AI Photo Generator.
Apache License 2.0

Training for 800 steps fails with an error at the very end. #107

Open kco0910 opened 1 year ago

kco0910 commented 1 year ago

Hardware environment:
System: Ubuntu 22.04
CUDA: 11.7
GPU: GeForce RTX 3090 Ti
Python version: 3.9

Traceback (most recent call last):
  File "/data/python/fj/stable-diffusion-webui/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/scripts/train_kohya/train_lora.py", line 1441, in <module>
    main()
  File "/data/python/fj/stable-diffusion-webui/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/scripts/train_kohya/train_lora.py", line 1397, in main
    t_result_list, tlist, scores = eval_jpg_with_faceid(pivot_dir, os.path.join(args.output_dir, "validation"))
  File "/data/python/fj/stable-diffusion-webui/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/scripts/train_kohya/train_lora.py", line 338, in eval_jpg_with_faceid
    embedding_array = np.vstack(embedding_list)
  File "<__array_function__ internals>", line 180, in vstack
  File "/data/python/fj/py39/lib/python3.9/site-packages/numpy/core/shape_base.py", line 282, in vstack
    return _nx.concatenate(arrs, 0)
  File "<__array_function__ internals>", line 180, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 512 and the array at index 4 has size 1
Steps: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 800/800 [17:27<00:00, 1.31s/it, lr=5e-5, step_loss=0.00243]
Traceback (most recent call last):
  File "/home/dj/micromamba/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/dj/micromamba/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data/python/fj/py39/lib/python3.9/site-packages/accelerate/commands/launch.py", line 989, in <module>
    main()
  File "/data/python/fj/py39/lib/python3.9/site-packages/accelerate/commands/launch.py", line 985, in main
    launch_command(args)
  File "/data/python/fj/py39/lib/python3.9/site-packages/accelerate/commands/launch.py", line 979, in launch_command
    simple_launcher(args)
  File "/data/python/fj/py39/lib/python3.9/site-packages/accelerate/commands/launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/data/python/fj/py39/bin/python3', '/data/python/fj/stable-diffusion-webui/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/scripts/train_kohya/train_lora.py', '--pretrained_model_name_or_path=/data/python/fj/stable-diffusion-webui/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/models/stable-diffusion-v1-5', '--pretrained_model_ckpt=/data/python/fj/stable-diffusion-webui/stable-diffusion-webui/models/Stable-diffusion/Chilloutmix-Ni-pruned-fp16-fix.safetensors', '--train_data_dir=/data/python/fj/stable-diffusion-webui/stable-diffusion-webui/outputs/easyphoto-user-id-infos/ldh/processed_images', '--caption_column=text', '--resolution=512', '--random_flip', '--train_batch_size=1', '--gradient_accumulation_steps=4', '--dataloader_num_workers=16', '--max_train_steps=800', '--checkpointing_steps=100', '--learning_rate=0.0001', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--train_text_encoder', '--seed=42', '--rank=128', '--network_alpha=64', '--validation_prompt=easyphoto_face, easyphoto, 1person', '--validation_steps=100', '--output_dir=/data/python/fj/stable-diffusion-webui/stable-diffusion-webui/outputs/easyphoto-user-id-infos/ldh/user_weights', '--logging_dir=/data/python/fj/stable-diffusion-webui/stable-diffusion-webui/outputs/easyphoto-user-id-infos/ldh/user_weights', '--enable_xformers_memory_efficient_attention', '--mixed_precision=fp16', '--template_dir=/data/python/fj/stable-diffusion-webui/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/models/training_templates', '--template_mask', '--merge_best_lora_based_face_id', '--merge_best_lora_name=ldh', '--cache_log_file=/data/python/fj/stable-diffusion-webui/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/train_kohya_log.txt', '--validation']' returned non-zero exit status 1.
Error executing the command: Command '['/data/python/fj/py39/bin/python3', '-m', 'accelerate.commands.launch', '--mixed_precision=fp16', '--main_process_port=3456', '/data/python/fj/stable-diffusion-webui/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/scripts/train_kohya/train_lora.py', '--pretrained_model_name_or_path=/data/python/fj/stable-diffusion-webui/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/models/stable-diffusion-v1-5', '--pretrained_model_ckpt=/data/python/fj/stable-diffusion-webui/stable-diffusion-webui/models/Stable-diffusion/Chilloutmix-Ni-pruned-fp16-fix.safetensors', '--train_data_dir=/data/python/fj/stable-diffusion-webui/stable-diffusion-webui/outputs/easyphoto-user-id-infos/ldh/processed_images', '--caption_column=text', '--resolution=512', '--random_flip', '--train_batch_size=1', '--gradient_accumulation_steps=4', '--dataloader_num_workers=16', '--max_train_steps=800', '--checkpointing_steps=100', '--learning_rate=0.0001', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--train_text_encoder', '--seed=42', '--rank=128', '--network_alpha=64', '--validation_prompt=easyphoto_face, easyphoto, 1person', '--validation_steps=100', '--output_dir=/data/python/fj/stable-diffusion-webui/stable-diffusion-webui/outputs/easyphoto-user-id-infos/ldh/user_weights', '--logging_dir=/data/python/fj/stable-diffusion-webui/stable-diffusion-webui/outputs/easyphoto-user-id-infos/ldh/user_weights', '--enable_xformers_memory_efficient_attention', '--mixed_precision=fp16', '--template_dir=/data/python/fj/stable-diffusion-webui/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/models/training_templates', '--template_mask', '--merge_best_lora_based_face_id', '--merge_best_lora_name=ldh', '--cache_log_file=/data/python/fj/stable-diffusion-webui/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/train_kohya_log.txt', '--validation']' returned non-zero exit status 1.
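
For readers hitting the same ValueError: the message says `np.vstack` received one entry (index 4) that is not a 512-value embedding. The thread does not confirm where that entry comes from; a plausible guess is a placeholder stored for a validation image in which no face was found. The sketch below only reproduces the shape mismatch and shows one defensive way to filter before stacking. `stack_valid_embeddings` and its `dim` parameter are hypothetical names for illustration, not part of EasyPhoto.

```python
import numpy as np


def stack_valid_embeddings(embedding_list, dim=512):
    """Stack only entries that flatten to exactly `dim` values.

    Hypothetical helper: returns the stacked array and the indices kept,
    so scores can still be mapped back to their images.
    """
    kept_indices, rows = [], []
    for i, emb in enumerate(embedding_list):
        arr = np.asarray(emb, dtype=np.float32).reshape(-1)
        if arr.size == dim:
            kept_indices.append(i)
            rows.append(arr)
    if not rows:
        # Nothing usable: return an empty (0, dim) array instead of raising.
        return np.empty((0, dim), dtype=np.float32), kept_indices
    return np.vstack(rows), kept_indices


# Four 512-value embeddings plus one size-1 placeholder (an assumption about
# what the failing entry looked like), mirroring "index 4 has size 1".
embeddings = [np.ones((1, 512)) for _ in range(4)] + [np.zeros((1,))]

try:
    np.vstack(embeddings)
except ValueError as exc:
    print("np.vstack failed:", exc)

stacked, kept = stack_valid_embeddings(embeddings)
print(stacked.shape, kept)
```

Filtering like this avoids the crash, but it hides the underlying question of why a validation image produced no usable embedding, so it is worth logging which indices were dropped.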

yunkchen commented 1 year ago

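It looks like model training completed and the error occurred while generating the final validation images, but the output above doesn't show the specific cause. I suggest trying image generation to see whether that also fails.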

kco0910 commented 1 year ago

That last step presumably writes a config or performs some other operation; the trained User id cannot be found in the Inference tab.

wuziheng commented 1 year ago

@kco0910 After refreshing the inference page, does the user id show up? We can keep discussing in the group chat, or here in this issue.

hjcenry commented 8 months ago

Same problem here: an error at the very end.

2024-02-02 12:46:05,404 - modelscope - INFO - model inference done
2024-02-02 12:46:06,057 - modelscope - INFO - model inference done
2024-02-02 12:46:06,131 - modelscope - INFO - model inference done
2024-02-02 12:46:06,854 - modelscope - INFO - model inference done
2024-02-02 12:46:06,925 - modelscope - INFO - model inference done
2024-02-02 12:46:06,999 - modelscope - INFO - model inference done
2024-02-02 12:46:07,171 - modelscope - INFO - model inference done
2024-02-02 12:46:07,339 - modelscope - INFO - model inference done
2024-02-02 12:46:07,509 - modelscope - INFO - model inference done
2024-02-02 12:46:07,677 - modelscope - INFO - model inference done
2024-02-02 12:46:07,846 - modelscope - INFO - model inference done
2024-02-02 12:46:07,920 - modelscope - INFO - model inference done
2024-02-02 12:46:07,997 - modelscope - INFO - model inference done
2024-02-02 12:46:08,072 - modelscope - INFO - model inference done
2024-02-02 12:46:08,240 - modelscope - INFO - model inference done
2024-02-02 12:46:08,416 - modelscope - INFO - model inference done
2024-02-02 12:46:08,499 - modelscope - INFO - model inference done
2024-02-02 12:46:08,576 - modelscope - INFO - model inference done
2024-02-02 12:46:08,650 - modelscope - INFO - model inference done
2024-02-02 12:46:08,722 - modelscope - INFO - model inference done
2024-02-02 12:46:08,796 - modelscope - INFO - model inference done
2024-02-02 12:46:08,967 - modelscope - INFO - model inference done
2024-02-02 12:46:09,137 - modelscope - INFO - model inference done
2024-02-02 12:46:09,310 - modelscope - INFO - model inference done
2024-02-02 12:46:09,481 - modelscope - INFO - model inference done
2024-02-02 12:46:09,652 - modelscope - INFO - model inference done
2024-02-02 12:46:09,728 - modelscope - INFO - model inference done
2024-02-02 12:46:09,801 - modelscope - INFO - model inference done
2024-02-02 12:46:09,877 - modelscope - INFO - model inference done
Dectect no face in training data, move last weights and validation image to best_outputs
Traceback (most recent call last):
  File "D:\stable diffusion\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\scripts\train_kohya\train_lora.py", line 1390, in <module>
    main()
  File "D:\stable diffusion\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\scripts\train_kohya\utils\gpu_info.py", line 195, in wrapper
    result = func(*args, **kwargs)
  File "D:\stable diffusion\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\scripts\train_kohya\train_lora.py", line 1362, in main
    copyfile(t_result_list[0][1], os.path.join(best_outputs_dir, os.path.basename(t_result_list[0][1])))
IndexError: list index out of range
Steps: 100%|█████████████████████████████████████████████| 800/800 [26:06<00:00, 1.96s/it, lr=5e-5, step_loss=0.00529]
Traceback (most recent call last):
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\stable diffusion\stable-diffusion-webui\venv\lib\site-packages\accelerate\commands\launch.py", line 989, in <module>
    main()
  File "D:\stable diffusion\stable-diffusion-webui\venv\lib\site-packages\accelerate\commands\launch.py", line 985, in main
    launch_command(args)
  File "D:\stable diffusion\stable-diffusion-webui\venv\lib\site-packages\accelerate\commands\launch.py", line 979, in launch_command
    simple_launcher(args)
  File "D:\stable diffusion\stable-diffusion-webui\venv\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\stable diffusion\stable-diffusion-webui\venv\Scripts\python.exe', 'D:\stable diffusion\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\scripts\train_kohya/train_lora.py', '--pretrained_model_name_or_path=extensions\sd-webui-EasyPhoto\models\stable-diffusion-v1-5', '--pretrained_model_ckpt=models\Stable-diffusion\Chilloutmix-Ni-pruned-fp16-fix.safetensors', '--train_data_dir=outputs\easyphoto-user-id-infos\hejincheng\processed_images', '--caption_column=text', '--resolution=512', '--random_flip', '--train_batch_size=1', '--gradient_accumulation_steps=4', '--dataloader_num_workers=0', '--max_train_steps=800', '--checkpointing_steps=100', '--learning_rate=0.0001', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--train_text_encoder', '--seed=614489', '--rank=128', '--network_alpha=64', '--validation_prompt=easyphoto_face, easyphoto, 1person', '--validation_steps=100', '--output_dir=outputs\easyphoto-user-id-infos\hejincheng\user_weights', '--logging_dir=outputs\easyphoto-user-id-infos\hejincheng\user_weights', '--enable_xformers_memory_efficient_attention', '--mixed_precision=fp16', '--template_dir=extensions\sd-webui-EasyPhoto\models\training_templates', '--template_mask', '--merge_best_lora_based_face_id', '--merge_best_lora_name=hejincheng', '--cache_log_file=D:\stable diffusion\stable-diffusion-webui\outputs/easyphoto-tmp/train_kohya_log.txt', '--validation']' returned non-zero exit status 1.
2024-02-02 12:46:12,445 - EasyPhoto - Error executing the command: Command '['D:\stable diffusion\stable-diffusion-webui\venv\Scripts\python.exe', '-m', 'accelerate.commands.launch', '--mixed_precision=fp16', '--main_process_port=3456', 'D:\stable diffusion\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\scripts\train_kohya/train_lora.py', '--pretrained_model_name_or_path=extensions\sd-webui-EasyPhoto\models\stable-diffusion-v1-5', '--pretrained_model_ckpt=models\Stable-diffusion\Chilloutmix-Ni-pruned-fp16-fix.safetensors', '--train_data_dir=outputs\easyphoto-user-id-infos\hejincheng\processed_images', '--caption_column=text', '--resolution=512', '--random_flip', '--train_batch_size=1', '--gradient_accumulation_steps=4', '--dataloader_num_workers=0', '--max_train_steps=800', '--checkpointing_steps=100', '--learning_rate=0.0001', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--train_text_encoder', '--seed=614489', '--rank=128', '--network_alpha=64', '--validation_prompt=easyphoto_face, easyphoto, 1person', '--validation_steps=100', '--output_dir=outputs\easyphoto-user-id-infos\hejincheng\user_weights', '--logging_dir=outputs\easyphoto-user-id-infos\hejincheng\user_weights', '--enable_xformers_memory_efficient_attention', '--mixed_precision=fp16', '--template_dir=extensions\sd-webui-EasyPhoto\models\training_templates', '--template_mask', '--merge_best_lora_based_face_id', '--merge_best_lora_name=hejincheng', '--cache_log_file=D:\stable diffusion\stable-diffusion-webui\outputs/easyphoto-tmp/train_kohya_log.txt', '--validation']' returned non-zero exit status 1.
Applying attention optimization: xformers... done.

KennyChan3389 commented 8 months ago

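Dectect no face in training data, move last weights and validation image to best_outputs. Same as the rest of you: I get a message saying no face was found in the dataset, which left me baffled. Training does run to completion, and the output folder contains 9 files named like checkpoint-800.safetensors, but there is no user_id.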

fyp711 commented 7 months ago

I ran into the same error as well.

fyp711 commented 7 months ago


Has this been solved?

Yuhyeong commented 7 months ago

+1. There is a bug in the code, and this issue from last September still hasn't even gotten a proper reply.


│ /data/webui/extensions/sd-webui-EasyPhoto/scripts/train_kohya/train_lora.py:1362 in   │
│ main                                                                                             │
│                                                                                                  │
│   1359 │   │   │   │   │   t_result_list.append([res, img])                                      │
│   1360 │   │   │   │   │   t_result_list = sorted(t_result_list, key=lambda a: -a[0])            │
│   1361 │   │   │   │                                                                             │
│ ❱ 1362 │   │   │   │   copyfile(t_result_list[0][1], os.path.join(best_outputs_dir, os.path.bas  │
│   1363 │   │   │   │   copyfile(                                                                 │
│   1364 │   │   │   │   │   os.path.join(args.output_dir, "pytorch_lora_weights.safetensors"),    │
│   1365 │   │   │   │   │   os.path.join(best_outputs_dir, merge_best_lora_name + ".safetensors"  │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
IndexError: list index out of range
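
The IndexError above is raised when `t_result_list[0]` is read while the list is empty, presumably because no validation image was scored (the earlier log prints "Dectect no face in training data" just before the same crash). A minimal sketch of a guard follows, assuming entries have the `[score, image_path]` shape visible at line 1359 of the traceback. `copy_best_validation_image` is a hypothetical helper, not the project's actual fix.

```python
import os
import shutil
import tempfile


def copy_best_validation_image(t_result_list, best_outputs_dir):
    """Copy the highest-scored image into best_outputs_dir.

    Hypothetical helper: returns the destination path, or None when there
    is nothing to copy, instead of indexing into an empty list.
    """
    if not t_result_list:
        return None
    # Same ordering as the traceback: highest score first.
    ranked = sorted(t_result_list, key=lambda a: -a[0])
    best_image = ranked[0][1]
    target = os.path.join(best_outputs_dir, os.path.basename(best_image))
    shutil.copyfile(best_image, target)
    return target


# Small demonstration using temporary files as stand-in validation images.
with tempfile.TemporaryDirectory() as workdir:
    best_dir = os.path.join(workdir, "best_outputs")
    os.makedirs(best_dir)

    results = []
    for score, name in [(0.31, "global_step_100_0.jpg"), (0.78, "global_step_800_0.jpg")]:
        path = os.path.join(workdir, name)
        with open(path, "wb") as fh:
            fh.write(b"placeholder")
        results.append([score, path])

    print(copy_best_validation_image(results, best_dir))
    print(copy_best_validation_image([], best_dir))
```

With a guard like this, the later `copyfile` of `pytorch_lora_weights.safetensors` could still run, which may be what lets the user id appear in the Inference tab; that connection is an inference from the thread, not something the maintainers confirmed.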