Closed rain9726 closed 10 months ago
I also encountered the same problem
@rain9726 and @XMUykyz, it looks like the error was caused by validation. We have added exception handling for validation errors in the latest version. Could you please either turn off validation in the training UI or upgrade EasyPhoto to the latest version?
File "/root/autodl-tmp/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/scripts/train_kohya/train_lora.py", line 1237, in main
log_validation(
File "/root/autodl-tmp/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/scripts/train_kohya/train_lora.py", line 123, in log_validation
image = pipeline(
File "/root/miniconda3/envs/xl_env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
From the line number in the log, I can confirm you are not using the latest EasyPhoto. https://github.com/aigc-apps/sd-webui-EasyPhoto/commit/41b68d6e5f1b13b523ad9599ebc42048f7327a13#diff-801b9d852d2dfd58f3feee7db12fdc9554f8b73e4c581952d05408a62eb6a507L1237.
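To illustrate the kind of exception handling mentioned above, here is a minimal sketch, not the actual EasyPhoto change. It assumes validation is an optional step whose failure should be logged rather than abort training. `run_validation_safely` and `broken_validation` are hypothetical names made up for this example.

```python
import logging

logger = logging.getLogger("train_sketch")


def run_validation_safely(validate, step):
    """Run a validation callback; log and swallow any exception it raises.

    `validate` is a hypothetical stand-in for the real validation routine.
    Returns True if validation succeeded, False if it failed and was skipped.
    """
    try:
        validate(step)
        return True
    except Exception as exc:  # deliberately broad: validation is treated as optional
        logger.warning("Validation failed at step %d, continuing training: %s", step, exc)
        return False


def broken_validation(step):
    # Mimics the failure reported in this thread: an unexpected keyword argument.
    raise TypeError("forward() got an unexpected keyword argument 'added_cond_kwargs'")


completed_steps = []
for step in range(100, 400, 100):
    completed_steps.append(step)  # stands in for the real training work
    run_validation_safely(broken_validation, step)
```

With a guard like this, a failing validation run at step 100 would leave a warning in the log and training would carry on to the following steps.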
Turn off validation in the training UI and restart.
I updated to the new version and it works now. Thanks.
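As a rough illustration of what turning validation off might amount to: the logged training command in this issue ends with a bare `--validation` argument. Assuming (this is not confirmed in the thread) that the UI checkbox controls whether that argument is passed, the effect could be sketched as below. `without_validation` is a hypothetical helper, not part of EasyPhoto.

```python
def without_validation(args):
    """Return a copy of `args` with the bare `--validation` switch removed.

    Only the exact token is dropped; `--validation_prompt=...` and
    `--validation_steps=...` are left untouched.
    """
    return [a for a in args if a != "--validation"]


# Shortened version of the argument list from the log in this issue.
cmd = [
    "train_lora.py",
    "--max_train_steps=800",
    "--validation_prompt=easyphoto_face, easyphoto, 1person",
    "--validation_steps=100",
    "--validation",
]
trimmed = without_validation(cmd)
```

In practice the checkbox in the training UI is the supported way to do this; the sketch only shows which part of the command is involved.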
2024-01-09 08:43:20,825 - EasyPhoto - train_file_path : /root/autodl-tmp/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/scripts/train_kohya/train_lora.py
2024-01-09 08:43:20,826 - EasyPhoto - cache_log_file_path: /root/autodl-tmp/stable-diffusion-webui/outputs/easyphoto-tmp/train_kohya_log.txt
Error. nthreads cannot be larger than environment variable "NUMEXPR_MAX_THREADS" (8)
The following values were not passed to `accelerate launch` and had defaults used instead:
`--num_processes` was set to a value of `1`
`--num_machines` was set to a value of `1`
`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
2024-01-09 08:43:32,619 - modelscope - INFO - PyTorch version 2.0.1+cu118 Found.
2024-01-09 08:43:32,621 - modelscope - INFO - TensorFlow version 2.12.0 Found.
2024-01-09 08:43:32,621 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2024-01-09 08:43:32,650 - modelscope - INFO - Loading done! Current index file version is 1.9.3, with md5 ce52a1517bab79727e198f27c93177a5 and a total number of 943 components indexed
01/09/2024 08:43:33 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: fp16
{'dynamic_thresholding_ratio', 'timestep_spacing', 'sample_max_value', 'clip_sample_range', 'thresholding', 'variance_type', 'rescale_betas_zero_snr', 'prediction_type'} was not found in config. Values will be initialized to default values.
UNet2DConditionModel: 64, 8, 768, False, False
loading u-net:
loading vae:
loading text encoder:
create LoRA network. base dim (rank): 128, alpha: 64
neuron dropout: p=None, rank dropout: p=None, module dropout: p=None
create LoRA for Text Encoder:
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
Resolving data files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 31/31 [00:00<00:00, 67230.31it/s]
Downloading and preparing dataset imagefolder/default to /root/.cache/huggingface/datasets/imagefolder/default-5f1bdc0016d9699e/0.0.0/37fbb85cc714a338bea574ac6c7d0b5be5aff46c1862c1989b20e0771199e93f...
Downloading data files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 65664.25it/s]
Downloading data files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 68909.70it/s]
Extracting data files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 4628.11it/s]
Dataset imagefolder downloaded and prepared to /root/.cache/huggingface/datasets/imagefolder/default-5f1bdc0016d9699e/0.0.0/37fbb85cc714a338bea574ac6c7d0b5be5aff46c1862c1989b20e0771199e93f. Subsequent calls will reuse this data.
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 805.36it/s]
01/09/2024 08:43:46 - INFO - __main__ - Running training
01/09/2024 08:43:46 - INFO - __main__ - Num examples = 15
01/09/2024 08:43:46 - INFO - __main__ - Num Epochs = 200
01/09/2024 08:43:46 - INFO - __main__ - Instantaneous batch size per device = 1
01/09/2024 08:43:46 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 4
01/09/2024 08:43:46 - INFO - __main__ - Gradient Accumulation steps = 4
01/09/2024 08:43:46 - INFO - __main__ - Total optimization steps = 800
Steps: 0%| | 0/800 [00:00<?, ?it/s]2024-01-09 08:43:46,805 - modelscope - INFO - Model revision not specified, use revision: v2.0.2
2024-01-09 08:43:48,800 - modelscope - INFO - initiate model from /root/.cache/modelscope/hub/damo/cv_resnet50_face-detection_retinaface
2024-01-09 08:43:48,800 - modelscope - INFO - initiate model from location /root/.cache/modelscope/hub/damo/cv_resnet50_face-detection_retinaface.
2024-01-09 08:43:48,802 - modelscope - WARNING - No preprocessor field found in cfg.
2024-01-09 08:43:48,802 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
2024-01-09 08:43:48,802 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': '/root/.cache/modelscope/hub/damo/cv_resnet50_face-detection_retinaface'}. trying to build by task and model information.
2024-01-09 08:43:48,802 - modelscope - WARNING - Find task: face-detection, model type: None. Insufficient information to build preprocessor, skip building preprocessor
2024-01-09 08:43:48,803 - modelscope - INFO - loading model from /root/.cache/modelscope/hub/damo/cv_resnet50_face-detection_retinaface/pytorch_model.pt
/root/miniconda3/envs/xl_env/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/root/miniconda3/envs/xl_env/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=None`.
warnings.warn(msg)
2024-01-09 08:43:49,432 - modelscope - INFO - load model done
Steps: 12%|███████████████▏ | 100/800 [03:03<18:26, 1.58s/it, lr=5e-5, step_loss=0.00655]
saving checkpoint: /root/autodl-tmp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/22/user_weights/checkpoint-100.safetensors
01/09/2024 08:46:51 - INFO - __main__ - Saved state to /root/autodl-tmp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/22/user_weights/checkpoint-100.safetensors, /root/autodl-tmp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/22/user_weights/checkpoint-100
Steps: 12%|███████████████▍ | 100/800 [03:04<18:26, 1.58s/it, lr=5e-5, step_loss=0.148]01/09/2024 08:46:51 - INFO - __main__ - Running validation... Generating 4 images with prompt: easyphoto_face, easyphoto, 1person.
UNet2DConditionModel: 64, 8, 768, False, False
loading u-net:
loading vae:
loading text encoder:
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_inpaint.StableDiffusionInpaintPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
You have loaded a UNet with 4 input channels which.
{'dynamic_thresholding_ratio', 'timestep_spacing', 'sample_max_value', 'thresholding', 'algorithm_type', 'solver_type', 'lower_order_final', 'variance_type', 'use_karras_sigmas', 'euler_at_final', 'use_lu_lambdas', 'prediction_type', 'solver_order', 'lambda_min_clipped'} was not found in config. Values will be initialized to default values.
Traceback (most recent call last):
File "/root/autodl-tmp/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/scripts/train_kohya/train_lora.py", line 1370, in <module>
main()
File "/root/autodl-tmp/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/scripts/train_kohya/utils/gpu_info.py", line 190, in wrapper
result = func(*args, **kwargs)
File "/root/autodl-tmp/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/scripts/train_kohya/train_lora.py", line 1237, in main
log_validation(
File "/root/autodl-tmp/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/scripts/train_kohya/train_lora.py", line 123, in log_validation
image = pipeline(
File "/root/miniconda3/envs/xl_env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/miniconda3/envs/xl_env/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py", line 1349, in __call__
noise_pred = self.unet(
File "/root/miniconda3/envs/xl_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
TypeError: UNet2DConditionModel.forward() got an unexpected keyword argument 'added_cond_kwargs'
Steps: 12%|███████████████▍ | 100/800 [03:17<23:01, 1.97s/it, lr=5e-5, step_loss=0.148]
Traceback (most recent call last):
File "/root/miniconda3/envs/xl_env/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/root/miniconda3/envs/xl_env/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/root/miniconda3/envs/xl_env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 989, in <module>
main()
File "/root/miniconda3/envs/xl_env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 985, in main
launch_command(args)
File "/root/miniconda3/envs/xl_env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 979, in launch_command
simple_launcher(args)
File "/root/miniconda3/envs/xl_env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 628, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/miniconda3/envs/xl_env/bin/python', '/root/autodl-tmp/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/scripts/train_kohya/train_lora.py', '--pretrained_model_name_or_path=/root/autodl-tmp/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/models/stable-diffusion-v1-5', '--pretrained_model_ckpt=/root/autodl-tmp/stable-diffusion-webui/models/Stable-diffusion/Chilloutmix-Ni-pruned-fp16-fix.safetensors', '--train_data_dir=/root/autodl-tmp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/22/processed_images', '--caption_column=text', '--resolution=512', '--random_flip', '--train_batch_size=1', '--gradient_accumulation_steps=4', '--dataloader_num_workers=16', '--max_train_steps=800', '--checkpointing_steps=100', '--learning_rate=0.0001', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--train_text_encoder', '--seed=782704', '--rank=128', '--network_alpha=64', '--validation_prompt=easyphoto_face, easyphoto, 1person', '--validation_steps=100', '--output_dir=/root/autodl-tmp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/22/user_weights', '--logging_dir=/root/autodl-tmp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/22/user_weights', '--enable_xformers_memory_efficient_attention', '--mixed_precision=fp16', '--template_dir=/root/autodl-tmp/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/models/training_templates', '--template_mask', '--merge_best_lora_based_face_id', '--merge_best_lora_name=22', '--cache_log_file=/root/autodl-tmp/stable-diffusion-webui/outputs/easyphoto-tmp/train_kohya_log.txt', '--validation']' returned non-zero exit status 1.
2024-01-09 08:47:05,766 - EasyPhoto - Error executing the command: Command '['/root/miniconda3/envs/xl_env/bin/python', '-m', 'accelerate.commands.launch', '--mixed_precision=fp16', '--main_process_port=3456', '/root/autodl-tmp/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/scripts/train_kohya/train_lora.py', '--pretrained_model_name_or_path=/root/autodl-tmp/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/models/stable-diffusion-v1-5', '--pretrained_model_ckpt=/root/autodl-tmp/stable-diffusion-webui/models/Stable-diffusion/Chilloutmix-Ni-pruned-fp16-fix.safetensors', '--train_data_dir=/root/autodl-tmp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/22/processed_images', '--caption_column=text', '--resolution=512', '--random_flip', '--train_batch_size=1', '--gradient_accumulation_steps=4', '--dataloader_num_workers=16', '--max_train_steps=800', '--checkpointing_steps=100', '--learning_rate=0.0001', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--train_text_encoder', '--seed=782704', '--rank=128', '--network_alpha=64', '--validation_prompt=easyphoto_face, easyphoto, 1person', '--validation_steps=100', '--output_dir=/root/autodl-tmp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/22/user_weights', '--logging_dir=/root/autodl-tmp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/22/user_weights', '--enable_xformers_memory_efficient_attention', '--mixed_precision=fp16', '--template_dir=/root/autodl-tmp/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/models/training_templates', '--template_mask', '--merge_best_lora_based_face_id', '--merge_best_lora_name=22', '--cache_log_file=/root/autodl-tmp/stable-diffusion-webui/outputs/easyphoto-tmp/train_kohya_log.txt', '--validation']' returned non-zero exit status 1.
Applying attention optimization: xformers... done.
Could the author please take a look at what is causing this problem?
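For readers wondering what the final `TypeError` means: Python raises it whenever a caller passes a keyword that the callee's signature does not declare. In the log, the inpaint pipeline calls `self.unet(...)` with `added_cond_kwargs`, and the loaded `UNet2DConditionModel.forward()` does not accept it. That suggests the pipeline and the UNet class come from mismatched code, though the thread only confirms that validation triggers it. Below is a minimal sketch that reproduces the same class of error with a stand-in class. `LegacyUNet` and `accepts_keyword` are hypothetical names for illustration, not diffusers or EasyPhoto code.

```python
import inspect


class LegacyUNet:
    """Hypothetical stand-in for a UNet whose forward() does not know `added_cond_kwargs`."""

    def forward(self, sample, timestep, encoder_hidden_states):
        return sample


def accepts_keyword(func, name):
    """True if `func` declares a parameter called `name` or takes arbitrary **kwargs."""
    params = inspect.signature(func).parameters
    return name in params or any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )


unet = LegacyUNet()
call_kwargs = {"encoder_hidden_states": "ctx", "added_cond_kwargs": None}

# The caller passes a keyword the callee does not declare, so Python raises TypeError.
try:
    unet.forward("latents", 0, **call_kwargs)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Dropping the unsupported keyword lets the same call go through.
supported = {k: v for k, v in call_kwargs.items() if accepts_keyword(unet.forward, k)}
result = unet.forward("latents", 0, **supported)
```

This is only a diagnostic illustration. The fix suggested in this thread is to upgrade EasyPhoto or turn off validation, not to filter keyword arguments by hand.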