aigc-apps / sd-webui-EasyPhoto

📷 EasyPhoto | Your Smart AI Photo Generator.
Apache License 2.0

modelscope issues? Failed to obtain Lora after training, please check the training process. #151

Open k8kiss opened 10 months ago

k8kiss commented 10 months ago

Hello, training fails and I am seeing a lot of "modelscope"-related warnings. I have pasted some portions of the log below. I have updated Automatic1111 to the following: version: v1.6.0  •  python: 3.10.6  •  torch: 2.0.1+cu118  •  xformers: 0.0.17  •  gradio: 3.41.2  •  checkpoint: 18d1c095b6

2023-10-09 01:11:01,193 - modelscope - INFO - PyTorch version 2.0.1+cu118 Found.
2023-10-09 01:11:01,195 - modelscope - INFO - TensorFlow version 2.14.0 Found.
2023-10-09 01:11:01,195 - modelscope - INFO - Loading ast index from C:\Users\USER\.cache\modelscope\ast_indexer
2023-10-09 01:11:01,230 - modelscope - INFO - Loading done! Current index file version is 1.8.4, with md5 c03d7f5c980f6e54d254d7718291a800 and a total number of 902 components indexed
2023-10-09 01:11:03,388 - modelscope - INFO - Use user-specified model revision: v1.0.3
2023-10-09 01:11:04,381 - modelscope - WARNING - ('PIPELINES', 'face_recognition', 'face_recognition') not found in ast index file
2023-10-09 01:11:04,381 - modelscope - INFO - initiate model from C:\Users\USER\.cache\modelscope\hub\bubbliiiing\cv_retinafce_recognition
2023-10-09 01:11:04,381 - modelscope - INFO - initiate model from location C:\Users\USER\.cache\modelscope\hub\bubbliiiing\cv_retinafce_recognition.
2023-10-09 01:11:04,383 - modelscope - INFO - initialize model from C:\Users\USER\.cache\modelscope\hub\bubbliiiing\cv_retinafce_recognition
2023-10-09 01:11:04,414 - modelscope - WARNING - ('MODELS', 'face_recognition', 'face_recognition') not found in ast index file
2023-10-09 01:11:05,374 - modelscope - INFO - Model revision not specified, use the latest revision: v2.0.2
2023-10-09 01:11:07,511 - modelscope - INFO - initiate model from C:\Users\USER\.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface
2023-10-09 01:11:07,511 - modelscope - INFO - initiate model from location C:\Users\USER\.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface.
2023-10-09 01:11:07,516 - modelscope - WARNING - No preprocessor field found in cfg.
2023-10-09 01:11:07,516 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
2023-10-09 01:11:07,516 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': 'C:\Users\USER\.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface'}. trying to build by task and model information.
2023-10-09 01:11:07,516 - modelscope - WARNING - Find task: face-detection, model type: None. Insufficient information to build preprocessor, skip building preprocessor
2023-10-09 01:11:07,517 - modelscope - INFO - loading model from C:\Users\USER\.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface\pytorch_model.pt
D:\sda1111\venv\lib\site-packages\torchvision\models\_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
D:\sda1111\venv\lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=None.
  warnings.warn(msg)
2023-10-09 01:11:07,957 - modelscope - INFO - load model done
2023-10-09 01:11:08,614 - modelscope - INFO - load facefusion models done
2023-10-09 01:11:08,614 - modelscope - INFO - init done
2023-10-09 01:11:08,617 - modelscope - WARNING - No preprocessor field found in cfg.
2023-10-09 01:11:08,617 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
2023-10-09 01:11:08,617 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': 'C:\Users\USER\.cache\modelscope\hub\bubbliiiing\cv_retinafce_recognition'}. trying to build by task and model information.
2023-10-09 01:11:08,617 - modelscope - WARNING - No preprocessor key ('face_recognition', 'face_recognition') found in PREPROCESSOR_MAP, skip building preprocessor.
2023-10-09 01:11:08,619 - modelscope - INFO - image face recognition model init done
2023-10-09 01:11:09,600 - modelscope - INFO - Use user-specified model revision: v2.0.2
2023-10-09 01:11:12,023 - modelscope - INFO - initiate model from C:\Users\USER\.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface
2023-10-09 01:11:12,023 - modelscope - INFO - initiate model from location C:\Users\USER\.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface.
2023-10-09 01:11:12,026 - modelscope - WARNING - No preprocessor field found in cfg.
2023-10-09 01:11:12,026 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
2023-10-09 01:11:12,027 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': 'C:\Users\USER\.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface'}. trying to build by task and model information.
2023-10-09 01:11:12,027 - modelscope - WARNING - Find task: face-detection, model type: None. Insufficient information to build preprocessor, skip building preprocessor
2023-10-09 01:11:12,028 - modelscope - INFO - loading model from C:\Users\USER\.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface\pytorch_model.pt
2023-10-09 01:11:12,335 - modelscope - INFO - load model done
2023-10-09 01:11:13,315 - modelscope - INFO - Use user-specified model revision: v1.0.0
2023-10-09 01:11:13,741 - modelscope - INFO - initiate model from C:\Users\USER\.cache\modelscope\hub\damo\cv_u2net_salient-detection
2023-10-09 01:11:13,741 - modelscope - INFO - initiate model from location C:\Users\USER\.cache\modelscope\hub\damo\cv_u2net_salient-detection.
2023-10-09 01:11:13,743 - modelscope - INFO - initialize model from C:\Users\USER\.cache\modelscope\hub\damo\cv_u2net_s

wuziheng commented 10 months ago

Sorry, the modelscope-related lines in the log appear to be only WARNING and INFO messages. Could you provide more details, such as the stack trace from the training function?

k8kiss commented 10 months ago

Thanks for your response. So you don't think that's why it failed? The log in A1111 seems empty. Is there any other place where I can get more details?

wuziheng commented 10 months ago
  1. To hide the modelscope INFO and WARNING messages, you can add the code below yourself, right after the line that imports modelscope:

    import logging
    from modelscope.utils.logger import get_logger

    logger = get_logger()
    logger.setLevel(logging.ERROR)  # only show ERROR and above

  2. What I mean is that I haven't found the actual error in your log. If that is all the output you have, please provide a screenshot of your terminal to help us find the bug.
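
When a log is mostly INFO and WARNING noise, one way to locate the real failure is to filter a saved copy of the terminal output for lines that look like errors. The following is only a rough sketch: the function name `find_failure_lines` and the keyword list are made up for illustration and are not part of EasyPhoto or modelscope.

```python
import re

# Keywords that usually mark a real failure rather than routine logging.
# This list is an assumption for illustration; adjust it to your own logs.
FAILURE_PATTERN = re.compile(r"Traceback|error|exception", re.IGNORECASE)


def find_failure_lines(log_text: str, context: int = 2) -> list[str]:
    """Return every matching line plus `context` lines that follow it."""
    lines = log_text.splitlines()
    selected: list[str] = []
    seen: set[int] = set()
    for i, line in enumerate(lines):
        if FAILURE_PATTERN.search(line):
            # Keep the match and the lines after it, without duplicates.
            for j in range(i, min(i + context + 1, len(lines))):
                if j not in seen:
                    seen.add(j)
                    selected.append(lines[j])
    return selected
```

To use it, read the saved log file into a string, pass that string to `find_failure_lines`, and print the returned lines. A larger `context` value keeps more of each stack trace.
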
Patholog-CZ commented 5 months ago

I got the same error after I finished training the model.

Steps: 100%|█████████████████████████████████████████████| 800/800 [36:27<00:00, 2.63s/it, lr=5e-5, step_loss=0.00327]
saving checkpoint: outputs\easyphoto-user-id-infos\Lea_1\user_weights\checkpoint-800.safetensors
03/06/2024 21:23:54 - INFO - __main__ - Saved state to outputs\easyphoto-user-id-infos\Lea_1\user_weights\checkpoint-800.safetensors, outputs\easyphoto-user-id-infos\Lea_1\user_weights\checkpoint-800
Steps: 100%|█████████████████████████████████████████████| 800/800 [36:28<00:00, 2.63s/it, lr=5e-5, step_loss=0.00271]
saving checkpoint: outputs\easyphoto-user-id-infos\Lea_1\user_weights\pytorch_lora_weights.safetensors
UNet2DConditionModel: 64, 8, 768, False, False
HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
loading u-net: <All keys matched successfully>
loading vae: <All keys matched successfully>
loading text encoder: <All keys matched successfully>
HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_inpaint.StableDiffusionInpaintPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
You have loaded a UNet with 4 input channels which.
{'variance_type', 'sample_max_value', 'timestep_spacing', 'solver_order', 'euler_at_final', 'prediction_type', 'lambda_min_clipped', 'use_karras_sigmas', 'dynamic_thresholding_ratio', 'lower_order_final', 'use_lu_lambdas', 'final_sigmas_type', 'algorithm_type', 'thresholding', 'solver_type'} was not found in config. Values will be initialized to default values.
03/06/2024 21:24:03 - INFO - __main__ - Running validation error, skip it.Error info: UNet2DConditionModel.forward() got an unexpected keyword argument 'added_cond_kwargs'.
HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
2024-03-06 21:24:06,141 - modelscope - INFO - Use user-specified model revision: v1.0.3
2024-03-06 21:24:06,667 - modelscope - WARNING - ('PIPELINES', 'face_recognition', 'face_recognition') not found in ast index file
2024-03-06 21:24:06,668 - modelscope - INFO - initiate model from C:\Users\patho\.cache\modelscope\hub\bubbliiiing\cv_retinafce_recognition
2024-03-06 21:24:06,668 - modelscope - INFO - initiate model from location C:\Users\patho\.cache\modelscope\hub\bubbliiiing\cv_retinafce_recognition.
2024-03-06 21:24:06,671 - modelscope - INFO - initialize model from C:\Users\patho\.cache\modelscope\hub\bubbliiiing\cv_retinafce_recognition
2024-03-06 21:24:06,676 - modelscope - WARNING - ('MODELS', 'face_recognition', 'face_recognition') not found in ast index file
HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
2024-03-06 21:24:08,721 - modelscope - INFO - Model revision not specified, use revision: v2.0.2
2024-03-06 21:24:10,537 - modelscope - INFO - initiate model from C:\Users\patho\.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface
2024-03-06 21:24:10,537 - modelscope - INFO - initiate model from location C:\Users\patho\.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface.
2024-03-06 21:24:10,544 - modelscope - WARNING - No preprocessor field found in cfg.
2024-03-06 21:24:10,545 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
2024-03-06 21:24:10,545 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': 'C:\\Users\\patho\\.cache\\modelscope\\hub\\damo\\cv_resnet50_face-detection_retinaface'}. trying to build by task and model information.
2024-03-06 21:24:10,545 - modelscope - WARNING - Find task: face-detection, model type: None. Insufficient information to build preprocessor, skip building preprocessor
2024-03-06 21:24:10,547 - modelscope - INFO - loading model from C:\Users\patho\.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface\pytorch_model.pt
2024-03-06 21:24:11,075 - modelscope - INFO - load model done
HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
2024-03-06 21:24:12,003 - modelscope - INFO - load facefusion models done
2024-03-06 21:24:12,003 - modelscope - INFO - init done
2024-03-06 21:24:12,009 - modelscope - WARNING - No preprocessor field found in cfg.
2024-03-06 21:24:12,009 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
2024-03-06 21:24:12,009 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': 'C:\\Users\\patho\\.cache\\modelscope\\hub\\bubbliiiing\\cv_retinafce_recognition'}. trying to build by task and model information.
2024-03-06 21:24:12,009 - modelscope - WARNING - No preprocessor key ('face_recognition', 'face_recognition') found in PREPROCESSOR_MAP, skip building preprocessor.
2024-03-06 21:24:12,011 - modelscope - INFO - image face recognition model init done
2024-03-06 21:24:12,017 - modelscope - WARNING - task face_recognition input definition is missing
C:\Users\patho\.cache\modelscope\modelscope_modules\cv_retinafce_recognition\image_face_recognition\matlab_cp2tform.py:81: FutureWarning: `rcond` parameter will change to the default of machine precision times ``max(M, N)`` where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass `rcond=None`, to keep using the old, explicitly pass `rcond=-1`.
  r, _, _, _ = lstsq(X, U)
2024-03-06 21:24:13,332 - modelscope - INFO - model inference done
2024-03-06 21:24:13,332 - modelscope - WARNING - task face_recognition output keys are missing
2024-03-06 21:24:14,183 - modelscope - INFO - model inference done
HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
2024-03-06 21:24:14,776 - modelscope - INFO - model inference done
2024-03-06 21:24:15,177 - modelscope - INFO - model inference done
2024-03-06 21:24:15,638 - modelscope - INFO - model inference done
2024-03-06 21:24:16,182 - modelscope - INFO - model inference done
2024-03-06 21:24:16,728 - modelscope - INFO - model inference done
HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
2024-03-06 21:24:17,573 - modelscope - INFO - model inference done
2024-03-06 21:24:18,409 - modelscope - INFO - model inference done
2024-03-06 21:24:19,078 - modelscope - INFO - model inference done
2024-03-06 21:24:19,862 - modelscope - INFO - model inference done
HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
2024-03-06 21:24:20,563 - modelscope - INFO - model inference done
2024-03-06 21:24:20,854 - modelscope - INFO - model inference done
2024-03-06 21:24:21,430 - modelscope - INFO - model inference done
error at: OpenCV(4.9.0) D:\a\opencv-python\opencv-python\opencv\modules\imgproc\src\imgwarp.cpp:2748: error: (-215:Assertion failed) src.cols > 0 && src.rows > 0 in function 'cv::warpAffine'

2024-03-06 21:24:22,076 - modelscope - INFO - model inference done
2024-03-06 21:24:22,186 - modelscope - INFO - model inference done
2024-03-06 21:24:22,240 - modelscope - INFO - model inference done
2024-03-06 21:24:22,292 - modelscope - INFO - model inference done
2024-03-06 21:24:22,346 - modelscope - INFO - model inference done
2024-03-06 21:24:22,398 - modelscope - INFO - model inference done
2024-03-06 21:24:22,449 - modelscope - INFO - model inference done
2024-03-06 21:24:22,559 - modelscope - INFO - model inference done
2024-03-06 21:24:22,669 - modelscope - INFO - model inference done
2024-03-06 21:24:22,771 - modelscope - INFO - model inference done
2024-03-06 21:24:22,871 - modelscope - INFO - model inference done
2024-03-06 21:24:22,949 - modelscope - INFO - model inference done
2024-03-06 21:24:23,001 - modelscope - INFO - model inference done
2024-03-06 21:24:23,052 - modelscope - INFO - model inference done
error at: OpenCV(4.9.0) D:\a\opencv-python\opencv-python\opencv\modules\imgproc\src\imgwarp.cpp:2748: error: (-215:Assertion failed) src.cols > 0 && src.rows > 0 in function 'cv::warpAffine'

Dectect no face in training data, move last weights and validation image to best_outputs
Traceback (most recent call last):
  File "E:\AI\stable-diffusion-webui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\scripts\train_kohya\train_lora.py", line 1390, in <module>
    main()
  File "E:\AI\stable-diffusion-webui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\scripts\train_kohya\utils\gpu_info.py", line 195, in wrapper
    result = func(*args, **kwargs)
  File "E:\AI\stable-diffusion-webui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\scripts\train_kohya\train_lora.py", line 1362, in main
    copyfile(t_result_list[0][1], os.path.join(best_outputs_dir, os.path.basename(t_result_list[0][1])))
IndexError: list index out of range
Steps: 100%|█████████████████████████████████████████████| 800/800 [36:56<00:00, 2.77s/it, lr=5e-5, step_loss=0.00271]
HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
Traceback (most recent call last):
  File "C:\Users\patho\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\patho\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "E:\AI\stable-diffusion-webui\stable-diffusion-webui\venv\lib\site-packages\accelerate\commands\launch.py", line 989, in <module>
    main()
  File "E:\AI\stable-diffusion-webui\stable-diffusion-webui\venv\lib\site-packages\accelerate\commands\launch.py", line 985, in main
    launch_command(args)
  File "E:\AI\stable-diffusion-webui\stable-diffusion-webui\venv\lib\site-packages\accelerate\commands\launch.py", line 979, in launch_command
    simple_launcher(args)
  File "E:\AI\stable-diffusion-webui\stable-diffusion-webui\venv\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command
'['E:\AI\stable-diffusion-webui\stable-diffusion-webui\venv\Scripts\python.exe', 'E:\AI\stable-diffusion-webui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\scripts\train_kohya/train_lora.py', '--pretrained_model_name_or_path=extensions\sd-webui-EasyPhoto\models\stable-diffusion-v1-5', '--pretrained_model_ckpt=models\Stable-diffusion\Chilloutmix-Ni-pruned-fp16-fix.safetensors', '--train_data_dir=outputs\easyphoto-user-id-infos\Lea_1\processed_images', '--caption_column=text', '--resolution=512', '--random_flip', '--train_batch_size=1', '--gradient_accumulation_steps=4', '--dataloader_num_workers=0', '--max_train_steps=800', '--checkpointing_steps=100', '--learning_rate=0.0001', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--train_text_encoder', '--seed=535413', '--rank=128', '--network_alpha=64', '--validation_prompt=easyphoto_face, easyphoto, 1person', '--validation_steps=100', '--output_dir=outputs\easyphoto-user-id-infos\Lea_1\user_weights', '--logging_dir=outputs\easyphoto-user-id-infos\Lea_1\user_weights', '--enable_xformers_memory_efficient_attention', '--mixed_precision=fp16', '--template_dir=extensions\sd-webui-EasyPhoto\models\training_templates', '--template_mask', '--merge_best_lora_based_face_id', '--merge_best_lora_name=Lea_1', '--cache_log_file=E:\AI\stable-diffusion-webui\stable-diffusion-webui\outputs/easyphoto-tmp/train_kohya_log.txt', '--validation']' returned non-zero exit status 1. 
2024-03-06 21:24:25,349 - EasyPhoto - Error executing the command: Command '['E:\AI\stable-diffusion-webui\stable-diffusion-webui\venv\Scripts\python.exe', '-m', 'accelerate.commands.launch', '--mixed_precision=fp16', '--main_process_port=3456', 'E:\AI\stable-diffusion-webui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\scripts\train_kohya/train_lora.py', '--pretrained_model_name_or_path=extensions\sd-webui-EasyPhoto\models\stable-diffusion-v1-5', '--pretrained_model_ckpt=models\Stable-diffusion\Chilloutmix-Ni-pruned-fp16-fix.safetensors', '--train_data_dir=outputs\easyphoto-user-id-infos\Lea_1\processed_images', '--caption_column=text', '--resolution=512', '--random_flip', '--train_batch_size=1', '--gradient_accumulation_steps=4', '--dataloader_num_workers=0', '--max_train_steps=800', '--checkpointing_steps=100', '--learning_rate=0.0001', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--train_text_encoder', '--seed=535413', '--rank=128', '--network_alpha=64', '--validation_prompt=easyphoto_face, easyphoto, 1person', '--validation_steps=100', '--output_dir=outputs\easyphoto-user-id-infos\Lea_1\user_weights', '--logging_dir=outputs\easyphoto-user-id-infos\Lea_1\user_weights', '--enable_xformers_memory_efficient_attention', '--mixed_precision=fp16', '--template_dir=extensions\sd-webui-EasyPhoto\models\training_templates', '--template_mask', '--merge_best_lora_based_face_id', '--merge_best_lora_name=Lea_1', '--cache_log_file=E:\AI\stable-diffusion-webui\stable-diffusion-webui\outputs/easyphoto-tmp/train_kohya_log.txt', '--validation']' returned non-zero exit status 1. Applying attention optimization: Doggettx... done.`
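
Reading the traceback above, the IndexError is raised by `t_result_list[0][1]` right after the "no face in training data" message, which suggests the result list was empty when the script tried to copy the best output. The sketch below shows the general idea of guarding that lookup. It is not the actual EasyPhoto code or fix: the helper name `copy_best_result` and the `(score, path)` tuple layout are assumptions for illustration, and only the names `t_result_list`, `best_outputs_dir` and `copyfile` come from the traceback.

```python
import os
from shutil import copyfile
from typing import Optional, Sequence, Tuple


def copy_best_result(
    t_result_list: Sequence[Tuple[float, str]],  # assumed layout: (score, file path)
    best_outputs_dir: str,
) -> Optional[str]:
    """Copy the top-ranked file into best_outputs_dir.

    Returns the destination path, or None when there is nothing to copy.
    """
    if not t_result_list:
        # An empty list is the case that raised IndexError in the log above.
        return None
    source_path = t_result_list[0][1]
    os.makedirs(best_outputs_dir, exist_ok=True)
    target_path = os.path.join(best_outputs_dir, os.path.basename(source_path))
    copyfile(source_path, target_path)
    return target_path
```

With a guard like this the script would skip the copy instead of crashing, though the underlying problem (no face detected in the validation images) would still need to be looked at separately.
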