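To be honest, I'm just an ordinary user who doesn't understand code, so I don't know at which step things started going wrong.

Then I remembered that I had run into an onnx error before. I followed the steps from that earlier issue, and this is what I get now: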

Loading VAE weights specified in settings: D:\sdwebui\stable-diffusion-webui\models\VAE\vae-ft-mse-840000-ema-pruned.safetensors Applying attention optimization: xformers... done. Model loaded in 9.3s (load weights from disk: 1.6s, create model: 1.1s, apply weights to model: 4.2s, apply dtype to VAE: 1.4s, load VAE: 0.4s, calculate empty prompt: 0.4s). 2023-11-01 11:30:51,578 - scripts - D:\sdwebui\stable-diffusion-webui\models\Stable-diffusion/Chilloutmix-Ni-pruned-fp16-fix.safetensors : Hash match 2023-11-01 11:30:52,116 - scripts - D:\sdwebui\stable-diffusion-webui\models\Stable-diffusion/SDXL_1.0_ArienMixXL_v2.0.safetensors : Hash match 2023-11-01 11:30:52,645 - scripts - D:\sdwebui\stable-diffusion-webui\models\ControlNet/control_v11p_sd15_openpose.pth : Hash match 2023-11-01 11:30:53,193 - scripts - D:\sdwebui\stable-diffusion-webui\models\ControlNet/control_v11p_sd15_canny.pth : Hash match 2023-11-01 11:30:53,711 - scripts - D:\sdwebui\stable-diffusion-webui\models\ControlNet/control_v11f1e_sd15_tile.pth : Hash match 2023-11-01 11:30:54,224 - scripts - D:\sdwebui\stable-diffusion-webui\models\ControlNet/control_sd15_random_color.pth : Hash match 2023-11-01 11:30:54,708 - scripts - D:\sdwebui\stable-diffusion-webui\models\Lora/FilmVelvia3.safetensors : Hash match 2023-11-01 11:30:55,187 - scripts - D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-controlnet\annotator/downloads/openpose\body_pose_model.pth : Hash match 2023-11-01 11:30:55,700 - scripts - D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-controlnet\annotator/downloads/openpose\facenet.pth : Hash match 2023-11-01 11:30:56,223 - scripts - D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-controlnet\annotator/downloads/openpose\hand_pose_model.pth : Hash match 2023-11-01 11:30:56,750 - scripts - D:\sdwebui\stable-diffusion-webui\models\VAE/vae-ft-mse-840000-ema-pruned.ckpt : Hash match 2023-11-01 11:30:57,254 - scripts - 
D:\sdwebui\stable-diffusion-webui\models\VAE/madebyollin-sdxl-vae-fp16-fix.safetensors : Hash match 2023-11-01 11:30:57,759 - scripts - D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\models\face_skin.pth : Hash match 2023-11-01 11:30:58,251 - scripts - D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\models\face_landmarks.pth : Hash match 2023-11-01 11:30:58,810 - scripts - D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\models\makeup_transfer.pth : Hash match 2023-11-01 11:30:59,283 - scripts - D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\models\training_templates\1.jpg : Hash match 2023-11-01 11:30:59,767 - scripts - D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\models\training_templates\2.jpg : Hash match 2023-11-01 11:31:00,293 - scripts - D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\models\training_templates\3.jpg : Hash match 2023-11-01 11:31:00,792 - scripts - D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\models\training_templates\4.jpg : Hash match 2023-11-01 11:31:04,551 - modelscope - INFO - PyTorch version 2.0.1+cu118 Found. 2023-11-01 11:31:04,554 - modelscope - INFO - TensorFlow version 2.14.0 Found. 2023-11-01 11:31:04,555 - modelscope - INFO - Loading ast index from C:\Users\fei19.cache\modelscope\ast_indexer 2023-11-01 11:31:04,652 - modelscope - INFO - Loading done! 
Current index file version is 1.9.3, with md5 61a2c731868309df775a218c8b8addc6 and a total number of 943 components indexed 2023-11-01 11:31:06,438 - modelscope - INFO - Use user-specified model revision: v1.0.3 2023-11-01 11:31:06,719 - modelscope - WARNING - ('PIPELINES', 'face_recognition', 'face_recognition') not found in ast index file 2023-11-01 11:31:06,720 - modelscope - INFO - initiate model from C:\Users\fei19.cache\modelscope\hub\bubbliiiing\cv_retinafce_recognition 2023-11-01 11:31:06,720 - modelscope - INFO - initiate model from location C:\Users\fei19.cache\modelscope\hub\bubbliiiing\cv_retinafce_recognition. 2023-11-01 11:31:06,724 - modelscope - INFO - initialize model from C:\Users\fei19.cache\modelscope\hub\bubbliiiing\cv_retinafce_recognition 2023-11-01 11:31:06,730 - modelscope - WARNING - ('MODELS', 'face_recognition', 'face_recognition') not found in ast index file 2023-11-01 11:31:07,652 - modelscope - INFO - Model revision not specified, use revision: v2.0.2 2023-11-01 11:31:09,301 - modelscope - INFO - initiate model from C:\Users\fei19.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface 2023-11-01 11:31:09,301 - modelscope - INFO - initiate model from location C:\Users\fei19.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface. 2023-11-01 11:31:09,311 - modelscope - WARNING - No preprocessor field found in cfg. 2023-11-01 11:31:09,313 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file. 2023-11-01 11:31:09,313 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': 'C:\Users\fei19\.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface'}. trying to build by task and model information. 2023-11-01 11:31:09,313 - modelscope - WARNING - Find task: face-detection, model type: None. 
Insufficient information to build preprocessor, skip building preprocessor 2023-11-01 11:31:09,316 - modelscope - INFO - loading model from C:\Users\fei19.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface\pytorch_model.pt D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\torchvision\models_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. warnings.warn( D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\torchvision\models_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=None. warnings.warn(msg) 2023-11-01 11:31:10,033 - modelscope - INFO - load model done 2023-11-01 11:31:11,291 - modelscope - INFO - load facefusion models done 2023-11-01 11:31:11,291 - modelscope - INFO - init done 2023-11-01 11:31:11,298 - modelscope - WARNING - No preprocessor field found in cfg. 2023-11-01 11:31:11,298 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file. 2023-11-01 11:31:11,298 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': 'C:\Users\fei19\.cache\modelscope\hub\bubbliiiing\cv_retinafce_recognition'}. trying to build by task and model information. 2023-11-01 11:31:11,298 - modelscope - WARNING - No preprocessor key ('face_recognition', 'face_recognition') found in PREPROCESSOR_MAP, skip building preprocessor. 
2023-11-01 11:31:11,301 - modelscope - INFO - image face recognition model init done 2023-11-01 11:31:11,913 - modelscope - INFO - Use user-specified model revision: v2.0.2 2023-11-01 11:31:13,650 - modelscope - INFO - initiate model from C:\Users\fei19.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface 2023-11-01 11:31:13,650 - modelscope - INFO - initiate model from location C:\Users\fei19.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface. 2023-11-01 11:31:13,662 - modelscope - WARNING - No preprocessor field found in cfg. 2023-11-01 11:31:13,662 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file. 2023-11-01 11:31:13,662 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': 'C:\Users\fei19\.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface'}. trying to build by task and model information. 2023-11-01 11:31:13,663 - modelscope - WARNING - Find task: face-detection, model type: None. Insufficient information to build preprocessor, skip building preprocessor 2023-11-01 11:31:13,667 - modelscope - INFO - loading model from C:\Users\fei19.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface\pytorch_model.pt 2023-11-01 11:31:14,211 - modelscope - INFO - load model done 2023-11-01 11:31:14,807 - modelscope - INFO - Use user-specified model revision: v1.0.0 2023-11-01 11:31:15,025 - modelscope - INFO - initiate model from C:\Users\fei19.cache\modelscope\hub\damo\cv_u2net_salient-detection 2023-11-01 11:31:15,026 - modelscope - INFO - initiate model from location C:\Users\fei19.cache\modelscope\hub\damo\cv_u2net_salient-detection. 2023-11-01 11:31:15,029 - modelscope - INFO - initialize model from C:\Users\fei19.cache\modelscope\hub\damo\cv_u2net_salient-detection 2023-11-01 11:31:15,508 - modelscope - WARNING - No preprocessor field found in cfg. 
2023-11-01 11:31:15,508 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file. 2023-11-01 11:31:15,508 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': 'C:\Users\fei19\.cache\modelscope\hub\damo\cv_u2net_salient-detection'}. trying to build by task and model information. 2023-11-01 11:31:15,509 - modelscope - WARNING - No preprocessor key ('detection', 'semantic-segmentation') found in PREPROCESSOR_MAP, skip building preprocessor. 2023-11-01 11:31:16,213 - modelscope - INFO - Use user-specified model revision: v1.0.2 2023-11-01 11:31:16,600 - modelscope - WARNING - ('PIPELINES', 'skin-retouching-torch', 'skin-retouching-torch') not found in ast index file 2023-11-01 11:31:16,600 - modelscope - INFO - initiate model from C:\Users\fei19.cache\modelscope\hub\damo\cv_unet_skin_retouching_torch 2023-11-01 11:31:16,601 - modelscope - INFO - initiate model from location C:\Users\fei19.cache\modelscope\hub\damo\cv_unet_skin_retouching_torch. 2023-11-01 11:31:16,609 - modelscope - WARNING - No preprocessor field found in cfg. 2023-11-01 11:31:16,610 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file. 2023-11-01 11:31:16,610 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': 'C:\Users\fei19\.cache\modelscope\hub\damo\cv_unet_skin_retouching_torch'}. trying to build by task and model information. 2023-11-01 11:31:16,610 - modelscope - WARNING - Find task: skin-retouching-torch, model type: None. 
Insufficient information to build preprocessor, skip building preprocessor 2023-11-01 11:31:17,618 - modelscope - INFO - Model revision not specified, use revision: v2.0.2 2023-11-01 11:31:19,379 - modelscope - INFO - initiate model from C:\Users\fei19.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface 2023-11-01 11:31:19,380 - modelscope - INFO - initiate model from location C:\Users\fei19.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface. 2023-11-01 11:31:19,390 - modelscope - WARNING - No preprocessor field found in cfg. 2023-11-01 11:31:19,391 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file. 2023-11-01 11:31:19,391 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': 'C:\Users\fei19\.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface'}. trying to build by task and model information. 2023-11-01 11:31:19,391 - modelscope - WARNING - Find task: face-detection, model type: None. Insufficient information to build preprocessor, skip building preprocessor 2023-11-01 11:31:19,395 - modelscope - INFO - loading model from C:\Users\fei19.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface\pytorch_model.pt 2023-11-01 11:31:19,941 - modelscope - INFO - load model done 2023-11-01 11:31:22.1191810 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2649'. It is not used by any node and should be removed from the model. 2023-11-01 11:31:22.1248995 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2644'. It is not used by any node and should be removed from the model. 2023-11-01 11:31:22.1305160 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2647'. 
It is not used by any node and should be removed from the model. 2023-11-01 11:31:22.1362117 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2658'. It is not used by any node and should be removed from the model. 2023-11-01 11:31:22.1416011 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2648'. It is not used by any node and should be removed from the model. 2023-11-01 11:31:22.1471927 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2657'. It is not used by any node and should be removed from the model. 2023-11-01 11:31:22.1529290 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2653'. It is not used by any node and should be removed from the model. 2023-11-01 11:31:22.1583673 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2652'. It is not used by any node and should be removed from the model. 2023-11-01 11:31:22.1640966 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2645'. It is not used by any node and should be removed from the model. 2023-11-01 11:31:22.1696210 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2643'. It is not used by any node and should be removed from the model. 2023-11-01 11:31:22.1749272 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2641'. It is not used by any node and should be removed from the model. 
2023-11-01 11:31:22.1804806 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2633'. It is not used by any node and should be removed from the model. 2023-11-01 11:31:22.1861589 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2632'. It is not used by any node and should be removed from the model. 2023-11-01 11:31:22.1913039 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2624'. It is not used by any node and should be removed from the model. 2023-11-01 11:31:22.1967345 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2614'. It is not used by any node and should be removed from the model. 2023-11-01 11:31:22.2024383 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2613'. It is not used by any node and should be removed from the model. 2023-11-01 11:31:22.2075383 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2606'. It is not used by any node and should be removed from the model. 2023-11-01 11:31:22.2129172 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2598'. It is not used by any node and should be removed from the model. 2023-11-01 11:31:22.2187047 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2596'. It is not used by any node and should be removed from the model. 2023-11-01 11:31:22.2237871 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2594'. 
It is not used by any node and should be removed from the model. 2023-11-01 11:31:23,129 - modelscope - INFO - Use user-specified model revision: v1.0.0 2023-11-01 11:31:23,400 - modelscope - INFO - initiate model from C:\Users\fei19.cache\modelscope\hub\damo\cv_gpen_image-portrait-enhancement 2023-11-01 11:31:23,400 - modelscope - INFO - initiate model from location C:\Users\fei19.cache\modelscope\hub\damo\cv_gpen_image-portrait-enhancement. 2023-11-01 11:31:23,403 - modelscope - INFO - initialize model from C:\Users\fei19.cache\modelscope\hub\damo\cv_gpen_image-portrait-enhancement Loading ResNet ArcFace 2023-11-01 11:31:26,529 - modelscope - INFO - load face enhancer model done 2023-11-01 11:31:27,069 - modelscope - INFO - load face detector model done 2023-11-01 11:31:27,575 - modelscope - INFO - load sr model done 2023-11-01 11:31:29,025 - modelscope - INFO - load fqa model done 0%| | 0/6 [00:00<?, ?it/s]2023-11-01 11:31:56,327 - modelscope - WARNING - task skin-retouching-torch input definition is missing 2023-11-01 11:33:14,486 - modelscope - WARNING - task skin-retouching-torch output keys are missing 2023-11-01 11:33:14,495 - modelscope - WARNING - task face_recognition input definition is missing 2023-11-01 11:33:14,863 - modelscope - INFO - model inference done 2023-11-01 11:33:14,863 - modelscope - WARNING - task face_recognition output keys are missing 17%|█████████████▊ | 1/6 [01:45<08:48, 105.77s/it]2023-11-01 11:33:54,236 - modelscope - INFO - model inference done 33%|████████████████████████████ | 2/6 [02:25<04:26, 66.71s/it]2023-11-01 11:34:16,990 - modelscope - INFO - model inference done 50%|██████████████████████████████████████████ | 3/6 [02:47<02:19, 46.64s/it]2023-11-01 11:34:46,426 - modelscope - INFO - model inference done 67%|████████████████████████████████████████████████████████ | 4/6 [03:17<01:19, 39.85s/it]2023-11-01 11:35:04,179 - modelscope - INFO - model inference done 
83%|██████████████████████████████████████████████████████████████████████ | 5/6 [03:35<00:31, 31.88s/it]2023-11-01 11:35:20,405 - modelscope - INFO - model inference done 100%|████████████████████████████████████████████████████████████████████████████████████| 6/6 [03:51<00:00, 38.55s/it] selected paths: D:\sdwebui\stable-diffusion-webui\outputs/easyphoto-user-id-infos\ms_xiaoshenyang\original_backup\3.jpg total scores: 0.7334901360867766 face angles 0.9897076867135809 selected paths: D:\sdwebui\stable-diffusion-webui\outputs/easyphoto-user-id-infos\ms_xiaoshenyang\original_backup\2.jpg total scores: 0.7209505957542396 face angles 0.9699232171534021 selected paths: D:\sdwebui\stable-diffusion-webui\outputs/easyphoto-user-id-infos\ms_xiaoshenyang\original_backup\4.jpg total scores: 0.6976329095818775 face angles 0.9257631977173943 selected paths: D:\sdwebui\stable-diffusion-webui\outputs/easyphoto-user-id-infos\ms_xiaoshenyang\original_backup\5.jpg total scores: 0.674010219264496 face angles 0.9577307464706645 selected paths: D:\sdwebui\stable-diffusion-webui\outputs/easyphoto-user-id-infos\ms_xiaoshenyang\original_backup\1.jpg total scores: 0.6583246713145142 face angles 0.927000829634453 selected paths: D:\sdwebui\stable-diffusion-webui\outputs/easyphoto-user-id-infos\ms_xiaoshenyang\original_backup\0.jpg total scores: 0.597732954503883 face angles 0.8906064025354908 jpg: 4.jpg face_id_scores 0.6976329095818775 jpg: 2.jpg face_id_scores 0.7209505957542396 jpg: 3.jpg face_id_scores 0.7334901360867766 jpg: 1.jpg face_id_scores 0.6583246713145142 jpg: 5.jpg face_id_scores 0.674010219264496 jpg: 0.jpg face_id_scores 0.597732954503883 6it [00:13, 2.23s/it] save processed image to D:\sdwebui\stable-diffusion-webui\outputs/easyphoto-user-id-infos\ms_xiaoshenyang\processed_images\train\0.jpg save processed image to D:\sdwebui\stable-diffusion-webui\outputs/easyphoto-user-id-infos\ms_xiaoshenyang\processed_images\train\1.jpg save processed image to 
D:\sdwebui\stable-diffusion-webui\outputs/easyphoto-user-id-infos\ms_xiaoshenyang\processed_images\train\2.jpg save processed image to D:\sdwebui\stable-diffusion-webui\outputs/easyphoto-user-id-infos\ms_xiaoshenyang\processed_images\train\3.jpg save processed image to D:\sdwebui\stable-diffusion-webui\outputs/easyphoto-user-id-infos\ms_xiaoshenyang\processed_images\train\4.jpg save processed image to D:\sdwebui\stable-diffusion-webui\outputs/easyphoto-user-id-infos\ms_xiaoshenyang\processed_images\train\5.jpg train_file_path : D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\scripts\train_kohya/train_lora.py cache_log_file_path: D:\sdwebui\stable-diffusion-webui\outputs/easyphoto-tmp/train_kohya_log.txt The following values were not passed to accelerate launch and had defaults used instead: --num_processes was set to a value of 2 More than one GPU was found, enabling multi-GPU training. If this was unintended please pass in --num_processes=1. --num_machines was set to a value of 1 --dynamo_backend was set to a value of 'no' To avoid this warning pass in values for each of the problematic parameters or run accelerate config. NOTE: Redirects are currently not supported in Windows or MacOs. [W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [fei-station-sz]:3456 (system error: 10049 - 在其上下文中，该请求的地址无效。). [W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [fei-station-sz]:3456 (system error: 10049 - 在其上下文中，该请求的地址无效。). A matching Triton is not available, some optimizations will not be enabled. Error caught was: No module named 'triton' A matching Triton is not available, some optimizations will not be enabled. Error caught was: No module named 'triton' 2023-11-01 11:35:54,508 - modelscope - INFO - PyTorch version 2.0.1+cu118 Found. 2023-11-01 11:35:54,512 - modelscope - INFO - TensorFlow version 2.14.0 Found. 
2023-11-01 11:35:54,512 - modelscope - INFO - Loading ast index from C:\Users\fei19.cache\modelscope\ast_indexer 2023-11-01 11:35:54,524 - modelscope - INFO - PyTorch version 2.0.1+cu118 Found. 2023-11-01 11:35:54,528 - modelscope - INFO - TensorFlow version 2.14.0 Found. 2023-11-01 11:35:54,528 - modelscope - INFO - Loading ast index from C:\Users\fei19.cache\modelscope\ast_indexer 2023-11-01 11:35:54,614 - modelscope - INFO - Loading done! Current index file version is 1.9.3, with md5 61a2c731868309df775a218c8b8addc6 and a total number of 943 components indexed 2023-11-01 11:35:54,626 - modelscope - INFO - Loading done! Current index file version is 1.9.3, with md5 61a2c731868309df775a218c8b8addc6 and a total number of 943 components indexed [W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [fei-station-sz]:3456 (system error: 10049 - 在其上下文中，该请求的地址无效。). [W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [fei-station-sz]:3456 (system error: 10049 - 在其上下文中，该请求的地址无效。). [W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [fei-station-sz]:3456 (system error: 10049 - 在其上下文中，该请求的地址无效。). [W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [fei-station-sz]:3456 (system error: 10049 - 在其上下文中，该请求的地址无效。). Traceback (most recent call last): Traceback (most recent call last): File "D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\scripts\train_kohya\train_lora.py", line 1467, in File "D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\scripts\train_kohya\train_lora.py", line 1467, in main() main() File "D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\scripts\train_kohya\utils\gpu_info.py", line 178, in wrapper

File "D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\scripts\train_kohya\utils\gpu_info.py", line 178, in wrapper result = func(*args, *kwargs) File "D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\scripts\train_kohya\train_lora.py", line 826, in main result = func(args, kwargs) File "D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\scripts\train_kohya\train_lora.py", line 826, in main accelerator = Accelerator( File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\accelerate\accelerator.py", line 358, in init accelerator = Accelerator( File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\accelerate\accelerator.py", line 358, in init self.state = AcceleratorState( File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\accelerate\state.py", line 720, in init self.state = AcceleratorState( File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\accelerate\state.py", line 720, in init PartialState(cpu, kwargs) File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\accelerate\state.py", line 192, in init PartialState(cpu, kwargs) File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\accelerate\state.py", line 192, in init torch.distributed.init_process_group(backend=self.backend, kwargs) File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\torch\distributed\distributed_c10d.py", line 907, in init_process_group torch.distributed.init_process_group(backend=self.backend, **kwargs) File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\torch\distributed\distributed_c10d.py", line 907, in init_process_group default_pg = _new_process_group_helper( File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\torch\distributed\distributed_c10d.py", line 1013, in _new_process_group_helper default_pg = _new_process_group_helper( File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\torch\distributed\distributed_c10d.py", line 1013, in 
_new_process_group_helper raise RuntimeError("Distributed package doesn't have NCCL " "built in") RuntimeError: Distributed package doesn't have NCCL built inraise RuntimeError("Distributed package doesn't have NCCL " "built in")

RuntimeError: Distributed package doesn't have NCCL built in ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 5752) of binary: D:\sdwebui\stable-diffusion-webui\venv\Scripts\python.exe Traceback (most recent call last): File "C:\Users\fei19\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\Users\fei19\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code exec(code, run_globals) File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\accelerate\commands\launch.py", line 989, in main() File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\accelerate\commands\launch.py", line 985, in main launch_command(args) File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\accelerate\commands\launch.py", line 970, in launch_command multi_gpu_launcher(args) File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\accelerate\commands\launch.py", line 646, in multi_gpu_launcher distrib_run.run(args) File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\torch\distributed\run.py", line 785, in run elastic_launch( File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\torch\distributed\launcher\api.py", line 134, in call return launch_agent(self._config, self._entrypoint, list(args)) File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\torch\distributed\launcher\api.py", line 250, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\scripts\train_kohya/train_lora.py FAILED

Failures: [1]: time : 2023-11-01_11:36:00 host : fei-station-sz rank : 1 (local_rank: 1) exitcode : 1 (pid: 13192) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure): [0]: time : 2023-11-01_11:36:00 host : fei-station-sz rank : 0 (local_rank: 0) exitcode : 1 (pid: 5752) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Error executing the command: Command '['D:\sdwebui\stable-diffusion-webui\venv\Scripts\python.exe', '-m', 'accelerate.commands.launch', '--mixed_precision=fp16', '--main_process_port=3456', 'D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\scripts\train_kohya/train_lora.py', '--pretrained_model_name_or_path=extensions\sd-webui-EasyPhoto\models\stable-diffusion-v1-5', '--pretrained_model_ckpt=models\Stable-diffusion\Chilloutmix-Ni-pruned-fp16-fix.safetensors', '--train_data_dir=outputs\easyphoto-user-id-infos\ms_xiaoshenyang\processed_images', '--caption_column=text', '--resolution=512', '--random_flip', '--train_batch_size=1', '--gradient_accumulation_steps=4', '--dataloader_num_workers=0', '--max_train_steps=800', '--checkpointing_steps=100', '--learning_rate=0.0001', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--train_text_encoder', '--seed=42', '--rank=128', '--network_alpha=64', '--validation_prompt=easyphoto_face, easyphoto, 1person', '--validation_steps=100', '--output_dir=outputs\easyphoto-user-id-infos\ms_xiaoshenyang\user_weights', '--logging_dir=outputs\easyphoto-user-id-infos\ms_xiaoshenyang\user_weights', '--enable_xformers_memory_efficient_attention', '--mixed_precision=fp16', '--template_dir=extensions\sd-webui-EasyPhoto\models\training_templates', '--template_mask', '--merge_best_lora_based_face_id', '--merge_best_lora_name=ms_xiaoshenyang', '--cache_log_file=D:\sdwebui\stable-diffusion-webui\outputs/easyphoto-tmp/train_kohya_log.txt', '--validation']' returned non-zero exit status 1.

accelerate launch and had defaults used instead: --num_processes was set to a value of 2 More than one GPU was found, enabling multi-GPU training. If this was unintended please pass in --num_processes=1. --num_machines was set to a value of 1

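Your new machine has multiple GPUs, so accelerate defaulted to multi-GPU training. But the GPUs have no working communication backend between them (the traceback ends with "Distributed package doesn't have NCCL built in"), so the run crashed. One possible solution: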

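On Linux, set CUDA_VISIBLE_DEVICES=0 (or =1, depending on which card you want to use) before launching.
On Windows the same idea applies; you will need to look up how to set it there.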
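
As a rough illustration of the suggestion above, the sketch below builds an environment that exposes only one GPU and an `accelerate launch` command pinned to a single process. `CUDA_VISIBLE_DEVICES` and `--num_processes=1` both come from the log and the advice in this thread; the helper names `single_gpu_env` and `single_process_launch_command`, the script name, and the extra argument are hypothetical and only for illustration. This is not how EasyPhoto itself launches training.

```python
import os
import sys


def single_gpu_env(gpu_index=0, base_env=None):
    """Return a copy of the environment in which CUDA sees only one GPU.

    gpu_index is the card to keep visible (0 or 1 on a two-GPU machine).
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def single_process_launch_command(train_script, extra_args=()):
    """Build an `accelerate launch` command that requests one process,
    so accelerate does not try to start multi-GPU training."""
    return [
        sys.executable, "-m", "accelerate.commands.launch",
        "--num_processes=1",
        train_script,
        *extra_args,
    ]


if __name__ == "__main__":
    env = single_gpu_env(0)
    # "train_lora.py" and "--resolution=512" are placeholders for illustration.
    cmd = single_process_launch_command("train_lora.py", ["--resolution=512"])
    print("CUDA_VISIBLE_DEVICES =", env["CUDA_VISIBLE_DEVICES"])
    print(" ".join(cmd))
    # To actually start training one could call:
    #   subprocess.run(cmd, env=env, check=True)
```

On Windows, the environment variable can also be set in the command prompt (or in the batch file that starts the web UI) with `set CUDA_VISIBLE_DEVICES=0` before launching, so that every child process inherits it.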

I wasn't clear earlier: what I replaced was the motherboard and CPU. The GPUs are the same two cards as before, and this extension always worked fine with those two cards. [two image attachments]

This is what I did. I'm not sure whether it's right, but the problem is still there.

2023-11-01 14:10:37,491 - scripts - D:\sdwebui\stable-diffusion-webui\models\Stable-diffusion/Chilloutmix-Ni-pruned-fp16-fix.safetensors : Hash match 2023-11-01 14:10:38,015 - scripts - D:\sdwebui\stable-diffusion-webui\models\Stable-diffusion/SDXL_1.0_ArienMixXL_v2.0.safetensors : Hash match 2023-11-01 14:10:38,538 - scripts - D:\sdwebui\stable-diffusion-webui\models\ControlNet/control_v11p_sd15_openpose.pth : Hash match 2023-11-01 14:10:39,073 - scripts - D:\sdwebui\stable-diffusion-webui\models\ControlNet/control_v11p_sd15_canny.pth : Hash match 2023-11-01 14:10:39,616 - scripts - D:\sdwebui\stable-diffusion-webui\models\ControlNet/control_v11f1e_sd15_tile.pth : Hash match 2023-11-01 14:10:40,123 - scripts - D:\sdwebui\stable-diffusion-webui\models\ControlNet/control_sd15_random_color.pth : Hash match 2023-11-01 14:10:40,631 - scripts - D:\sdwebui\stable-diffusion-webui\models\Lora/FilmVelvia3.safetensors : Hash match 2023-11-01 14:10:41,155 - scripts - D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-controlnet\annotator/downloads/openpose\body_pose_model.pth : Hash match 2023-11-01 14:10:41,657 - scripts - D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-controlnet\annotator/downloads/openpose\facenet.pth : Hash match 2023-11-01 14:10:42,150 - scripts - D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-controlnet\annotator/downloads/openpose\hand_pose_model.pth : Hash match 2023-11-01 14:10:42,651 - scripts - D:\sdwebui\stable-diffusion-webui\models\VAE/vae-ft-mse-840000-ema-pruned.ckpt : Hash match 2023-11-01 14:10:43,158 - scripts - D:\sdwebui\stable-diffusion-webui\models\VAE/madebyollin-sdxl-vae-fp16-fix.safetensors : Hash match 2023-11-01 14:10:43,649 - scripts - D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\models\face_skin.pth : Hash match 2023-11-01 14:10:44,152 - scripts - D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\models\face_landmarks.pth : Hash match 2023-11-01 14:10:44,662 - scripts - 
D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\models\makeup_transfer.pth : Hash match 2023-11-01 14:10:45,169 - scripts - D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\models\training_templates\1.jpg : Hash match 2023-11-01 14:10:45,689 - scripts - D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\models\training_templates\2.jpg : Hash match 2023-11-01 14:10:46,178 - scripts - D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\models\training_templates\3.jpg : Hash match 2023-11-01 14:10:46,639 - scripts - D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\models\training_templates\4.jpg : Hash match 2023-11-01 14:10:49,922 - modelscope - INFO - PyTorch version 2.0.1+cu118 Found. 2023-11-01 14:10:49,926 - modelscope - INFO - TensorFlow version 2.14.0 Found. 2023-11-01 14:10:49,926 - modelscope - INFO - Loading ast index from C:\Users\fei19.cache\modelscope\ast_indexer 2023-11-01 14:10:50,024 - modelscope - INFO - Loading done! Current index file version is 1.9.3, with md5 61a2c731868309df775a218c8b8addc6 and a total number of 943 components indexed 2023-11-01 14:10:51,562 - modelscope - INFO - Use user-specified model revision: v1.0.3 2023-11-01 14:10:51,798 - modelscope - WARNING - ('PIPELINES', 'face_recognition', 'face_recognition') not found in ast index file 2023-11-01 14:10:51,798 - modelscope - INFO - initiate model from C:\Users\fei19.cache\modelscope\hub\bubbliiiing\cv_retinafce_recognition 2023-11-01 14:10:51,798 - modelscope - INFO - initiate model from location C:\Users\fei19.cache\modelscope\hub\bubbliiiing\cv_retinafce_recognition. 
2023-11-01 14:10:51,801 - modelscope - INFO - initialize model from C:\Users\fei19.cache\modelscope\hub\bubbliiiing\cv_retinafce_recognition 2023-11-01 14:10:51,805 - modelscope - WARNING - ('MODELS', 'face_recognition', 'face_recognition') not found in ast index file 2023-11-01 14:10:52,636 - modelscope - INFO - Model revision not specified, use revision: v2.0.2 2023-11-01 14:10:54,344 - modelscope - INFO - initiate model from C:\Users\fei19.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface 2023-11-01 14:10:54,344 - modelscope - INFO - initiate model from location C:\Users\fei19.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface. 2023-11-01 14:10:54,353 - modelscope - WARNING - No preprocessor field found in cfg. 2023-11-01 14:10:54,354 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file. 2023-11-01 14:10:54,354 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': 'C:\Users\fei19\.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface'}. trying to build by task and model information. 2023-11-01 14:10:54,354 - modelscope - WARNING - Find task: face-detection, model type: None. Insufficient information to build preprocessor, skip building preprocessor 2023-11-01 14:10:54,357 - modelscope - INFO - loading model from C:\Users\fei19.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface\pytorch_model.pt D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\torchvision\models_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. warnings.warn( D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\torchvision\models_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. 
The current behavior is equivalent to passing `weights=None`. warnings.warn(msg) 2023-11-01 14:10:55,143 - modelscope - INFO - load model done 2023-11-01 14:10:56,454 - modelscope - INFO - load facefusion models done 2023-11-01 14:10:56,454 - modelscope - INFO - init done 2023-11-01 14:10:56,462 - modelscope - WARNING - No preprocessor field found in cfg. 2023-11-01 14:10:56,462 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file. 2023-11-01 14:10:56,462 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': 'C:\Users\fei19\.cache\modelscope\hub\bubbliiiing\cv_retinafce_recognition'}. trying to build by task and model information. 2023-11-01 14:10:56,463 - modelscope - WARNING - No preprocessor key ('face_recognition', 'face_recognition') found in PREPROCESSOR_MAP, skip building preprocessor. 2023-11-01 14:10:56,465 - modelscope - INFO - image face recognition model init done 2023-11-01 14:10:57,079 - modelscope - INFO - Use user-specified model revision: v2.0.2 2023-11-01 14:10:58,814 - modelscope - INFO - initiate model from C:\Users\fei19.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface 2023-11-01 14:10:58,814 - modelscope - INFO - initiate model from location C:\Users\fei19.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface. 2023-11-01 14:10:58,823 - modelscope - WARNING - No preprocessor field found in cfg. 2023-11-01 14:10:58,823 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file. 2023-11-01 14:10:58,823 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': 'C:\Users\fei19\.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface'}. trying to build by task and model information. 2023-11-01 14:10:58,823 - modelscope - WARNING - Find task: face-detection, model type: None. 
Insufficient information to build preprocessor, skip building preprocessor 2023-11-01 14:10:58,826 - modelscope - INFO - loading model from C:\Users\fei19.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface\pytorch_model.pt 2023-11-01 14:10:59,374 - modelscope - INFO - load model done 2023-11-01 14:11:00,022 - modelscope - INFO - Use user-specified model revision: v1.0.0 2023-11-01 14:11:00,247 - modelscope - INFO - initiate model from C:\Users\fei19.cache\modelscope\hub\damo\cv_u2net_salient-detection 2023-11-01 14:11:00,247 - modelscope - INFO - initiate model from location C:\Users\fei19.cache\modelscope\hub\damo\cv_u2net_salient-detection. 2023-11-01 14:11:00,250 - modelscope - INFO - initialize model from C:\Users\fei19.cache\modelscope\hub\damo\cv_u2net_salient-detection 2023-11-01 14:11:00,740 - modelscope - WARNING - No preprocessor field found in cfg. 2023-11-01 14:11:00,740 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file. 2023-11-01 14:11:00,740 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': 'C:\Users\fei19\.cache\modelscope\hub\damo\cv_u2net_salient-detection'}. trying to build by task and model information. 2023-11-01 14:11:00,740 - modelscope - WARNING - No preprocessor key ('detection', 'semantic-segmentation') found in PREPROCESSOR_MAP, skip building preprocessor. 2023-11-01 14:11:01,422 - modelscope - INFO - Use user-specified model revision: v1.0.2 2023-11-01 14:11:01,694 - modelscope - WARNING - ('PIPELINES', 'skin-retouching-torch', 'skin-retouching-torch') not found in ast index file 2023-11-01 14:11:01,695 - modelscope - INFO - initiate model from C:\Users\fei19.cache\modelscope\hub\damo\cv_unet_skin_retouching_torch 2023-11-01 14:11:01,695 - modelscope - INFO - initiate model from location C:\Users\fei19.cache\modelscope\hub\damo\cv_unet_skin_retouching_torch. 
2023-11-01 14:11:01,702 - modelscope - WARNING - No preprocessor field found in cfg. 2023-11-01 14:11:01,702 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file. 2023-11-01 14:11:01,702 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': 'C:\Users\fei19\.cache\modelscope\hub\damo\cv_unet_skin_retouching_torch'}. trying to build by task and model information. 2023-11-01 14:11:01,702 - modelscope - WARNING - Find task: skin-retouching-torch, model type: None. Insufficient information to build preprocessor, skip building preprocessor 2023-11-01 14:11:02,574 - modelscope - INFO - Model revision not specified, use revision: v2.0.2 2023-11-01 14:11:04,592 - modelscope - INFO - initiate model from C:\Users\fei19.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface 2023-11-01 14:11:04,592 - modelscope - INFO - initiate model from location C:\Users\fei19.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface. 2023-11-01 14:11:04,599 - modelscope - WARNING - No preprocessor field found in cfg. 2023-11-01 14:11:04,599 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file. 2023-11-01 14:11:04,599 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': 'C:\Users\fei19\.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface'}. trying to build by task and model information. 2023-11-01 14:11:04,599 - modelscope - WARNING - Find task: face-detection, model type: None. 
Insufficient information to build preprocessor, skip building preprocessor 2023-11-01 14:11:04,602 - modelscope - INFO - loading model from C:\Users\fei19.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface\pytorch_model.pt 2023-11-01 14:11:05,145 - modelscope - INFO - load model done 2023-11-01 14:11:07.3159731 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2649'. It is not used by any node and should be removed from the model. 2023-11-01 14:11:07.3221060 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2644'. It is not used by any node and should be removed from the model. 2023-11-01 14:11:07.3279565 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2647'. It is not used by any node and should be removed from the model. 2023-11-01 14:11:07.3338945 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2658'. It is not used by any node and should be removed from the model. 2023-11-01 14:11:07.3398214 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2648'. It is not used by any node and should be removed from the model. 2023-11-01 14:11:07.3456843 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2657'. It is not used by any node and should be removed from the model. 2023-11-01 14:11:07.3516248 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2653'. It is not used by any node and should be removed from the model. 
2023-11-01 14:11:07.3576048 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2652'. It is not used by any node and should be removed from the model. 2023-11-01 14:11:07.3633890 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2645'. It is not used by any node and should be removed from the model. 2023-11-01 14:11:07.3693160 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2643'. It is not used by any node and should be removed from the model. 2023-11-01 14:11:07.3752185 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2641'. It is not used by any node and should be removed from the model. 2023-11-01 14:11:07.3806255 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2633'. It is not used by any node and should be removed from the model. 2023-11-01 14:11:07.3865699 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2632'. It is not used by any node and should be removed from the model. 2023-11-01 14:11:07.3925051 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2624'. It is not used by any node and should be removed from the model. 2023-11-01 14:11:07.3982754 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2614'. It is not used by any node and should be removed from the model. 2023-11-01 14:11:07.4042655 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2613'. 
It is not used by any node and should be removed from the model. 2023-11-01 14:11:07.4098876 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2606'. It is not used by any node and should be removed from the model. 2023-11-01 14:11:07.4153843 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2598'. It is not used by any node and should be removed from the model. 2023-11-01 14:11:07.4214282 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2596'. It is not used by any node and should be removed from the model. 2023-11-01 14:11:07.4268487 [W:onnxruntime:, graph.cc:3553 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt2594'. It is not used by any node and should be removed from the model. 2023-11-01 14:11:08,400 - modelscope - INFO - Use user-specified model revision: v1.0.0 2023-11-01 14:11:08,737 - modelscope - INFO - initiate model from C:\Users\fei19.cache\modelscope\hub\damo\cv_gpen_image-portrait-enhancement 2023-11-01 14:11:08,738 - modelscope - INFO - initiate model from location C:\Users\fei19.cache\modelscope\hub\damo\cv_gpen_image-portrait-enhancement. 
2023-11-01 14:11:08,740 - modelscope - INFO - initialize model from C:\Users\fei19.cache\modelscope\hub\damo\cv_gpen_image-portrait-enhancement Loading ResNet ArcFace 2023-11-01 14:11:12,314 - modelscope - INFO - load face enhancer model done 2023-11-01 14:11:12,911 - modelscope - INFO - load face detector model done 2023-11-01 14:11:13,434 - modelscope - INFO - load sr model done 2023-11-01 14:11:14,919 - modelscope - INFO - load fqa model done 0%| | 0/6 [00:00<?, ?it/s]2023-11-01 14:11:31,318 - modelscope - WARNING - task skin-retouching-torch input definition is missing 2023-11-01 14:12:05,422 - modelscope - WARNING - task skin-retouching-torch output keys are missing 2023-11-01 14:12:05,432 - modelscope - WARNING - task face_recognition input definition is missing 2023-11-01 14:12:05,803 - modelscope - INFO - model inference done 2023-11-01 14:12:05,804 - modelscope - WARNING - task face_recognition output keys are missing 17%|██████████████ | 1/6 [00:50<04:14, 50.81s/it]2023-11-01 14:12:37,346 - modelscope - INFO - model inference done 33%|████████████████████████████ | 2/6 [01:22<02:37, 39.48s/it]2023-11-01 14:12:46,398 - modelscope - INFO - model inference done 50%|██████████████████████████████████████████ | 3/6 [01:31<01:16, 25.58s/it]2023-11-01 14:12:58,038 - modelscope - INFO - model inference done 67%|████████████████████████████████████████████████████████ | 4/6 [01:43<00:40, 20.08s/it]2023-11-01 14:13:10,516 - modelscope - INFO - model inference done 83%|██████████████████████████████████████████████████████████████████████ | 5/6 [01:55<00:17, 17.34s/it]2023-11-01 14:13:22,184 - modelscope - INFO - model inference done 100%|████████████████████████████████████████████████████████████████████████████████████| 6/6 [02:07<00:00, 21.20s/it] selected paths: D:\sdwebui\stable-diffusion-webui\outputs/easyphoto-user-id-infos\ms_xiaoshenyang\original_backup\3.jpg total scores: 0.7334901360867766 face angles 0.9897076867135809 selected paths: 
D:\sdwebui\stable-diffusion-webui\outputs/easyphoto-user-id-infos\ms_xiaoshenyang\original_backup\2.jpg total scores: 0.7209505957542396 face angles 0.9699232171534021 selected paths: D:\sdwebui\stable-diffusion-webui\outputs/easyphoto-user-id-infos\ms_xiaoshenyang\original_backup\4.jpg total scores: 0.6976329095818775 face angles 0.9257631977173943 selected paths: D:\sdwebui\stable-diffusion-webui\outputs/easyphoto-user-id-infos\ms_xiaoshenyang\original_backup\5.jpg total scores: 0.674010219264496 face angles 0.9577307464706645 selected paths: D:\sdwebui\stable-diffusion-webui\outputs/easyphoto-user-id-infos\ms_xiaoshenyang\original_backup\1.jpg total scores: 0.6583246713145142 face angles 0.927000829634453 selected paths: D:\sdwebui\stable-diffusion-webui\outputs/easyphoto-user-id-infos\ms_xiaoshenyang\original_backup\0.jpg total scores: 0.597732954503883 face angles 0.8906064025354908 jpg: 4.jpg face_id_scores 0.6976329095818775 jpg: 2.jpg face_id_scores 0.7209505957542396 jpg: 3.jpg face_id_scores 0.7334901360867766 jpg: 1.jpg face_id_scores 0.6583246713145142 jpg: 5.jpg face_id_scores 0.674010219264496 jpg: 0.jpg face_id_scores 0.597732954503883 6it [00:09, 1.60s/it] save processed image to D:\sdwebui\stable-diffusion-webui\outputs/easyphoto-user-id-infos\ms_xiaoshenyang\processed_images\train\0.jpg save processed image to D:\sdwebui\stable-diffusion-webui\outputs/easyphoto-user-id-infos\ms_xiaoshenyang\processed_images\train\1.jpg save processed image to D:\sdwebui\stable-diffusion-webui\outputs/easyphoto-user-id-infos\ms_xiaoshenyang\processed_images\train\2.jpg save processed image to D:\sdwebui\stable-diffusion-webui\outputs/easyphoto-user-id-infos\ms_xiaoshenyang\processed_images\train\3.jpg save processed image to D:\sdwebui\stable-diffusion-webui\outputs/easyphoto-user-id-infos\ms_xiaoshenyang\processed_images\train\4.jpg save processed image to 
D:\sdwebui\stable-diffusion-webui\outputs/easyphoto-user-id-infos\ms_xiaoshenyang\processed_images\train\5.jpg train_file_path : D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\scripts\train_kohya/train_lora.py cache_log_file_path: D:\sdwebui\stable-diffusion-webui\outputs/easyphoto-tmp/train_kohya_log.txt The following values were not passed to `accelerate launch` and had defaults used instead: `--num_processes` was set to a value of `2` More than one GPU was found, enabling multi-GPU training. If this was unintended please pass in `--num_processes=1`. `--num_machines` was set to a value of `1` `--dynamo_backend` was set to a value of `'no'` To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`. NOTE: Redirects are currently not supported in Windows or MacOs. [W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [fei-station-sz]:3456 (system error: 10049 - 在其上下文中，该请求的地址无效。). [W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [fei-station-sz]:3456 (system error: 10049 - 在其上下文中，该请求的地址无效。). A matching Triton is not available, some optimizations will not be enabled. Error caught was: No module named 'triton' A matching Triton is not available, some optimizations will not be enabled. Error caught was: No module named 'triton' 2023-11-01 14:13:50,742 - modelscope - INFO - PyTorch version 2.0.1+cu118 Found. 2023-11-01 14:13:50,744 - modelscope - INFO - PyTorch version 2.0.1+cu118 Found. 2023-11-01 14:13:50,746 - modelscope - INFO - TensorFlow version 2.14.0 Found. 2023-11-01 14:13:50,746 - modelscope - INFO - Loading ast index from C:\Users\fei19.cache\modelscope\ast_indexer 2023-11-01 14:13:50,748 - modelscope - INFO - TensorFlow version 2.14.0 Found. 
2023-11-01 14:13:50,748 - modelscope - INFO - Loading ast index from C:\Users\fei19.cache\modelscope\ast_indexer 2023-11-01 14:13:50,858 - modelscope - INFO - Loading done! Current index file version is 1.9.3, with md5 61a2c731868309df775a218c8b8addc6 and a total number of 943 components indexed 2023-11-01 14:13:50,858 - modelscope - INFO - Loading done! Current index file version is 1.9.3, with md5 61a2c731868309df775a218c8b8addc6 and a total number of 943 components indexed [W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [fei-station-sz]:3456 (system error: 10049 - 在其上下文中，该请求的地址无效。). [W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [fei-station-sz]:3456 (system error: 10049 - 在其上下文中，该请求的地址无效。). Traceback (most recent call last): File "D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\scripts\train_kohya\train_lora.py", line 1467, in main() File "D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\scripts\train_kohya\utils\gpu_info.py", line 178, in wrapper result = func(args, kwargs) File "D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\scripts\train_kohya\train_lora.py", line 826, in main accelerator = Accelerator( File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\accelerate\accelerator.py", line 358, in init self.state = AcceleratorState( File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\accelerate\state.py", line 720, in init PartialState(cpu, kwargs) File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\accelerate\state.py", line 192, in init torch.distributed.init_process_group(backend=self.backend, kwargs) File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\torch\distributed\distributed_c10d.py", line 907, in init_process_group [W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [fei-station-sz]:3456 (system error: 10049 - 
在其上下文中，该请求的地址无效。). default_pg = _new_process_group_helper( File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\torch\distributed\distributed_c10d.py", line 1013, in _new_process_group_helper raise RuntimeError("Distributed package doesn't have NCCL " "built in") RuntimeError[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [fei-station-sz]:3456 (system error: 10049 - 在其上下文中，该请求的地址无效。). : Distributed package doesn't have NCCL built in Traceback (most recent call last): File "D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\scripts\train_kohya\train_lora.py", line 1467, in main() File "D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\scripts\train_kohya\utils\gpu_info.py", line 178, in wrapper result = func(args, kwargs) File "D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\scripts\train_kohya\train_lora.py", line 826, in main accelerator = Accelerator( File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\accelerate\accelerator.py", line 358, in init self.state = AcceleratorState( File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\accelerate\state.py", line 720, in init PartialState(cpu, kwargs) File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\accelerate\state.py", line 192, in init torch.distributed.init_process_group(backend=self.backend, kwargs) File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\torch\distributed\distributed_c10d.py", line 907, in init_process_group default_pg = _new_process_group_helper( File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\torch\distributed\distributed_c10d.py", line 1013, in _new_process_group_helper raise RuntimeError("Distributed package doesn't have NCCL " "built in") RuntimeError: Distributed package doesn't have NCCL built in ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 18260) of binary: 
D:\sdwebui\stable-diffusion-webui\venv\Scripts\python.exe Traceback (most recent call last): File "C:\Users\fei19\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\Users\fei19\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code exec(code, run_globals) File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\accelerate\commands\launch.py", line 989, in main() File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\accelerate\commands\launch.py", line 985, in main launch_command(args) File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\accelerate\commands\launch.py", line 970, in launch_command multi_gpu_launcher(args) File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\accelerate\commands\launch.py", line 646, in multi_gpu_launcher distrib_run.run(args) File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\torch\distributed\run.py", line 785, in run elastic_launch( File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\torch\distributed\launcher\api.py", line 134, in call** return launch_agent(self._config, self._entrypoint, list(args)) File "D:\sdwebui\stable-diffusion-webui\venv\lib\site-packages\torch\distributed\launcher\api.py", line 250, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\scripts\train_kohya/train_lora.py FAILED

Failures: [1]: time : 2023-11-01_14:13:58 host : fei-station-sz rank : 1 (local_rank: 1) exitcode : 1 (pid: 20820) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure): [0]: time : 2023-11-01_14:13:58 host : fei-station-sz rank : 0 (local_rank: 0) exitcode : 1 (pid: 18260) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Error executing the command: Command '['D:\sdwebui\stable-diffusion-webui\venv\Scripts\python.exe', '-m', 'accelerate.commands.launch', '--mixed_precision=fp16', '--main_process_port=3456', 'D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\scripts\train_kohya/train_lora.py', '--pretrained_model_name_or_path=extensions\sd-webui-EasyPhoto\models\stable-diffusion-v1-5', '--pretrained_model_ckpt=models\Stable-diffusion\Chilloutmix-Ni-pruned-fp16-fix.safetensors', '--train_data_dir=outputs\easyphoto-user-id-infos\ms_xiaoshenyang\processed_images', '--caption_column=text', '--resolution=512', '--random_flip', '--train_batch_size=1', '--gradient_accumulation_steps=4', '--dataloader_num_workers=0', '--max_train_steps=800', '--checkpointing_steps=100', '--learning_rate=0.0001', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--train_text_encoder', '--seed=42', '--rank=128', '--network_alpha=64', '--validation_prompt=easyphoto_face, easyphoto, 1person', '--validation_steps=100', '--output_dir=outputs\easyphoto-user-id-infos\ms_xiaoshenyang\user_weights', '--logging_dir=outputs\easyphoto-user-id-infos\ms_xiaoshenyang\user_weights', '--enable_xformers_memory_efficient_attention', '--mixed_precision=fp16', '--template_dir=extensions\sd-webui-EasyPhoto\models\training_templates', '--template_mask', '--merge_best_lora_based_face_id', '--merge_best_lora_name=ms_xiaoshenyang', '--cache_log_file=D:\sdwebui\stable-diffusion-webui\outputs/easyphoto-tmp/train_kohya_log.txt', '--validation']' returned non-zero exit status 1.

The following values were not passed to `accelerate launch` and had defaults used instead: `--num_processes` was set to a value of `2`. More than one GPU was found, enabling multi-GPU training. If this was unintended please pass in `--num_processes=1`. `--num_machines` was set to a value of `1`.

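Your machine has more than one GPU, so accelerate defaults to multi-GPU training. The GPUs cannot actually communicate with each other, though (the traceback ends with `RuntimeError: Distributed package doesn't have NCCL built in`), so the training process crashes. One possible fix: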

On a Linux machine, set CUDA_VISIBLE_DEVICES=0 (or =1, depending on which GPU you want to use) before launching.

The same idea applies on Windows, but you will need to look up the exact way to set it there.
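As a platform-independent illustration of the suggestion above, here is a minimal Python sketch that sets `CUDA_VISIBLE_DEVICES` for a child process and adds the `--num_processes=1` flag that the accelerate warning itself recommends. The helper names (`single_gpu_env`, `build_launch_command`, `launch_single_gpu`) are hypothetical and are not part of EasyPhoto or accelerate. EasyPhoto builds its own launch command internally, so this only shows the idea and is not a drop-in patch.

```python
import os
import subprocess
import sys


def single_gpu_env(gpu_index=0, base_env=None):
    """Return a copy of the environment that exposes only one GPU to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    # CUDA only enumerates the devices listed here, so the child process sees one GPU.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def build_launch_command(train_script, script_args=(), python_exe=sys.executable):
    """Build an `accelerate launch` command pinned to a single process."""
    return [
        python_exe, "-m", "accelerate.commands.launch",
        "--num_processes=1",  # the flag suggested by the accelerate warning
        train_script,
        *script_args,
    ]


def launch_single_gpu(train_script, script_args=(), gpu_index=0):
    """Run the training script on one GPU (hypothetical helper, assumes accelerate is installed)."""
    cmd = build_launch_command(train_script, script_args)
    return subprocess.run(cmd, env=single_gpu_env(gpu_index), check=True)
```

For example, `launch_single_gpu("train_lora.py", ["--resolution=512"], gpu_index=0)` would start the script with only GPU 0 visible. Because the variable is set in the child's environment rather than through shell syntax, the same call should behave the same on Linux and Windows.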

Hello, should a CPU-only Linux machine be configured the same way here?

aigc-apps / sd-webui-EasyPhoto

Error executing the command: #225

D:\sdwebui\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\scripts\train_kohya/train_lora.py FAILED

Failures: [1]: time : 2023-11-01_11:36:00 host : fei-station-sz rank : 1 (local_rank: 1) exitcode : 1 (pid: 13192) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure): [0]: time : 2023-11-01_11:36:00 host : fei-station-sz rank : 0 (local_rank: 0) exitcode : 1 (pid: 5752) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
