aigc-apps / sd-webui-EasyPhoto

📷 EasyPhoto | Your Smart AI Photo Generator.
Apache License 2.0

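Same problem: an error is raised at the end when the LoRA is generated. I went through earlier reports but could not find the cause. Thanks for taking a look. #388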

Closed Taiilor closed 4 months ago

Taiilor commented 4 months ago

Is there an existing issue for this?

Is EasyPhoto the latest version?

What happened?

Same problem: an error is raised at the end when the LoRA is generated. I went through earlier reports but could not find the cause.

Failed to obtain Lora after training, please check the training process.

I have included only an excerpt of the diagnostic file; if it is not complete enough, I will add the rest.

Steps to reproduce the problem

  1. Go to ....
  2. Press ....
  3. ...

What should have happened?

Same problem: an error is raised at the end when the LoRA is generated. I went through earlier reports but could not find the cause.

Failed to obtain Lora after training, please check the training process.

I have included only an excerpt of the diagnostic file; if it is not complete enough, I will add the rest.

Commit where the problem happens

webui: 秋叶's build 1.7.0; EasyPhoto: newest

System Information:
OS: Microsoft Windows NT 10.0.22621.0
CPU: 16 cores
Memory Size: 32768 MB
Page File Size: 4136 MB

NVIDIA Management Library:
  NVIDIA Driver Version: 551.23
  NVIDIA Management Library Version: 12.551.23

CUDA Driver:
  Version: 12040
  Devices:
    00000000:01:00.0 0: NVIDIA GeForce RTX 3060 Ti [86] 8 GB

NvApi:
  Version: 55123 r551_06

DirectML Driver:
  Devices:
    9353 0: NVIDIA GeForce RTX 3060 Ti 7 GB

Intel Level Zero Driver:
  Not Available

What browsers do you use to access the UI ?

Mozilla Firefox

Command Line Arguments

No

List of enabled extensions

No

Console logs

SD-WebUI Launcher Diagnostic File

Date: 2024-02-08 11:57:12
Launcher Version: 2.7.12.283
Data File Version: 2024-02-01 12:55
SD-WebUI Version: cf2772fab0af5573da775e7437e6acdca424f26e (2023-12-16 14:58:07)
Working Directory: I:\SD
------------------------
System Information: 
OS: Microsoft Windows NT 10.0.22621.0
CPU: 16 cores
Memory Size: 32768 MB
Page File Size: 4136 MB

NVIDIA Management Library:
  NVIDIA Driver Version: 551.23
  NVIDIA Management Library Version: 12.551.23

CUDA Driver:
  Version: 12040
  Devices: 
    00000000:01:00.0 0: NVIDIA GeForce RTX 3060 Ti [86] 8 GB

NvApi:
  Version: 55123 r551_06

DirectML Driver: 
  Devices: 
    9353 0: NVIDIA GeForce RTX 3060 Ti 7 GB

Intel Level Zero Driver:
  Not Available

=====================================
AUTOMATIC1111/stable-diffusion-webui
portable packed by bilibili@秋葉aaaki 
version: v2
This integrated package is completely free
=====================================

Civitai Helper: Get Custom Model Folder
Civitai Helper: Load setting from: I:\SD\extensions\Stable-Diffusion-Webui-Civitai-Helper-main\setting.json
Civitai Helper: No setting file, use default
Tag Autocomplete: Could not locate model-keyword extension, Lora trigger word completion will be limited to those added through the extra networks menu.
[-] ADetailer initialized. version: 24.1.2, num models: 9
[lora-prompt-tool] Get Custom Model Folder

2024-02-08 11:11:45,505 - modelscope - INFO - PyTorch version 2.1.1+cu121 Found.
2024-02-08 11:11:45,508 - modelscope - INFO - TensorFlow version 2.15.0 Found.
2024-02-08 11:11:45,508 - modelscope - INFO - Loading ast index from I:\SD\.cache\modelscope\hub\ast_indexer
2024-02-08 11:11:46,402 - modelscope - INFO - Loading done! Current index file version is 1.9.3, with md5 dfb467a0103eb81d46590744001ee299 and a total number of 943 components indexed
[AddNet] Updating model hashes...

[AddNet] Updating model hashes...
ControlNet preprocessor location: I:\SD\extensions\sd-webui-controlnet\annotator\downloads
2024-02-08 11:11:51,214 - ControlNet - INFO - ControlNet v1.1.440
2024-02-08 11:11:51,906 - ControlNet - INFO - ControlNet v1.1.440
*** Error loading script: deforum.py
    Traceback (most recent call last):
      File "I:\SD\modules\scripts.py", line 469, in load_scripts
        script_module = script_loading.load_module(scriptfile.path)
      File "I:\SD\modules\script_loading.py", line 10, in load_module
        module_spec.loader.exec_module(module)
      File "<frozen importlib._bootstrap_external>", line 883, in exec_module
      File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
      File "I:\SD\extensions\sd-webui-deforum-automatic1111-webui\scripts\deforum.py", line 41, in <module>
        from webui import wrap_gradio_gpu_call
    ImportError: cannot import name 'wrap_gradio_gpu_call' from 'webui' (I:\SD\webui.py)

---

2024-02-08 11:14:41,074 - modelscope - INFO - load face enhancer model done
2024-02-08 11:14:42,372 - modelscope - INFO - load face detector model done
2024-02-08 11:14:46,325 - modelscope - INFO - load sr model done
2024-02-08 11:14:49,920 - modelscope - INFO - load fqa model done
2024-02-08 11:15:02,854 - modelscope - WARNING - task skin-retouching-torch input definition is missing
2024-02-08 11:15:25,371 - modelscope - WARNING - task skin-retouching-torch output keys are missing
2024-02-08 11:15:25,384 - modelscope - WARNING - task face_recognition input definition is missing
2024-02-08 11:15:29,527 - modelscope - INFO - model inference done
2024-02-08 11:15:29,528 - modelscope - WARNING - task face_recognition output keys are missing
2024-02-08 11:15:50,472 - modelscope - INFO - model inference done
2024-02-08 11:16:11,365 - modelscope - INFO - model inference done
2024-02-08 11:16:23,704 - modelscope - INFO - model inference done
2024-02-08 11:16:45,032 - modelscope - INFO - model inference done
2024-02-08 11:17:06,633 - modelscope - INFO - model inference done
2024-02-08 11:17:18,245 - modelscope - INFO - model inference done
2024-02-08 11:17:29,334 - modelscope - INFO - model inference done
2024-02-08 11:17:51,389 - modelscope - INFO - model inference done
2024-02-08 11:18:02,438 - modelscope - INFO - model inference done
2024-02-08 11:18:06,628 - modelscope - INFO - model inference done
2024-02-08 11:18:28,605 - modelscope - INFO - model inference done
2024-02-08 11:18:40,427 - modelscope - INFO - model inference done
selected paths: I:\SD\outputs/easyphoto-user-id-infos\Kele\original_backup\10.jpg total scores:  0.7149364408765333 face angles 0.9891111328786306
selected paths: I:\SD\outputs/easyphoto-user-id-infos\Kele\original_backup\11.jpg total scores:  0.7077055862271003 face angles 0.9836397475313537
selected paths: I:\SD\outputs/easyphoto-user-id-infos\Kele\original_backup\12.jpg total scores:  0.7052473989614552 face angles 0.9542439665457801
selected paths: I:\SD\outputs/easyphoto-user-id-infos\Kele\original_backup\2.jpg total scores:  0.6819159344038273 face angles 0.9420341811598977
selected paths: I:\SD\outputs/easyphoto-user-id-infos\Kele\original_backup\0.jpg total scores:  0.6740659791490394 face angles 0.9690512793465067
selected paths: I:\SD\outputs/easyphoto-user-id-infos\Kele\original_backup\9.jpg total scores:  0.6483495349873794 face angles 0.9350091658818253
selected paths: I:\SD\outputs/easyphoto-user-id-infos\Kele\original_backup\1.jpg total scores:  0.6295904211527805 face angles 0.9187095180104281
selected paths: I:\SD\outputs/easyphoto-user-id-infos\Kele\original_backup\8.jpg total scores:  0.622516780949694 face angles 0.8646151234542441
selected paths: I:\SD\outputs/easyphoto-user-id-infos\Kele\original_backup\7.jpg total scores:  0.6095038682337387 face angles 0.8973524500809857
selected paths: I:\SD\outputs/easyphoto-user-id-infos\Kele\original_backup\6.jpg total scores:  0.5762118542519066 face angles 0.8021406563294543
selected paths: I:\SD\outputs/easyphoto-user-id-infos\Kele\original_backup\5.jpg total scores:  0.5620775354746069 face angles 0.8865819606955003
selected paths: I:\SD\outputs/easyphoto-user-id-infos\Kele\original_backup\4.jpg total scores:  0.5450684379140865 face angles 0.9751058564978583
selected paths: I:\SD\outputs/easyphoto-user-id-infos\Kele\original_backup\3.jpg total scores:  0.5240435089479749 face angles 0.9359229722501655
jpg: 12.jpg face_id_scores 0.7052473989614552
jpg: 2.jpg face_id_scores 0.6819159344038273
jpg: 10.jpg face_id_scores 0.7149364408765333
jpg: 8.jpg face_id_scores 0.622516780949694
jpg: 11.jpg face_id_scores 0.7077055862271003
jpg: 6.jpg face_id_scores 0.5762118542519066
jpg: 0.jpg face_id_scores 0.6740659791490394
jpg: 9.jpg face_id_scores 0.6483495349873794
jpg: 1.jpg face_id_scores 0.6295904211527805
jpg: 7.jpg face_id_scores 0.6095038682337387
jpg: 5.jpg face_id_scores 0.5620775354746069
jpg: 3.jpg face_id_scores 0.5240435089479749
jpg: 4.jpg face_id_scores 0.5450684379140865
save processed image to I:\SD\outputs/easyphoto-user-id-infos\Kele\processed_images\train\0.jpg
save processed image to I:\SD\outputs/easyphoto-user-id-infos\Kele\processed_images\train\1.jpg
save processed image to I:\SD\outputs/easyphoto-user-id-infos\Kele\processed_images\train\2.jpg
save processed image to I:\SD\outputs/easyphoto-user-id-infos\Kele\processed_images\train\3.jpg
save processed image to I:\SD\outputs/easyphoto-user-id-infos\Kele\processed_images\train\4.jpg
save processed image to I:\SD\outputs/easyphoto-user-id-infos\Kele\processed_images\train\5.jpg
save processed image to I:\SD\outputs/easyphoto-user-id-infos\Kele\processed_images\train\6.jpg
save processed image to I:\SD\outputs/easyphoto-user-id-infos\Kele\processed_images\train\7.jpg
save processed image to I:\SD\outputs/easyphoto-user-id-infos\Kele\processed_images\train\8.jpg
save processed image to I:\SD\outputs/easyphoto-user-id-infos\Kele\processed_images\train\9.jpg
save processed image to I:\SD\outputs/easyphoto-user-id-infos\Kele\processed_images\train\10.jpg
save processed image to I:\SD\outputs/easyphoto-user-id-infos\Kele\processed_images\train\11.jpg
2024-02-08 11:19:04,072 - EasyPhoto - train_file_path : I:\SD\extensions\sd-webui-EasyPhoto\scripts\train_kohya/train_lora.py
2024-02-08 11:19:04,073 - EasyPhoto - cache_log_file_path: I:\SD\outputs/easyphoto-tmp/train_kohya_log.txt
The following values were not passed to `accelerate launch` and had defaults used instead:
    `--num_processes` was set to a value of `1`
    `--num_machines` was set to a value of `1`
    `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
2024-02-08 11:19:17,168 - modelscope - INFO - PyTorch version 2.1.1+cu121 Found.
2024-02-08 11:19:17,171 - modelscope - INFO - TensorFlow version 2.15.0 Found.
2024-02-08 11:19:17,171 - modelscope - INFO - Loading ast index from I:\SD\.cache\modelscope\hub\ast_indexer
2024-02-08 11:19:17,265 - modelscope - INFO - Loading done! Current index file version is 1.9.3, with md5 dfb467a0103eb81d46590744001ee299 and a total number of 943 components indexed
02/08/2024 11:19:17 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: fp16

{'timestep_spacing', 'sample_max_value', 'thresholding', 'variance_type', 'prediction_type', 'clip_sample_range', 'dynamic_thresholding_ratio'} was not found in config. Values will be initialized to default values.
UNet2DConditionModel: 64, 8, 768, False, False
loading u-net: <All keys matched successfully>
loading vae: <All keys matched successfully>
loading text encoder: <All keys matched successfully>
create LoRA network. base dim (rank): 128, alpha: 64
neuron dropout: p=None, rank dropout: p=None, module dropout: p=None
create LoRA for Text Encoder:
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
Downloading and preparing dataset imagefolder/default to I:/SD/.cache/huggingface/datasets/imagefolder/default-049383d05d0b6779/0.0.0/37fbb85cc714a338bea574ac6c7d0b5be5aff46c1862c1989b20e0771199e93f...
Dataset imagefolder downloaded and prepared to I:/SD/.cache/huggingface/datasets/imagefolder/default-049383d05d0b6779/0.0.0/37fbb85cc714a338bea574ac6c7d0b5be5aff46c1862c1989b20e0771199e93f. Subsequent calls will reuse this data.
02/08/2024 11:19:46 - INFO - __main__ - ***** Running training *****
02/08/2024 11:19:46 - INFO - __main__ -   Num examples = 12
02/08/2024 11:19:46 - INFO - __main__ -   Num Epochs = 267
02/08/2024 11:19:46 - INFO - __main__ -   Instantaneous batch size per device = 1
02/08/2024 11:19:46 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 4
02/08/2024 11:19:46 - INFO - __main__ -   Gradient Accumulation steps = 4
02/08/2024 11:19:46 - INFO - __main__ -   Total optimization steps = 800
2024-02-08 11:19:47,008 - modelscope - INFO - Model revision not specified, use revision: v2.0.2
2024-02-08 11:19:48,625 - modelscope - INFO - initiate model from I:\SD\.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface
2024-02-08 11:19:48,625 - modelscope - INFO - initiate model from location I:\SD\.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface.
2024-02-08 11:19:48,689 - modelscope - WARNING - No preprocessor field found in cfg.
2024-02-08 11:19:48,689 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
2024-02-08 11:19:48,689 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': 'I:\\SD\\.cache\\modelscope\\hub\\damo\\cv_resnet50_face-detection_retinaface'}. trying to build by task and model information.
2024-02-08 11:19:48,689 - modelscope - WARNING - Find task: face-detection, model type: None. Insufficient information to build preprocessor, skip building preprocessor
2024-02-08 11:19:48,711 - modelscope - INFO - loading model from I:\SD\.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface\pytorch_model.pt
2024-02-08 11:19:49,028 - modelscope - INFO - load model done

......

saving checkpoint: outputs\easyphoto-user-id-infos\Kele\user_weights\checkpoint-800.safetensors
02/08/2024 15:08:28 - INFO - __main__ - Saved state to outputs\easyphoto-user-id-infos\Kele\user_weights\checkpoint-800.safetensors, outputs\easyphoto-user-id-infos\Kele\user_weights\checkpoint-800

saving checkpoint: outputs\easyphoto-user-id-infos\Kele\user_weights\pytorch_lora_weights.safetensors
UNet2DConditionModel: 64, 8, 768, False, False
loading u-net: <All keys matched successfully>
loading vae: <All keys matched successfully>
loading text encoder: <All keys matched successfully>
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_inpaint.StableDiffusionInpaintPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
You have loaded a UNet with 4 input channels which.
{'lambda_min_clipped', 'use_karras_sigmas', 'prediction_type', 'algorithm_type', 'thresholding', 'lower_order_final', 'sample_max_value', 'euler_at_final', 'use_lu_lambdas', 'timestep_spacing', 'solver_order', 'variance_type', 'solver_type', 'dynamic_thresholding_ratio'} was not found in config. Values will be initialized to default values.
02/08/2024 15:08:35 - INFO - __main__ - Running validation error, skip it.Error info: Allocation on device 0 would exceed allowed memory. (out of memory)
Currently allocated     : 6.44 GiB
Requested               : 1.13 GiB
Device limit            : 8.00 GiB
Free (according to CUDA): 0 bytes
PyTorch limit (set by user-supplied memory fraction)
                        : 17179869184.00 GiB.
2024-02-08 15:08:36,243 - modelscope - INFO - Use user-specified model revision: v1.0.3
2024-02-08 15:08:36,669 - modelscope - WARNING - ('PIPELINES', 'face_recognition', 'face_recognition') not found in ast index file
2024-02-08 15:08:36,669 - modelscope - INFO - initiate model from I:\SD\.cache\modelscope\hub\bubbliiiing\cv_retinafce_recognition
2024-02-08 15:08:36,669 - modelscope - INFO - initiate model from location I:\SD\.cache\modelscope\hub\bubbliiiing\cv_retinafce_recognition.
2024-02-08 15:08:36,691 - modelscope - INFO - initialize model from I:\SD\.cache\modelscope\hub\bubbliiiing\cv_retinafce_recognition
2024-02-08 15:08:36,791 - modelscope - WARNING - ('MODELS', 'face_recognition', 'face_recognition') not found in ast index file
2024-02-08 15:08:37,350 - modelscope - INFO - Model revision not specified, use revision: v2.0.2
2024-02-08 15:08:38,957 - modelscope - INFO - initiate model from I:\SD\.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface
2024-02-08 15:08:38,957 - modelscope - INFO - initiate model from location I:\SD\.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface.
2024-02-08 15:08:39,021 - modelscope - WARNING - No preprocessor field found in cfg.
2024-02-08 15:08:39,021 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
2024-02-08 15:08:39,021 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': 'I:\\SD\\.cache\\modelscope\\hub\\damo\\cv_resnet50_face-detection_retinaface'}. trying to build by task and model information.
2024-02-08 15:08:39,021 - modelscope - WARNING - Find task: face-detection, model type: None. Insufficient information to build preprocessor, skip building preprocessor
2024-02-08 15:08:39,043 - modelscope - INFO - loading model from I:\SD\.cache\modelscope\hub\damo\cv_resnet50_face-detection_retinaface\pytorch_model.pt
2024-02-08 15:08:39,420 - modelscope - INFO - load model done
2024-02-08 15:08:40,137 - modelscope - INFO - load facefusion models done
2024-02-08 15:08:40,137 - modelscope - INFO - init done
2024-02-08 15:08:40,199 - modelscope - WARNING - No preprocessor field found in cfg.
2024-02-08 15:08:40,200 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
2024-02-08 15:08:40,200 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': 'I:\\SD\\.cache\\modelscope\\hub\\bubbliiiing\\cv_retinafce_recognition'}. trying to build by task and model information.
2024-02-08 15:08:40,200 - modelscope - WARNING - No preprocessor key ('face_recognition', 'face_recognition') found in PREPROCESSOR_MAP, skip building preprocessor.
2024-02-08 15:08:40,221 - modelscope - INFO - image face recognition model init done
2024-02-08 15:08:40,225 - modelscope - WARNING - task face_recognition input definition is missing
2024-02-08 15:08:50,015 - modelscope - INFO - model inference done
2024-02-08 15:08:50,017 - modelscope - WARNING - task face_recognition output keys are missing
2024-02-08 15:08:57,292 - modelscope - INFO - model inference done
2024-02-08 15:09:04,538 - modelscope - INFO - model inference done
2024-02-08 15:09:11,856 - modelscope - INFO - model inference done
2024-02-08 15:09:19,051 - modelscope - INFO - model inference done
2024-02-08 15:09:26,226 - modelscope - INFO - model inference done
2024-02-08 15:09:33,509 - modelscope - INFO - model inference done
2024-02-08 15:09:33,668 - modelscope - INFO - model inference done
2024-02-08 15:09:40,948 - modelscope - INFO - model inference done
2024-02-08 15:09:47,892 - modelscope - INFO - model inference done
2024-02-08 15:09:53,646 - modelscope - INFO - model inference done
2024-02-08 15:10:00,752 - modelscope - INFO - model inference done
2024-02-08 15:10:00,895 - modelscope - INFO - model inference done
2024-02-08 15:10:01,051 - modelscope - INFO - model inference done
2024-02-08 15:10:01,195 - modelscope - INFO - model inference done
2024-02-08 15:10:01,355 - modelscope - INFO - model inference done
2024-02-08 15:10:01,492 - modelscope - INFO - model inference done
2024-02-08 15:10:01,632 - modelscope - INFO - model inference done
2024-02-08 15:10:01,789 - modelscope - INFO - model inference done
2024-02-08 15:10:01,906 - modelscope - INFO - model inference done
2024-02-08 15:10:02,027 - modelscope - INFO - model inference done
2024-02-08 15:10:02,110 - modelscope - INFO - model inference done
2024-02-08 15:10:02,230 - modelscope - INFO - model inference done
2024-02-08 15:10:02,316 - modelscope - INFO - model inference done
Dectect no face in training data, move last weights and validation image to best_outputs
Traceback (most recent call last):
  File "I:\SD\extensions\sd-webui-EasyPhoto\scripts\train_kohya\train_lora.py", line 1390, in <module>
    main()
  File "I:\SD\extensions\sd-webui-EasyPhoto\scripts\train_kohya\utils\gpu_info.py", line 195, in wrapper
    result = func(*args, **kwargs)
  File "I:\SD\extensions\sd-webui-EasyPhoto\scripts\train_kohya\train_lora.py", line 1362, in main
    copyfile(t_result_list[0][1], os.path.join(best_outputs_dir, os.path.basename(t_result_list[0][1])))
IndexError: list index out of range
Traceback (most recent call last):
  File "runpy.py", line 196, in _run_module_as_main
  File "runpy.py", line 86, in _run_code
  File "I:\SD\py310\lib\site-packages\accelerate\commands\launch.py", line 989, in <module>
    main()
  File "I:\SD\py310\lib\site-packages\accelerate\commands\launch.py", line 985, in main
    launch_command(args)
  File "I:\SD\py310\lib\site-packages\accelerate\commands\launch.py", line 979, in launch_command
    simple_launcher(args)
  File "I:\SD\py310\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['I:\\SD\\py310\\python.exe', 'I:\\SD\\extensions\\sd-webui-EasyPhoto\\scripts\\train_kohya/train_lora.py', '--pretrained_model_name_or_path=extensions\\sd-webui-EasyPhoto\\models\\stable-diffusion-v1-5', '--pretrained_model_ckpt=models\\Stable-diffusion\\Chilloutmix-Ni-pruned-fp16-fix.safetensors', '--train_data_dir=outputs\\easyphoto-user-id-infos\\Kele\\processed_images', '--caption_column=text', '--resolution=512', '--random_flip', '--train_batch_size=1', '--gradient_accumulation_steps=4', '--dataloader_num_workers=0', '--max_train_steps=800', '--checkpointing_steps=100', '--learning_rate=0.0001', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--train_text_encoder', '--seed=768624', '--rank=128', '--network_alpha=64', '--validation_prompt=easyphoto_face, easyphoto, 1person', '--validation_steps=100', '--output_dir=outputs\\easyphoto-user-id-infos\\Kele\\user_weights', '--logging_dir=outputs\\easyphoto-user-id-infos\\Kele\\user_weights', '--enable_xformers_memory_efficient_attention', '--mixed_precision=fp16', '--template_dir=extensions\\sd-webui-EasyPhoto\\models\\training_templates', '--template_mask', '--merge_best_lora_based_face_id', '--merge_best_lora_name=Kele', '--cache_log_file=I:\\SD\\outputs/easyphoto-tmp/train_kohya_log.txt', '--validation']' returned non-zero exit status 1.
2024-02-08 15:10:06,015 - EasyPhoto - Error executing the command: Command '['I:\\SD\\py310\\python.exe', '-m', 'accelerate.commands.launch', '--mixed_precision=fp16', '--main_process_port=3456', 'I:\\SD\\extensions\\sd-webui-EasyPhoto\\scripts\\train_kohya/train_lora.py', '--pretrained_model_name_or_path=extensions\\sd-webui-EasyPhoto\\models\\stable-diffusion-v1-5', '--pretrained_model_ckpt=models\\Stable-diffusion\\Chilloutmix-Ni-pruned-fp16-fix.safetensors', '--train_data_dir=outputs\\easyphoto-user-id-infos\\Kele\\processed_images', '--caption_column=text', '--resolution=512', '--random_flip', '--train_batch_size=1', '--gradient_accumulation_steps=4', '--dataloader_num_workers=0', '--max_train_steps=800', '--checkpointing_steps=100', '--learning_rate=0.0001', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--train_text_encoder', '--seed=768624', '--rank=128', '--network_alpha=64', '--validation_prompt=easyphoto_face, easyphoto, 1person', '--validation_steps=100', '--output_dir=outputs\\easyphoto-user-id-infos\\Kele\\user_weights', '--logging_dir=outputs\\easyphoto-user-id-infos\\Kele\\user_weights', '--enable_xformers_memory_efficient_attention', '--mixed_precision=fp16', '--template_dir=extensions\\sd-webui-EasyPhoto\\models\\training_templates', '--template_mask', '--merge_best_lora_based_face_id', '--merge_best_lora_name=Kele', '--cache_log_file=I:\\SD\\outputs/easyphoto-tmp/train_kohya_log.txt', '--validation']' returned non-zero exit status 1.
Applying attention optimization: xformers... done.
------------------------
Fault Traceback: 
Not Available

Additional information

No

flt6 commented 4 months ago

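Please check whether any files exist in stable-diffusion-webui/outputs/easyphoto-user-id-infos/<corresponding name>/user_weights/best_outputs. If there are none, then I ran into the same problem. In a small-scale LoRA training test I observed: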

INFO - __main__ - Running validation error, skip it.Error info: CUDA out of memory. Tried to allocate 96.00 MiB (GPU 0; 9.77 GiB total capacity; 5.47 GiB already allocated; 50.69 MiB free; 5.52 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF.

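The cause is insufficient VRAM during LoRA training. One workaround is to generate images separately, after training has finished, for the safetensors file saved at each save step.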
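The check described above can be scripted. The following is a minimal sketch, assuming the folder layout visible in the log (a `best_outputs` subfolder and `checkpoint-<step>.safetensors` files inside `user_weights`); the function name `inspect_user_weights` is hypothetical and not part of EasyPhoto.

```python
import re
from pathlib import Path

# File naming assumed from the log line "saving checkpoint: ...checkpoint-800.safetensors".
CHECKPOINT_PATTERN = re.compile(r"checkpoint-(\d+)\.safetensors")


def inspect_user_weights(user_weights_dir):
    """Report what a finished training run left in the user_weights folder."""
    root = Path(user_weights_dir)
    best_dir = root / "best_outputs"

    # An empty or missing best_outputs folder matches the failure in this thread.
    best_files = []
    if best_dir.is_dir():
        best_files = sorted(p.name for p in best_dir.iterdir() if p.is_file())

    # Collect per-step checkpoints and order them by step number, not by name.
    steps = []
    for path in root.glob("checkpoint-*.safetensors"):
        match = CHECKPOINT_PATTERN.fullmatch(path.name)
        if match:
            steps.append((int(match.group(1)), path.name))
    checkpoints = [name for _, name in sorted(steps)]

    return {"best_outputs": best_files, "checkpoints": checkpoints}
```

For the run reported here, the folder to inspect would be `outputs\easyphoto-user-id-infos\Kele\user_weights` under `I:\SD`. If `best_outputs` comes back empty while `checkpoints` is not, the saved checkpoints are the files to use for the separate image generation mentioned above.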

flt6 commented 4 months ago

Another mitigation is to remove Validation during training.

wuziheng commented 4 months ago

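This looks like a VRAM shortage. Please turn off the validation option and try again. If the problem persists, you are welcome to try a cloud platform with a free trial plan, where you can run the whole workflow on a machine with more VRAM and a preconfigured environment.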
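The traceback in the original report appears consistent with this diagnosis: validation is skipped after the out-of-memory error, so the result list indexed in `copyfile(t_result_list[0][1], ...)` is presumably empty, and indexing it raises `IndexError`. Below is a minimal sketch of a defensive version of that copy step. It is not EasyPhoto's actual code: the function name `copy_best_result`, the `fallback_path` parameter, and the `(score, path)` tuple layout are assumptions made for illustration.

```python
import os
from shutil import copyfile


def copy_best_result(results, best_outputs_dir, fallback_path=None):
    """Copy the top-ranked result file into best_outputs_dir.

    results: list of (score, path) tuples, assumed to be sorted best-first.
    Returns the destination path, or None when there is nothing to copy.
    """
    # Guard the empty case instead of indexing results[0] unconditionally.
    source = results[0][1] if results else fallback_path
    if source is None or not os.path.isfile(source):
        return None

    os.makedirs(best_outputs_dir, exist_ok=True)
    destination = os.path.join(best_outputs_dir, os.path.basename(source))
    copyfile(source, destination)
    return destination
```

With an empty `results` list, the function falls back to an explicitly supplied file (for example the last saved weights) or returns `None`, rather than failing the whole training command.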

flt6 commented 4 months ago

After turning it off, training runs normally.

Taiilor commented 4 months ago

Thanks, it works after turning off Validation.