Training fails on a single 2080 Ti; error output below

pdshwc commented 1 year ago

cache_log_file_path: O:\stable-diffusion-webui\outputs/easyphoto-tmp/train_kohya_log.txt
Traceback (most recent call last):
  File "C:\ProgramData\anaconda3\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\ProgramData\anaconda3\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "O:\stable-diffusion-webui\venv\lib\site-packages\accelerate\commands\launch.py", line 989, in <module>
    main()
  File "O:\stable-diffusion-webui\venv\lib\site-packages\accelerate\commands\launch.py", line 985, in main
    launch_command(args)
  File "O:\stable-diffusion-webui\venv\lib\site-packages\accelerate\commands\launch.py", line 957, in launch_command
    args, defaults, mp_from_config_flag = _validate_launch_command(args)
  File "O:\stable-diffusion-webui\venv\lib\site-packages\accelerate\commands\launch.py", line 821, in _validate_launch_command
    defaults = load_config_from_file(args.config_file)
  File "O:\stable-diffusion-webui\venv\lib\site-packages\accelerate\commands\config\config_args.py", line 72, in load_config_from_file
    return config_class.from_yaml_file(yaml_file=config_file)
  File "O:\stable-diffusion-webui\venv\lib\site-packages\accelerate\commands\config\config_args.py", line 135, in from_yaml_file
    return cls(**config_dict)
TypeError: ClusterConfig.__init__() got an unexpected keyword argument 'debug'
Error executing the command: Command '['O:\stable-diffusion-webui\venv\Scripts\python.exe',
  '-m',
  'accelerate.commands.launch',
  '--mixed_precision=fp16',
  '--main_process_port=3456',
  'O:\stable-diffusion-webui\extensions\sd-webui-EasyPhoto\scripts\train_kohya/train_lora.py',
  '--pretrained_model_name_or_path=extensions\sd-webui-EasyPhoto\models\stable-diffusion-v1-5',
  '--pretrained_model_ckpt=models\Stable-diffusion\Chilloutmix-Ni-pruned-fp16-fix.safetensors',
  '--train_data_dir=outputs\easyphoto-user-id-infos\chenxue\processed_images',
  '--caption_column=text',
  '--resolution=512',
  '--random_flip',
  '--train_batch_size=1',
  '--gradient_accumulation_steps=4',
  '--dataloader_num_workers=0',
  '--max_train_steps=800',
  '--checkpointing_steps=100',
  '--learning_rate=0.0001',
  '--lr_scheduler=constant',
  '--lr_warmup_steps=0',
  '--train_text_encoder',
  '--seed=42',
  '--rank=128',
  '--network_alpha=64',
  '--validation_prompt=easyphoto_face, easyphoto, 1person',
  '--validation_steps=100',
  '--output_dir=outputs\easyphoto-user-id-infos\chenxue\user_weights',
  '--logging_dir=outputs\easyphoto-user-id-infos\chenxue\user_weights',
  '--enable_xformers_memory_efficient_attention',
  '--mixed_precision=fp16',
  '--template_dir=extensions\sd-webui-EasyPhoto\models\training_templates',
  '--template_mask',
  '--merge_best_lora_based_face_id',
  '--merge_best_lora_name=chenxue',
  '--cache_log_file=O:\stable-diffusion-webui\outputs/easyphoto-tmp/train_kohya_log.txt',
  '--validation']' returned non-zero exit status 1
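The last frame of the traceback shows the saved config being unpacked straight into a constructor (`cls(**config_dict)`), so any key the installed class does not declare raises a `TypeError`. The sketch below reproduces that failure mode with a stand-in dataclass and lists the offending keys. `ClusterConfigStandIn`, its fields, and the sample config values are hypothetical and are not accelerate's real class or a real config file.

```python
from dataclasses import dataclass, fields


@dataclass
class ClusterConfigStandIn:
    # Hypothetical stand-in; the real accelerate class declares other fields.
    compute_environment: str = "LOCAL_MACHINE"
    mixed_precision: str = "no"


def unexpected_keys(config_cls, config_dict):
    """Return the keys in config_dict that config_cls does not declare."""
    known = {f.name for f in fields(config_cls)}
    return sorted(k for k in config_dict if k not in known)


# A config dict as it might look after loading a YAML file written by a
# different library version (values are made up for illustration).
config_dict = {
    "compute_environment": "LOCAL_MACHINE",
    "mixed_precision": "fp16",
    "debug": False,
}

print(unexpected_keys(ClusterConfigStandIn, config_dict))

try:
    ClusterConfigStandIn(**config_dict)
except TypeError as exc:
    # Same class of error as in the traceback above.
    print("TypeError:", exc)
```

If the real config file contains a key such as `debug` that the installed accelerate does not know, one plausible reading is that the file was written by a different accelerate version than the one in the venv. The thread does not confirm this.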

pdshwc commented 1 year ago

The sd-webui version is 1.6.0.

wuziheng commented 1 year ago

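@pdshwc We will follow up on this. It is the first time we have seen a mismatch on the debug argument. Could you also tell us which accelerate version you have installed?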
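One way to report the version is to query package metadata with the venv's own interpreter, so the answer matches what the training subprocess imports. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Run this with the venv's python.exe, not the system interpreter,
# otherwise it reports a different environment.
for name in ("accelerate", "fsspec", "datasets"):
    print(name, installed_version(name))
```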

D-Mad commented 1 year ago

I solved it by updating accelerate to the latest version: pip install accelerate==0.23.0. For the automatic1111 webui I checked out this commit: git checkout a0af2852b67859b427b662789d0b42f592e78dec

pdshwc commented 12 months ago

I tested pip install accelerate==0.23.0 and it does not work. In the venv, both 0.21.0 and 0.23.0 fail with the same error.

pdshwc commented 12 months ago

> I solved it by updating to the latest accelerate: pip install accelerate==0.23.0. automatic1111 webui: git checkout a0af2852b67859b427b662789d0b42f592e78dec

Which tag is this SD webui version?

pdshwc commented 12 months ago

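> I solved it by updating to the latest accelerate: pip install accelerate==0.23.0. automatic1111 webui: git checkout a0af2852b67859b427b662789d0b42f592e78dec

After switching to that commit and upgrading to 0.23.0 I still get an error, although it now gets two steps further. The error output is below.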

D-Mad commented 12 months ago

The images uploaded for training must match the selected resolution; the default is 512x512. I then downgraded automatic1111's checkout, because the latest version failed for me: git checkout a0af2852b67859b427b662789d0b42f592e78dec. After that I reinstalled accelerate with pip install accelerate==0.23.0 (by default, version 0.21.0 is used). That is how I solved it. If the error still persists, my guess is that you need to reset the webui somewhere or find a more suitable version of automatic1111. (Screenshot attached: 2023-10-24 080046)

wuziheng commented 12 months ago

@D-Mad thanks for your reply.

pdshwc commented 12 months ago

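I am on Windows 11. After upgrading to accelerate 0.23.0 I got a different error, which indicated that the datasets file system handling is not supported on Windows (issue: https://github.com/huggingface/datasets/issues/6330). Replacing fsspec==2023.10.0 with fsspec==2023.9.2 fixed it, and training now runs normally. For sd-webui 1.6.0, pin accelerate==0.23.0; the pin can be set through the sd-webui configuration file.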
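The two pins reported above can be applied in one pip call against the venv interpreter. The sketch below only builds and prints the command; the interpreter path is copied from the traceback earlier in this thread and will differ on other machines, and `build_pip_command` is a hypothetical helper.

```python
import subprocess

# Interpreter path taken from the traceback in this thread; adjust as needed.
VENV_PYTHON = r"O:\stable-diffusion-webui\venv\Scripts\python.exe"

# Versions reported as working in this thread.
PINS = {"accelerate": "0.23.0", "fsspec": "2023.9.2"}


def build_pip_command(python_exe, pins):
    """Build a 'python -m pip install name==version ...' argument list."""
    specs = [f"{name}=={version}" for name, version in pins.items()]
    return [python_exe, "-m", "pip", "install", *specs]


command = build_pip_command(VENV_PYTHON, PINS)
print(" ".join(command))

# Uncomment to actually run the install inside the venv:
# subprocess.run(command, check=True)
```

Installing this way changes the venv immediately, but the webui may reinstall its own pinned versions at launch, which is presumably why the comment also mentions pinning through the configuration file. The thread does not say which file that is.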

Thanks for the replies and support. @wuziheng @D-Mad

wuziheng commented 12 months ago

The fsspec==2023.9.2 pin has been added as a hotfix. See PR203.

zegwei commented 10 months ago

Could you provide detailed steps? (1) How do I replace fsspec==2023.10.0 with fsspec==2023.9.2? (2) How do I pin accelerate==0.23.0, and where is the sd-webui configuration file located?

aigc-apps / sd-webui-EasyPhoto

Training fails on a single 2080 Ti; error output below #197