bmaltais / kohya_ss

Apache License 2.0

Flux Dreambooth crashes after trying to save checkpoint #2718

Open DarkViewAI opened 2 months ago

DarkViewAI commented 2 months ago

I am able to run DreamBooth training successfully and the samples look good, but after 600 steps, when it tries to save the checkpoint, it gives me this error:

2024-08-19 07:57:06 INFO  train_util.py:5123
                    INFO  saving checkpoint: /home/Ubuntu/Downloads/model 4/Rick-step00000600.safetensors  train_util.py:5124
Traceback (most recent call last):
  File "/home/Ubuntu/apps/kohya_ss/venv/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1106, in launch_command
    simple_launcher(args)
  File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 704, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/Ubuntu/apps/kohya_ss/venv/bin/python', '/home/Ubuntu/apps/kohya_ss/sd-scripts/flux_train.py', '--config_file', '/home/Ubuntu/Downloads/model 4/config_dreambooth-20240819-073001.toml', '--fp8_base', '--highvram', '--cpu_offload_checkpointing']' died with <Signals.SIGKILL: 9>.
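
A note on reading the last line of the traceback: "died with <Signals.SIGKILL: 9>" means the training process was terminated by signal 9 from outside, so no Python exception from flux_train.py is shown. On Linux this is often, though not always, the kernel's out-of-memory killer; that cause is an assumption here and the log does not confirm it. The sketch below only illustrates how a negative return code maps to a signal name. `describe_returncode` and `run_self_killing_child` are hypothetical helpers for illustration, not part of accelerate or sd-scripts, and the example assumes a POSIX system.

```python
import signal
import subprocess
import sys


def describe_returncode(returncode: int) -> str:
    """Hypothetical helper: turn a subprocess return code into a readable message."""
    # On POSIX, a negative return code -N means the child was terminated by signal N.
    if returncode < 0:
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


def run_self_killing_child() -> int:
    """Start a child Python process that sends SIGKILL to itself (POSIX only).

    This stands in for a training process that is killed from outside.
    """
    child_code = "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"
    completed = subprocess.run([sys.executable, "-c", child_code])
    return completed.returncode


if __name__ == "__main__":
    print(describe_returncode(run_self_killing_child()))
```

If the out-of-memory killer is suspected, the kernel log (for example via `dmesg`) is the usual place to check for a matching entry around the time of the crash.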

goodluckluyan commented 2 months ago

same issue