Open K1LL3RPUNCH opened 1 month ago
20:20:44-548897 INFO
bucket_no_upscale = true
bucket_reso_steps = 64
cache_latents = true
caption_extension = ".txt"
clip_skip = 1
dynamo_backend = "no"
enable_bucket = true
epoch = 40
gradient_accumulation_steps = 1
huber_c = 0.1
huber_schedule = "snr"
learning_rate = 0.0001
loss_type = "l2"
lr_scheduler = "cosine"
lr_scheduler_args = []
lr_scheduler_num_cycles = 1
lr_scheduler_power = 1
lr_warmup_steps = 200
max_bucket_reso = 2048
max_data_loader_n_workers = 0
max_grad_norm = 1
max_timestep = 1000
max_token_length = 75
max_train_steps = 2000
min_bucket_reso = 256
mixed_precision = "no"
multires_noise_discount = 0.3
network_alpha = 1
network_args = []
network_dim = 8
network_module = "networks.lora"
noise_offset_type = "Original"
optimizer_args = []
optimizer_type = "Adafactor"
output_dir = "D:/StableDiffusion/trained"
output_name = "spikeschaffer"
pretrained_model_name_or_path = "runwayml/stable-diffusion-v1-5"
prior_loss_weight = 1
resolution = "768"
sample_prompts = "D:/StableDiffusion/trained\prompt.txt"
sample_sampler = "k_dpm_2_a"
save_every_n_epochs = 3
save_model_as = "safetensors"
save_precision = "fp16"
text_encoder_lr = 0.0001
train_batch_size = 2
train_data_dir = "D:/Spike Schaffer"
unet_lr = 0.0001
xformers = true
20:20:44-552897 INFO end of toml config file: D:/StableDiffusion/trained/config_lora-20240802-202044.toml
Hi, I have the exact same issue. The traceback appears during training, after some epochs have finished, and always at a different point: sometimes soon after starting, sometimes after an hour. There is no consistency in when it happens. For me it started after I upgraded to the latest NVIDIA driver.
When the training crashes, an event is logged in the Application log of the Windows Event Viewer:
Faulting application name: python.exe, version: 3.10.9150.1013, time stamp: 0x638fa05d
Faulting module name: nvcuda64.dll, version: 32.0.15.6070, time stamp: 0x668eca2c
Exception code: 0xc0000005
Fault offset: 0x000000000002d13a
Faulting process ID: 0x2668
Faulting application start time: 0x01dae709114af2ab
Faulting application path: C:\Program Files\Python310\python.exe
Faulting module path: C:\Windows\system32\DriverStore\FileRepository\nv_dispi.inf_amd64_1196b342b24df5d1\nvcuda64.dll
Report ID: c6dcbbf6-5d67-42e1-824b-f6f3546b49d5
Faulting package full name:
Faulting package-relative application ID:
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 5/5 [00:00<00:00, 16.55it/s]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
INFO UNet2DConditionModel: 64, 8, 768, False, False original_unet.py:1387
Traceback (most recent call last):
  File "C:\Users\inkvi\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\inkvi\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\StableDiffusion\stable-diffusion-webui\extensions\kohya_ss\venv\Scripts\accelerate.EXE__main__.py", line 7, in <module>
  File "D:\StableDiffusion\stable-diffusion-webui\extensions\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "D:\StableDiffusion\stable-diffusion-webui\extensions\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "D:\StableDiffusion\stable-diffusion-webui\extensions\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\StableDiffusion\stable-diffusion-webui\extensions\kohya_ss\venv\Scripts\python.exe', 'D:/StableDiffusion/stable-diffusion-webui/extensions/kohya_ss/sd-scripts/train_network.py', '--config_file', 'D:/StableDiffusion/trained/config_lora-20240802-201633.toml']' returned non-zero exit status 3221225477.
20:16:50-267285 INFO Training has ended.
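For what it's worth, the exit status 3221225477 reported by accelerate looks like the same value as the exception code 0xc0000005 in the Event Viewer entry, just printed in decimal. The sketch below is only an illustration of that conversion: the helper name `describe_exit_status` and the small lookup table are my own and are not part of kohya_ss or accelerate, and the table covers only a few well-known NTSTATUS codes.

```python
# Illustrative helper (hypothetical, not part of kohya_ss or accelerate):
# convert a Windows process exit status to its hexadecimal NTSTATUS form.

# Small, non-exhaustive subset of well-known NTSTATUS codes.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def describe_exit_status(status: int) -> str:
    """Return the exit status as 32-bit hex plus a name if it is in the table."""
    # Mask to 32 bits so a signed value (e.g. -1073741819) maps to the same code.
    code = status & 0xFFFFFFFF
    name = NTSTATUS_NAMES.get(code, "unknown")
    return f"0x{code:08X} ({name})"


if __name__ == "__main__":
    # Exit status taken from the CalledProcessError message above.
    print(describe_exit_status(3221225477))  # prints "0xC0000005 (STATUS_ACCESS_VIOLATION)"
```

If that reading is right, the Python traceback is only the launcher noticing that the training process died; the access violation itself is the one recorded against nvcuda64.dll in the event log.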