AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: {LORA TRAINING} RuntimeError: CUDA error: the launch timed out and was terminated #12499

Closed NecromancerZ1 closed 1 year ago

NecromancerZ1 commented 1 year ago

Is there an existing issue for this?

What happened?

Training a LoRA in the kohya GUI fails after a few steps with a CUDA error.

Error below:

RuntimeError: CUDA error: the launch timed out and was terminated
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

steps: 0%| | 3/5400 [02:26<73:09:01, 48.79s/it, loss=0.122]

Steps to reproduce the problem

  1. Go to (kohya>gui)
  2. Start training
  3. Error occurs

Current version:

15:13:32-093626 INFO Version: v21.8.7

15:13:32-099628 INFO nVidia toolkit detected
15:13:33-744996 INFO Torch 2.0.1+cu118
15:13:33-761005 INFO Torch backend: nVidia CUDA 11.8 cuDNN 8700
15:13:33-763005 INFO Torch detected GPU: NVIDIA GeForce RTX 4060 Ti VRAM 8187 Arch (8, 9) Cores 34
15:13:33-764006 INFO Verifying modules instalation status from requirements_windows_torch2.txt...
15:13:33-766506 INFO Verifying modules instalation status from requirements.txt...
15:13:36-493311 INFO headless: False
15:13:36-496357 INFO Load CSS...
Running on local URL: http://127.0.0.1:7860

What should have happened?

LoRA training should run to completion without the CUDA error.

Version or Commit where the problem happens

n/a

What Python version are you running on ?

Python 3.10.x

What platforms do you use to access the UI ?

Windows

What device are you running WebUI on?

Nvidia GPUs (RTX 20 above)

Cross attention optimization

xformers

What browsers do you use to access the UI ?

Google Chrome

Command Line Arguments

@echo off

set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=--xformers --no-half-vae
set CUDA_LAUNCH_BLOCKING=1
call webui.bat
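A note on the `set CUDA_LAUNCH_BLOCKING=1` line: a variable set in this batch file is inherited only by processes started from it (here, `webui.bat`). A trainer launched separately would need the variable in its own environment. The following is a minimal sketch of passing it to a child process from Python. The child command here is a placeholder that just echoes the variable, and `run_with_blocking_launches` is a hypothetical helper name, not part of any tool mentioned in this issue.

```python
import os
import subprocess
import sys


def run_with_blocking_launches(cmd):
    """Run cmd with CUDA_LAUNCH_BLOCKING=1 added to a copy of the current environment."""
    env = os.environ.copy()
    env["CUDA_LAUNCH_BLOCKING"] = "1"  # make CUDA kernel launches synchronous for debugging
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


# Placeholder child process: it only prints the variable, so the sketch runs anywhere.
# Substitute the real training command (not shown in this issue) to use it in practice.
child = [sys.executable, "-c", "import os; print(os.environ.get('CUDA_LAUNCH_BLOCKING'))"]
result = run_with_blocking_launches(child)
print(result.stdout.strip())
```

Copying `os.environ` rather than mutating it keeps the setting scoped to the one child process, which matters because synchronous launches slow training down and are only wanted while debugging.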

List of extensions

n/a

Console logs

Error below:

RuntimeError: CUDA error: the launch timed out and was terminated
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

steps:   0%|                                                           | 3/5400 [02:26<73:09:01, 48.79s/it, loss=0.122]
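For reference, the roughly 73-hour estimate in the progress line above follows directly from the reported 48.79 s per step. A small sketch of that arithmetic, assuming the tqdm-style layout shown in the log (`parse_progress` and `remaining_hours` are illustrative helper names, and the bar characters are shortened in the sample line):

```python
import re


def parse_progress(line):
    """Extract (done, total, seconds_per_iteration) from a tqdm-style progress line."""
    m = re.search(r"(\d+)/(\d+)\s*\[.*?,\s*([\d.]+)s/it", line)
    if m is None:
        raise ValueError("not a tqdm-style progress line")
    return int(m.group(1)), int(m.group(2)), float(m.group(3))


def remaining_hours(done, total, sec_per_it):
    """Estimate the remaining time in hours at a constant step rate."""
    return (total - done) * sec_per_it / 3600.0


line = "steps:   0%|   | 3/5400 [02:26<73:09:01, 48.79s/it, loss=0.122]"
done, total, sec_per_it = parse_progress(line)
print(f"{remaining_hours(done, total, sec_per_it):.1f} hours remaining")
```

The result is about 73 hours for the remaining 5397 steps, which matches the estimate tqdm printed.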

Additional information

Additional specs: 64 GB RAM total (4 x 16 GB sticks); NVIDIA GeForce RTX 4060 Ti GPU with 8 GB VRAM.

catboxanon commented 1 year ago

Please open an issue in the correct repo. https://github.com/bmaltais/kohya_ss (I assume it's this one)