TheLastBen / fast-stable-diffusion

fast-stable-diffusion + DreamBooth
MIT License
7.54k stars 1.31k forks

DreamBooth crashes on Colab #2852

Open ClaraSanders opened 5 months ago

ClaraSanders commented 5 months ago

I've been training some models for weeks but now I get this when I run the training cell:

Traceback (most recent call last):
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 803, in <module>
    main()
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 535, in main
    import bitsandbytes as bnb
  File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/__init__.py", line 6, in <module>
    from .autograd._functions import (
  File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/autograd/_functions.py", line 5, in <module>
    import bitsandbytes.functional as F
  File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/functional.py", line 13, in <module>
    from .cextension import COMPILED_WITH_CUDA, lib
  File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/cextension.py", line 41, in <module>
    lib = CUDALibrary_Singleton.get_instance().lib
  File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/cextension.py", line 37, in get_instance
    cls._instance.initialize()
  File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/cextension.py", line 27, in initialize
    raise Exception('CUDA SETUP: Setup Failed!')
Exception: CUDA SETUP: Setup Failed!

Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--external_captions', '--offset_noise', '--image_captions_filename', '--train_only_unet', '--save_starting_step=500', '--save_n_steps=0', '--Session_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/Alpha-Model', '--pretrained_model_name_or_path=/content/stable-diffusion-custom', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/Alpha-Model/instance_images', '--output_dir=/content/models/Alpha-Model', '--captions_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/Alpha-Model/captions', '--instance_prompt=', '--seed=984809', '--resolution=640', '--mixed_precision=fp16', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=2e-06', '--lr_scheduler=linear', '--lr_warmup_steps=0', '--max_train_steps=6000']' returned non-zero exit status 1.
Something went wrong
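The first traceback fails on `import bitsandbytes as bnb`, and the launch command includes `--use_8bit_adam`. Below is a minimal sketch for reproducing that import failure in a separate cell before launching training. The helper names `bitsandbytes_usable` and `without_8bit_adam` are made up for illustration. It is only an assumption that the script imports bitsandbytes solely because of `--use_8bit_adam`, and training without 8-bit Adam needs more GPU memory, so dropping the flag may not fit on every GPU.

```python
import importlib


def bitsandbytes_usable(module_name="bitsandbytes"):
    """Try to import the module and return (ok, message)."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:
        # The failure in the traceback is a plain Exception raised during
        # CUDA setup, not an ImportError, so catch broadly here.
        return False, f"{type(exc).__name__}: {exc}"
    return True, "import succeeded"


def without_8bit_adam(args):
    """Return a copy of an argument list with --use_8bit_adam removed.

    Hypothetical helper: assumes the training script only needs
    bitsandbytes when this flag is present.
    """
    return [a for a in args if a != "--use_8bit_adam"]


if __name__ == "__main__":
    ok, message = bitsandbytes_usable()
    print("bitsandbytes usable:", ok, "-", message)
```

If the probe reports `Exception: CUDA SETUP: Setup Failed!`, the problem is in the bitsandbytes install or CUDA setup of the runtime rather than in the training arguments.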

ClaraSanders commented 5 months ago

As I told you in the Discussions section, this issue is probably caused by the morons at Google. The notebook only works on T4 GPUs; L4 causes the error above, and I can't even connect to an A100 GPU. I wish I had money to train my models on my PC and be free from this bullshit.
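To see which GPU the runtime actually assigned before starting training, something like the sketch below could go in its own cell. It calls `nvidia-smi --query-gpu=name --format=csv,noheader` and compares the result against the GPUs reported to work. The `("T4",)` default comes only from the report above, not from the notebook's documentation, and the function names are made up for illustration. A later comment reports the crash on T4 as well, so this is a diagnostic, not a fix.

```python
import shutil
import subprocess


def detect_gpu_names():
    """Return GPU names reported by nvidia-smi, or [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def is_reported_working(gpu_name, working=("T4",)):
    """True if the GPU name contains a model token reported to work.

    The default tuple is an assumption taken from this thread only.
    """
    tokens = gpu_name.upper().split()
    return any(w.upper() in tokens for w in working)


if __name__ == "__main__":
    names = detect_gpu_names()
    if not names:
        print("No GPU detected (nvidia-smi missing or failed).")
    for name in names:
        status = "reported working" if is_reported_working(name) else "reported failing or untested"
        print(f"{name}: {status}")
```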

TheFlano23 commented 2 months ago

Problem still persists on my end, even with the T4 GPU selected.