TheLastBen / fast-stable-diffusion

fast-stable-diffusion + DreamBooth
MIT License
7.41k stars 1.28k forks source link

[Colab] can not start training #2706

Open elsucht opened 6 months ago

elsucht commented 6 months ago

With current version in colab I´m getting following error:

Training the UNet...
...
Traceback (most recent call last):
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 803, in <module>
    main()
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 535, in main
    import bitsandbytes as bnb
  File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/__init__.py", line 6, in <module>
    from .autograd._functions import (
  File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/autograd/_functions.py", line 5, in <module>
    import bitsandbytes.functional as F
  File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/functional.py", line 13, in <module>
    from .cextension import COMPILED_WITH_CUDA, lib
  File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/cextension.py", line 41, in <module>
    lib = CUDALibrary_Singleton.get_instance().lib
  File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/cextension.py", line 37, in get_instance
    cls._instance.initialize()
  File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/cextension.py", line 27, in initialize
    raise Exception('CUDA SETUP: Setup Failed!')
Exception: CUDA SETUP: Setup Failed!
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--external_captions', '--image_captions_filename', '--train_only_unet', '--save_starting_step=500', '--save_n_steps=1000', '--Session_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/test2', '--pretrained_model_name_or_path=/content/stable-diffusion-custom', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/test2/instance_images', '--output_dir=/content/models/mmates2', '--captions_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/test2/captions', '--instance_prompt=', '--seed=825246', '--resolution=512', '--mixed_precision=fp16', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=2e-06', '--lr_scheduler=linear', '--lr_warmup_steps=0', '--max_train_steps=6000']' returned non-zero exit status 1.
Something went wrong
TheLastBen commented 6 months ago

Try using the T4 GPU

elsucht commented 6 months ago

Error using T4 GPU

Traceback (most recent call last):
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 803, in <module>
    main()
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 590, in main
    train_dataloader = torch.utils.data.DataLoader(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 349, in __init__
    sampler = RandomSampler(dataset, generator=generator)  # type: ignore[arg-type]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/sampler.py", line 140, in __init__
    raise ValueError(f"num_samples should be a positive integer value, but got num_samples={self.num_samples}")
ValueError: num_samples should be a positive integer value, but got num_samples=0
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--image_captions_filename', '--train_only_unet', '--save_starting_step=500', '--save_n_steps=1000', '--Session_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/test2', '--pretrained_model_name_or_path=/content/stable-diffusion-custom', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/test2/instance_images', '--output_dir=/content/models/test2', '--captions_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/test2/captions', '--instance_prompt=', '--seed=881116', '--resolution=512', '--mixed_precision=fp16', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=2e-06', '--lr_scheduler=linear', '--lr_warmup_steps=0', '--max_train_steps=6000']' returned non-zero exit status 1.
Something went wrong