training the unet error

sanjaymalladi commented 1 year ago

Training the UNet...
Traceback (most recent call last):
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 852, in <module>
    main()
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 437, in main
    accelerator = Accelerator(
  File "/usr/local/lib/python3.8/dist-packages/accelerate/accelerator.py", line 286, in __init__
    raise ValueError(err.format(mode="fp16", requirement="a GPU"))
ValueError: fp16 mixed precision requires a GPU
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--image_captions_filename', '--train_only_unet', '--save_starting_step=500', '--save_n_steps=500', '--Session_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/msanjay', '--pretrained_model_name_or_path=/content/stable-diffusion-v2-512', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/msanjay/instance_images', '--output_dir=/content/models/msanjay', '--instance_prompt=', '--seed=942475', '--resolution=512', '--mixed_precision=fp16', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--gradient_checkpointing', '--use_8bit_adam', '--learning_rate=2e-6', '--lr_scheduler=polynomial', '--lr_warmup_steps=0', '--max_train_steps=1000']' returned non-zero exit status 1.
Something went wrong

gcohen-dev commented 1 year ago

This is not related to this library. You are out of GPU resources: either pay for Colab Pro or wait until GPU resources become available to you again.

archimedesinstitute commented 1 year ago

That's not true - I'm logged in on a pro+ account and I got the same error.

TheLastBen commented 1 year ago

ValueError: fp16 mixed precision requires a GPU

Run !nvidia-smi in a new cell and paste the result here.
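For anyone who wants to script the same check, here is a minimal sketch using only the Python standard library. It runs `nvidia-smi -L` and suggests a value for `--mixed_precision` based on whether any GPU is listed. The helper names `list_gpus` and `pick_mixed_precision` are made up for illustration and are not part of this notebook or of accelerate. Note the assumption: a GPU being visible to `nvidia-smi` does not guarantee that PyTorch can use it, so treat this as a first diagnostic only.

```python
import shutil
import subprocess


def list_gpus():
    """Return the GPU lines printed by `nvidia-smi -L`, or [] if none are visible."""
    # On a CPU-only runtime the nvidia-smi binary is usually missing or fails.
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    # Each attached GPU is listed on a line such as "GPU 0: Tesla T4 (UUID: ...)".
    return [line for line in result.stdout.splitlines() if line.startswith("GPU ")]


def pick_mixed_precision(gpus):
    """Hypothetical helper: suggest "fp16" only when at least one GPU is listed."""
    return "fp16" if gpus else "no"


if __name__ == "__main__":
    gpus = list_gpus()
    if gpus:
        print("GPUs visible to the driver:")
        for line in gpus:
            print("  " + line)
    else:
        print("No GPU visible; the runtime is probably CPU-only.")
    print("Suggested flag: --mixed_precision=" + pick_mixed_precision(gpus))
```

If this reports no GPU, the `ValueError` above is expected, and switching the Colab runtime type to GPU (or waiting for one to be assigned) is the fix rather than changing the training flags.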

archimedesinstitute commented 1 year ago

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

TheLastBen commented 1 year ago

Are you still getting the error?

archimedesinstitute commented 1 year ago

I was when I ran that code. I can try it again later.


CJohnDesign commented 1 year ago

I'm getting this error as well

Thu Dec 22 18:39:05 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   77C    P0    29W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

CJohnDesign commented 1 year ago

I turned the number of vectors per token down from 8 to 4 when creating the embedding and it worked. Idk if it just happened to work the second time, though ¯\_(ツ)_/¯

TheLastBen / fast-stable-diffusion

training the unet error #1090