TheLastBen / fast-stable-diffusion

fast-stable-diffusion + DreamBooth
MIT License

Getting this issue all of a sudden when training: #1228

Open mistersprinklez opened 1 year ago

mistersprinklez commented 1 year ago

File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 852, in main() File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 719, in main accelerator.backward(loss) File "/usr/local/lib/python3.8/dist-packages/accelerate/accelerator.py", line 882, in backward self.scaler.scale(loss).backward(*kwargs) File "/usr/local/lib/python3.8/dist-packages/torch/_tensor.py", line 487, in backward torch.autograd.backward( File "/usr/local/lib/python3.8/dist-packages/torch/autograd/init.py", line 197, in backward Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass File "/usr/local/lib/python3.8/dist-packages/torch/autograd/function.py", line 267, in apply return user_fn(self, args) File "/usr/local/lib/python3.8/dist-packages/torch/autograd/function.py", line 414, in wrapper outputs = fn(ctx, args) File "/usr/local/lib/python3.8/dist-packages/xformers/ops/fmha/init.py", line 111, in backward grads = _memory_efficient_attention_backward( File "/usr/local/lib/python3.8/dist-packages/xformers/ops/fmha/init.py", line 381, in _memory_efficient_attention_backward grads = op.apply(ctx, inp, grad) File "/usr/local/lib/python3.8/dist-packages/xformers/ops/fmha/flash.py", line 339, in apply cls.OPERATOR( File "/usr/local/lib/python3.8/dist-packages/torch/_ops.py", line 442, in call return self._op(args, **kwargs or {}) File "/usr/local/lib/python3.8/dist-packages/xformers/ops/fmha/flash.py", line 96, in _flash_bwd _C_flashattention.bwd( TypeError: bwd(): incompatible function arguments. The following argument types are supported:

  1. (arg0: at::Tensor, arg1: at::Tensor, arg2: at::Tensor, arg3: at::Tensor, arg4: at::Tensor, arg5: at::Tensor, arg6: at::Tensor, arg7: at::Tensor, arg8: at::Tensor, arg9: at::Tensor, arg10: at::Tensor, arg11: int, arg12: int, arg13: float, arg14: float, arg15: bool, arg16: bool, arg17: Optional[at::Generator]) -> List[at::Tensor]

Invoked with: tensor([[[-1.1425e-03, -1.4229e-03, 3.8314e-04, ..., -8.2922e-04, 3.0117e-03, 1.0176e-03]],

    [[-5.5933e-04, -1.7071e-04,  8.7452e-04,  ..., -1.2007e-03,
       2.0180e-03,  4.1866e-04]],

    [[ 7.5722e-04,  4.1461e-04,  6.2847e-04,  ..., -9.7513e-05,
       5.2643e-04,  4.0054e-05]],

    ...,

    [[ 1.6527e-03, -7.3004e-04,  2.2011e-03,  ..., -3.5152e-03,
      -1.5473e-04, -1.6766e-03]],

    [[ 1.1759e-03,  4.7624e-05, -1.4963e-03,  ..., -1.1835e-03,
      -3.5477e-04,  9.3985e-04]],

    [[ 1.6012e-03,  7.1526e-06,  1.3247e-03,  ..., -5.8270e-04,
      -4.5705e-04, -6.4621e-03]]], device='cuda:0', dtype=torch.float16), tensor([[[ 0.0041,  0.0966, -0.0964,  ...,  0.5845,  0.0496, -0.0231]],

    [[ 0.0364,  0.0576, -0.0807,  ...,  0.6260,  0.0196,  0.0607]],

    [[ 0.0454,  0.0985, -0.1059,  ...,  0.6646, -0.0129,  0.0534]],

    ...,

    [[-0.2209,  0.1581, -0.0898,  ...,  0.7290,  0.2583,  0.0182]],

    [[-0.1503,  0.1423, -0.0144,  ...,  0.6694,  0.0794, -0.0552]],

    [[-0.3232,  0.3208,  0.0155,  ...,  0.5854,  0.1582, -0.2556]]],
   device='cuda:0', dtype=torch.float16, requires_grad=True), tensor([[[ 0.6045,  0.0159, -0.9556,  ...,  9.3516, -0.4182, -0.5967]],

    [[ 0.9419,  3.4121, -2.0469,  ..., -0.9756, -1.4150, -2.8555]],

    [[-0.5537,  1.5352, -1.9922,  ..., -0.0399, -1.4609, -1.4004]],

    ...,

    [[-1.4854,  5.6133,  3.4766,  ..., -1.4717, -0.0334, -3.5977]],

    [[-0.9219,  4.3398,  4.0625,  ...,  0.6807,  0.8340,  0.2474]],

    [[-1.5889,  2.5371,  2.6055,  ..., -0.8291,  0.3860,  0.9883]]],
   device='cuda:0', dtype=torch.float16, requires_grad=True), tensor([[[-2.8491e-04,  6.3972e-03,  1.9855e-03,  ...,  6.5536e-03,
       8.3694e-03,  3.5114e-03]],

    [[-5.4834e-01, -2.7588e-01, -4.4629e-01,  ..., -6.1865e-01,
      -1.1201e+00,  9.7046e-02]],

    [[ 1.1328e-01, -8.0176e-01, -3.5425e-01,  ..., -3.6572e-01,
      -6.6748e-01,  2.6294e-01]],

    ...,

    [[-1.2012e+00,  5.0293e-01, -5.4980e-01,  ..., -4.8145e-01,
      -2.3376e-01, -8.0762e-01]],

    [[ 7.9150e-01,  3.9581e-02,  3.8379e-01,  ..., -3.7036e-01,
       1.5259e-01, -3.9154e-02]],

    [[-3.4033e-01,  2.0703e-01,  1.9531e-01,  ..., -2.5684e-01,
       2.2095e-02, -2.0618e-01]]], device='cuda:0', dtype=torch.float16,
   requires_grad=True), tensor([[[-1.5869e-02, -3.6713e-02, -1.8021e-02,  ..., -1.9897e-02,
      -3.5553e-02,  1.8021e-02]],

    [[-8.8730e-03, -1.6418e-02, -8.6517e-03,  ..., -7.5836e-03,
      -1.5083e-02,  1.1200e-02]],

    [[-7.9346e-03, -1.3870e-02, -7.7133e-03,  ..., -6.2752e-03,
      -1.3046e-02,  1.0330e-02]],

    ...,

    [[ 1.3390e-03,  2.2202e-03,  2.5978e-03,  ..., -4.4441e-03,
      -3.0458e-05, -5.5771e-03]],

    [[ 8.1491e-04,  4.8981e-03,  2.7809e-03,  ..., -9.2163e-03,
      -6.1870e-05, -9.5825e-03]],

    [[-4.0092e-03,  1.4481e-02,  1.5640e-03,  ..., -2.5162e-02,
      -8.6021e-04, -2.4170e-02]]], device='cuda:0', dtype=torch.float16), tensor([[[ 3.6610,  4.2841,  4.5099,  ...,  4.2509,  4.0479,  3.7858]],

    [[11.9474, 13.0159, 13.1666,  ..., 12.7669, 12.7086, 12.2172]],

    [[ 7.5015,  6.8602,  7.4053,  ...,  6.6963,  5.9565,  6.3367]],

    ...,

    [[ 5.2950,  5.2959,  5.9088,  ...,  5.9099,  5.3422,  5.3414]],

    [[ 8.1491,  6.3357,  7.2061,  ...,  7.9767,  6.6806,  8.2760]],

    [[ 5.7174,  6.6173,  6.4276,  ...,  5.9319,  5.6392,  5.4336]]],
   device='cuda:0'), tensor([[[-0.0057,  0.0053,  0.0010,  ...,  0.0044,  0.0052, -0.0075]],

    [[ 0.0024,  0.0004, -0.0017,  ..., -0.0034, -0.0034,  0.0001]],

    [[-0.0014,  0.0070,  0.0029,  ...,  0.0009, -0.0033, -0.0041]],

    ...,

    [[ 0.0052,  0.0035, -0.0029,  ..., -0.0118,  0.0031,  0.0047]],

    [[-0.0202,  0.0086, -0.0132,  ..., -0.0142, -0.0007, -0.0092]],

    [[-0.0023,  0.0066,  0.0077,  ..., -0.0109,  0.0027, -0.0145]]],
   device='cuda:0', dtype=torch.float16), tensor([[[ 0.0625,  0.0655, -0.0171,  ..., -0.0428,  0.0754, -0.0107]],

    [[ 0.0916, -0.0214, -0.0131,  ...,  0.0239,  0.0221,  0.0493]],

    [[ 0.0662, -0.0168, -0.0457,  ..., -0.0692,  0.0851,  0.0655]],

    ...,

    [[-0.0857,  0.0763, -0.0352,  ...,  0.0928,  0.0714,  0.0479]],

    [[ 0.0699, -0.0009,  0.0394,  ...,  0.0362,  0.0260,  0.0488]],

    [[-0.0106, -0.0682,  0.0614,  ..., -0.0508,  0.0477,  0.0402]]],
   device='cuda:0', dtype=torch.float16), tensor([[[-0.0329,  0.0485, -0.0657,  ...,  0.0088, -0.1188,  0.0003]],

    [[-0.0504,  0.0140, -0.0249,  ...,  0.0004, -0.0051, -0.0205]],

    [[-0.0820,  0.0120, -0.0510,  ..., -0.0550, -0.0706, -0.0015]],

    ...,

    [[ 0.0641,  0.0008, -0.0650,  ..., -0.0106, -0.0113,  0.0507]],

    [[-0.0122, -0.0238, -0.0859,  ..., -0.0057, -0.1112,  0.0172]],

    [[ 0.0509, -0.0247,  0.0069,  ...,  0.0508, -0.0621, -0.0143]]],
   device='cuda:0', dtype=torch.float16), tensor([    0,  4096,  8192, 12288, 16384, 20480, 24576, 28672, 32768],
   device='cuda:0', dtype=torch.int32), tensor([ 0,  4,  8, 12, 16, 20, 24, 28, 32], device='cuda:0',
   dtype=torch.int32), 4096, 4, 0.0, 0.15811388300841897, False, False, 0, None

  0% 0/3000 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--stop_text_encoder_training=250', '--image_captions_filename', '--train_only_unet', '--save_starting_step=500', '--save_n_steps=0', '--Session_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/Jabba2', '--pretrained_model_name_or_path=/content/stable-diffusion-v1-5', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/Jabba2/instance_images', '--output_dir=/content/models/Jabba2', '--captions_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/Jabba2/captions', '--instance_prompt=', '--seed=574824', '--resolution=512', '--mixed_precision=fp16', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=1e-05', '--lr_scheduler=polynomial', '--lr_warmup_steps=0', '--max_train_steps=3000']' returned non-zero exit status 1.
Something went wrong
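
Note: the `TypeError: bwd(): incompatible function arguments` above shows the xformers Python wrapper passing more positional arguments than the compiled `_C_flashattention` extension accepts, which usually means the installed xformers build does not match the flash-attention binary (or torch/CUDA combination) in the runtime. A minimal sketch for checking the versions involved, assuming `torch` and `xformers` are importable in the Colab runtime:

import torch
import xformers

# Version info helps spot a wrapper/binary mismatch between torch, CUDA, and xformers.
print("torch:", torch.__version__, "built for CUDA:", torch.version.cuda)
print("xformers:", xformers.__version__)

Recent xformers releases also ship a diagnostic entry point, `python -m xformers.info`, which lists the attention operators that are usable on the current GPU.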

mistersprinklez commented 1 year ago

Before the UNet error, I get this text encoder error:

Training the text encoder...

2023-01-03 05:16:10.985527: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.

[ASCII-art "TRAINING" banner]

  0% 0/250 [00:00<?, ?it/s]
Jabba Jabba
Traceback (most recent call last):
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 852, in <module>
    main()
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 693, in main
    model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/utils/operations.py", line 507, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/usr/local/lib/python3.8/dist-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/unet_2d_condition.py", line 350, in forward
    sample = self.mid_block(sample, emb, encoder_hidden_states=encoder_hidden_states)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/unet_2d_blocks.py", line 428, in forward
    hidden_states = attn(hidden_states, encoder_hidden_states).sample
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/attention.py", line 219, in forward
    hidden_states = block(hidden_states, context=encoder_hidden_states, timestep=timestep)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/attention.py", line 473, in forward
    hidden_states = self.attn1(norm_hidden_states) + hidden_states
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/attention.py", line 563, in forward
    hidden_states = self._memory_efficient_attention_xformers(query, key, value)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/attention.py", line 624, in _memory_efficient_attention_xformers
    hidden_states = xformers.ops.memory_efficient_attention(query, key, value, attn_bias=None)
  File "/usr/local/lib/python3.8/dist-packages/xformers/ops/fmha/__init__.py", line 193, in memory_efficient_attention
    return _memory_efficient_attention(
  File "/usr/local/lib/python3.8/dist-packages/xformers/ops/fmha/__init__.py", line 294, in _memory_efficient_attention
    return _fMHA.apply(
  File "/usr/local/lib/python3.8/dist-packages/xformers/ops/fmha/__init__.py", line 43, in forward
    out, op_ctx = _memory_efficient_attention_forward_requires_grad(
  File "/usr/local/lib/python3.8/dist-packages/xformers/ops/fmha/__init__.py", line 326, in _memory_efficient_attention_forward_requires_grad
    out = op.apply(inp, needs_gradient=True)
  File "/usr/local/lib/python3.8/dist-packages/xformers/ops/fmha/flash.py", line 240, in apply
    out, softmax_lse = cls.OPERATOR(
  File "/usr/local/lib/python3.8/dist-packages/torch/_ops.py", line 442, in __call__
    return self._op(*args, **kwargs or {})
  File "/usr/local/lib/python3.8/dist-packages/xformers/ops/fmha/flash.py", line 59, in _flash_fwd
    lse = _C_flashattention.fwd(
TypeError: fwd(): incompatible function arguments. The following argument types are supported:

  1. (arg0: at::Tensor, arg1: at::Tensor, arg2: at::Tensor, arg3: at::Tensor, arg4: at::Tensor, arg5: int, arg6: int, arg7: float, arg8: float, arg9: bool, arg10: bool, arg11: bool, arg12: Optional[at::Generator]) -> List[at::Tensor]

Invoked with: tensor([[[-0.2954, -0.0532, -0.3613, ..., -0.1678, 0.3162, 0.3679]],

    [[-0.3757, -0.0265,  0.7148,  ...,  0.3625,  0.1262,  0.2776]],

    [[-0.4910, -0.2644, -0.0623,  ..., -0.0681,  0.0359,  0.6270]],

    ...,

    [[ 0.0436, -0.6108,  0.0047,  ...,  0.2971,  0.4290, -0.7031]],

    [[-0.0545, -0.7798, -0.5498,  ..., -0.0966,  0.4048, -0.6187]],

    [[-0.2437, -0.6924, -0.2314,  ..., -0.1779, -0.0747, -0.7769]]],
   device='cuda:0', dtype=torch.float16, requires_grad=True), tensor([[[ 0.0975,  0.3992, -0.7261,  ..., -0.4883, -0.1637, -0.6479]],

    [[ 0.2832,  0.9229, -0.2194,  ...,  0.0740, -0.1065, -0.6523]],

    [[ 0.1605,  0.6011, -0.5474,  ..., -0.0182, -0.0898, -0.6641]],

    ...,

    [[-0.5063,  0.0097,  0.0425,  ...,  0.7388, -0.3315,  1.5195]],

    [[-0.3940,  0.1415, -0.2974,  ...,  0.2842, -0.1648,  1.1846]],

    [[-0.9570,  0.2820,  0.3958,  ...,  0.3896, -0.3459,  1.3447]]],
   device='cuda:0', dtype=torch.float16, requires_grad=True), tensor([[[-2.1500e-02, -1.1377e-01, -2.2961e-01,  ..., -5.0293e-01,
      -6.4087e-02,  1.6577e-01]],

    [[ 3.5718e-01,  1.1774e-01,  5.9277e-01,  ..., -8.3447e-05,
      -9.0637e-02,  1.6846e-01]],

    [[ 3.6792e-01,  1.3452e-01,  6.2402e-01,  ...,  6.7291e-03,
       7.9468e-02, -2.2461e-01]],

    ...,

    [[ 3.9575e-01, -1.2024e-01,  2.8442e-02,  ...,  2.5977e-01,
       4.3335e-01, -3.2544e-01]],

    [[ 2.4597e-01,  9.6741e-02,  4.4824e-01,  ...,  1.0078e+00,
       1.1797e+00, -8.5010e-01]],

    [[ 2.2937e-01, -2.7832e-01, -8.7097e-02,  ...,  7.4829e-02,
      -3.4570e-01, -4.7046e-01]]], device='cuda:0', dtype=torch.float16,
   requires_grad=True), tensor([[[0., 0., 0.,  ..., 0., 0., 0.]],

    [[0., 0., 0.,  ..., 0., 0., 0.]],

    [[0., 0., 0.,  ..., 0., 0., 0.]],

    ...,

    [[0., 0., 0.,  ..., 0., 0., 0.]],

    [[0., 0., 0.,  ..., 0., 0., 0.]],

    [[0., 0., 0.,  ..., 0., 0., 0.]]], device='cuda:0',
   dtype=torch.float16), tensor([   0,   64,  128,  192,  256,  320,  384,  448,  512,  576,  640,  704,
     768,  832,  896,  960, 1024, 1088, 1152, 1216, 1280], device='cuda:0',
   dtype=torch.int32), tensor([   0,   64,  128,  192,  256,  320,  384,  448,  512,  576,  640,  704,
     768,  832,  896,  960, 1024, 1088, 1152, 1216, 1280], device='cuda:0',
   dtype=torch.int32), 64, 64, 0.0, 0.125, False, False, False, 0, None

0% 0/250 [00:02<?, ?it/s]

mistersprinklez commented 1 year ago

Did some troubleshooting and can now see that the notebook trains fine while my GPU is set to Standard. Would love to figure out the issue so I can train on Premium again!
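
The Standard-vs-Premium split is consistent with the notebook's prebuilt xformers wheel being compiled for the T4 but not for the A100: flash-attention kernels are compiled per GPU architecture, so a build that works on one card can fail on the other. A quick sketch (assuming a CUDA-enabled torch install) for confirming which card the runtime actually assigned:

import torch

# T4 reports compute capability 7.5 (sm_75); A100 reports 8.0 (sm_80).
major, minor = torch.cuda.get_device_capability(0)
print(torch.cuda.get_device_name(0))
print(f"compute capability: sm_{major}{minor}")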

TheLastBen commented 1 year ago

Are you using the latest colab?

eds123 commented 1 year ago

Are you using the latest colab?

Yes, can confirm this happens on the latest notebook.

mistersprinklez commented 1 year ago

Thank you for your response! I'm using the link from your git page.

mistersprinklez commented 1 year ago

Yes, from the latest notebook!

mistersprinklez commented 1 year ago

Any idea what may be going on? @eds123

TheLastBen commented 1 year ago

T4 or A100?

MELT9000 commented 1 year ago

Same here, A100, latest Dreambooth colab, v756 model.

johnrplanta commented 1 year ago

Both Text & Unet training produce same errors on an A100 GPU - Premium Colab. Confirmed that when using Standard GPU, T4 - training works. Latest Colabs.

TheLastBen commented 1 year ago

With the premium GPU, run this:

# Remove the prebuilt xformers wheel and rebuild it from source (picks up the current GPU arch)
!pip uninstall -y -q xformers
!pip install ninja
!pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
# Zip the rebuilt package and copy it to Google Drive so it can be shared
%cd /content
!zip -r A100 /usr/local/lib/python3.8/dist-packages/xformers
!cp A100.zip /content/gdrive/MyDrive

Then send me the link to the A100.zip
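
If you want to confirm that the rebuilt xformers actually works on the A100 before starting a full training run, a small smoke test of the memory-efficient attention path (forward plus backward, fp16 on CUDA) is sketched below. The shapes are arbitrary and only meant to exercise the same kernels the trainer hits:

import torch
import xformers.ops as xops

# Random query/key/value tensors, fp16 on the GPU, with gradients enabled so the bwd kernel runs too.
q = torch.randn(2, 4096, 40, device="cuda", dtype=torch.float16, requires_grad=True)
k = torch.randn(2, 4096, 40, device="cuda", dtype=torch.float16, requires_grad=True)
v = torch.randn(2, 4096, 40, device="cuda", dtype=torch.float16, requires_grad=True)

out = xops.memory_efficient_attention(q, k, v, attn_bias=None)  # forward pass
out.sum().backward()                                            # backward pass
print("memory_efficient_attention OK:", out.shape)

If this runs without the fwd()/bwd() TypeError, the rebuilt package should also work inside the DreamBooth training cell.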

johnrplanta commented 1 year ago

Here's the link to the A100.zip produced from running the script. https://drive.google.com/file/d/1NApnb3CiUrvRB7si-SVIB25X92v10mnd/view?usp=share_link

TheLastBen commented 1 year ago

great thanks !

mistersprinklez commented 1 year ago

!pip uninstall -y -q xformers
!pip install ninja
!pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
%cd /content
!zip -r A100 /usr/local/lib/python3.8/dist-packages/xformers
!cp A100.zip /content/gdrive/MyDrive

Hey! So in order to run premium GPUs, we have to send you this A100.zip file?

johnrplanta commented 1 year ago

He already fixed it on a previous update. I am now able to run on premium GPUs. Thank you!

mistersprinklez commented 1 year ago

He already fixed it on a previous update. I am now able to run on premium GPUs. Thank you!

Great!