Open iqddd opened 1 year ago
Did you install any dependencies during the session?
Only those in the "Dependencies" cell. Followed the usual procedure. Sequential startup:
Which model? The default or a custom one?
Based on default SD2.1 768px.
try restarting the session, and use the latest colab
What do you mean by "use the latest colab"? https://colab.research.google.com/github/TheLastBen/fast-stable-diffusion/blob/main/fast-DreamBooth.ipynb I think the Colab at the link above is always the latest. Isn't it?
in the latest colab, the tensorflow msg doesn't show
How do I switch to the latest Colab?
the link above is correct
I also ran into an error using the latest Colab (the link above) today. Not seeing the tensorflow msg so I guess it's another issue?
0% 0/1000 [00:00<?, ?it/s] tshirt tshirt
Traceback (most recent call last):
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 789, in <module>
main()
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 676, in main
model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/accelerate/utils/operations.py", line 507, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/usr/local/lib/python3.8/dist-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/diffusers/models/unet_2d_condition.py", line 339, in forward
sample, res_samples = downsample_block(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/diffusers/models/unet_2d_blocks.py", line 637, in forward
hidden_states = torch.utils.checkpoint.checkpoint(
File "/usr/local/lib/python3.8/dist-packages/torch/utils/checkpoint.py", line 249, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/usr/local/lib/python3.8/dist-packages/torch/utils/checkpoint.py", line 107, in forward
outputs = run_function(*args)
File "/usr/local/lib/python3.8/dist-packages/diffusers/models/unet_2d_blocks.py", line 630, in custom_forward
return module(*inputs, return_dict=return_dict)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/diffusers/models/attention.py", line 213, in forward
hidden_states = self.proj_in(hidden_states)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: t() expects a tensor with <= 2 dimensions, but self is 4D
0% 0/1000 [00:07<?, ?it/s]
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
args.func(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--image_captions_filename', '--train_only_unet', '--save_starting_step=325', '--save_n_steps=325', '--Session_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/lapitadress', '--pretrained_model_name_or_path=/content/stable-diffusion-custom', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/lapitadress/instance_images', '--output_dir=/content/models/lapitadress', '--captions_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/lapitadress/captions', '--instance_prompt=', '--seed=247655', '--resolution=768', '--mixed_precision=fp16', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--gradient_checkpointing', '--use_8bit_adam', '--learning_rate=2e-06', '--lr_scheduler=linear', '--lr_warmup_steps=0', '--max_train_steps=1000']' returned non-zero exit status 1.
Something went wrong
Seems like resuming training for models based on SD2.1-768px is broken. Resuming training for SD2.1-512px and SD1.5 works fine.
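For anyone debugging this: the traceback ends in `F.linear(input, self.weight, self.bias)` inside `proj_in`, and the error says the tensor is 4D, so the weight reaching that linear layer has four dimensions instead of two. One possible cause (an assumption, not confirmed in this thread) is that the saved model stores the projection weights in a different layout than the layer expects. Below is a minimal sketch for checking that. The helper name `find_4d_projection_weights` and the demo key names and shapes are made up for illustration and were not read from a real checkpoint.

```python
from typing import Any, Dict, List, Tuple


def find_4d_projection_weights(
    state_dict: Dict[str, Any],
) -> List[Tuple[str, Tuple[int, ...]]]:
    """Return (name, shape) for proj_in/proj_out weights that are not 2-D."""
    suspects = []
    for name, value in state_dict.items():
        if not name.endswith(("proj_in.weight", "proj_out.weight")):
            continue
        # Tensors and arrays expose .shape; plain shape tuples (as in the
        # demo below) are used directly.
        shape = tuple(getattr(value, "shape", value))
        if len(shape) != 2:
            suspects.append((name, shape))
    return suspects


# Hypothetical names and shapes, for illustration only.
demo = {
    "down_blocks.0.attentions.0.proj_in.weight": (320, 320, 1, 1),
    "down_blocks.0.attentions.0.proj_out.weight": (320, 320),
    "conv_in.weight": (320, 4, 3, 3),
}
print(find_4d_projection_weights(demo))
```

To try it on a real model, pass the UNet's loaded state dict (a mapping from parameter names to tensors) instead of `demo`. An empty list would mean the projection weights are already 2-D and the cause is somewhere else.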
Resuming the training, or resuming the session and training after disconnecting?
Resuming the training (run "Start DreamBooth" cell with "Resume training" checkbox selected)
I'll check it out
fixed
Training
Training the UNet...
Traceback (most recent call last):
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 789, in <module>
make sure you set your session to GPU
Trying to resume network training based on v2.1 768px. Almost immediately I get an error.
upd: There are no errors during training on v1.5.