Open bach777 opened 1 year ago
A CUDA error pops up with the current setup when we try the SD 2.1-base version. Since this is the 512-resolution model, it should work.
Traceback (most recent call last):
File "/home/astroboy/github/shivamShrirao/diffusers/examples/dreambooth/train_dreambooth.py", line 822, in <module>
main(args)
File "/home/astroboy/github/shivamShrirao/diffusers/examples/dreambooth/train_dreambooth.py", line 794, in main
optimizer.step()
File "/home/astroboy/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/optimizer.py", line 134, in step
self.scaler.step(self.optimizer, closure)
File "/home/astroboy/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/cuda/amp/grad_scaler.py", line 341, in step
retval = self._maybe_opt_step(optimizer, optimizer_state, *args, **kwargs)
File "/home/astroboy/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/cuda/amp/grad_scaler.py", line 288, in _maybe_opt_step
retval = optimizer.step(*args, **kwargs)
File "/home/astroboy/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/optim/lr_scheduler.py", line 68, in wrapper
return wrapped(*args, **kwargs)
File "/home/astroboy/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/optim/optimizer.py", line 140, in wrapper
out = func(*args, **kwargs)
File "/home/astroboy/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/astroboy/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/optim/optimizer.py", line 263, in step
self.init_state(group, p, gindex, pindex)
File "/home/astroboy/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/astroboy/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/optim/optimizer.py", line 401, in init_state
state["state2"] = torch.zeros_like(
RuntimeError: CUDA error: unknown error
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Steps: 0%| | 0/3600 [00:04<?, ?it/s]
Traceback (most recent call last):
File "/home/astroboy/anaconda3/envs/diffusers/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/home/astroboy/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
args.func(args)
File "/home/astroboy/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/launch.py", line 837, in launch_command
simple_launcher(args)
File "/home/astroboy/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/launch.py", line 354, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/astroboy/anaconda3/envs/diffusers/bin/python', 'train_dreambooth.py', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-2-1-base', '--output_dir=data/model/model_ozguraltay-SD20', '--revision=fp16', '--train_text_encoder', '--with_prior_preservation', '--prior_loss_weight=1.0', '--seed=2384271801', '--resolution=512', '--train_batch_size=1', '--mixed_precision=fp16', '--use_8bit_adam', '--gradient_accumulation_steps=1', '--gradient_checkpointing', '--learning_rate=1e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=200', '--sample_batch_size=4', '--max_train_steps=3600', '--save_interval=1800', '--save_sample_prompt=a full body portrait photo of ozguraltay man in black tuxedo, professional studio photograph, 80mm, f1.8, clean focused on the face', '--concepts_list=concepts_list/concepts_list-ozguraltay_man.json']' returned non-zero exit status 1.
Is there anyone experiencing the same problem?
It works perfectly with the old SD 1.5 version. Will there be any updates on the requirements files for the SD 2.1-base version soon?
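The traceback itself suggests rerunning with CUDA_LAUNCH_BLOCKING=1 so the error is reported at the call that actually failed. A minimal sketch of setting that up from Python follows. The helper name `build_debug_run` is hypothetical, and dropping `--use_8bit_adam` is only a guess at a diagnostic step, chosen because the failure above occurs inside the bitsandbytes optimizer state setup.

```python
import os

def build_debug_run(cmd, drop_flags=()):
    """Return (cmd, env) for rerunning a training command with synchronous CUDA calls.

    cmd        -- the original command as a list of strings
    drop_flags -- flags to leave out for this diagnostic run (an assumption, not a fix)
    """
    env = dict(os.environ)
    # Make kernel launches synchronous so the Python stack trace points at the failing call.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    filtered = [arg for arg in cmd if arg not in drop_flags]
    return filtered, env

# Shortened version of the command from the traceback above.
original = [
    "accelerate", "launch", "train_dreambooth.py",
    "--pretrained_model_name_or_path=stabilityai/stable-diffusion-2-1-base",
    "--resolution=512",
    "--mixed_precision=fp16",
    "--use_8bit_adam",
]
debug_cmd, debug_env = build_debug_run(original, drop_flags=("--use_8bit_adam",))
print(" ".join(debug_cmd))
# To actually launch it: subprocess.run(debug_cmd, env=debug_env)
```

If the run gets past the first optimizer step without the 8-bit optimizer, that would point at the bitsandbytes install rather than at the SD 2.1-base weights, but that is a hypothesis to test, not a confirmed diagnosis.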
@NeoAnthropocene Same problem!
The following values were not passed to accelerate launch and had defaults used instead:
    --num_processes was set to a value of 1
    --num_machines was set to a value of 1
    --mixed_precision was set to a value of 'no'
    --num_cpu_threads_per_process was set to 1 to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
usage: train_dreambooth.py
[-h]
--pretrained_model_name_or_path
PRETRAINED_MODEL_NAME_OR_PATH
[--pretrained_vae_name_or_path PRETRAINED_VAE_NAME_OR_PATH]
[--revision REVISION]
[--tokenizer_name TOKENIZER_NAME]
[--instance_data_dir INSTANCE_DATA_DIR]
[--class_data_dir CLASS_DATA_DIR]
[--instance_prompt INSTANCE_PROMPT]
[--class_prompt CLASS_PROMPT]
[--save_sample_prompt SAVE_SAMPLE_PROMPT]
[--save_sample_negative_prompt SAVE_SAMPLE_NEGATIVE_PROMPT]
[--n_save_sample N_SAVE_SAMPLE]
[--save_guidance_scale SAVE_GUIDANCE_SCALE]
[--save_infer_steps SAVE_INFER_STEPS]
[--pad_tokens]
[--with_prior_preservation]
[--prior_loss_weight PRIOR_LOSS_WEIGHT]
[--num_class_images NUM_CLASS_IMAGES]
[--output_dir OUTPUT_DIR]
[--seed SEED]
[--resolution RESOLUTION]
[--center_crop]
[--train_text_encoder]
[--train_batch_size TRAIN_BATCH_SIZE]
[--sample_batch_size SAMPLE_BATCH_SIZE]
[--num_train_epochs NUM_TRAIN_EPOCHS]
[--max_train_steps MAX_TRAIN_STEPS]
[--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS]
[--gradient_checkpointing]
[--learning_rate LEARNING_RATE]
[--scale_lr]
[--lr_scheduler LR_SCHEDULER]
[--lr_warmup_steps LR_WARMUP_STEPS]
[--use_8bit_adam]
[--adam_beta1 ADAM_BETA1]
[--adam_beta2 ADAM_BETA2]
[--adam_weight_decay ADAM_WEIGHT_DECAY]
[--adam_epsilon ADAM_EPSILON]
[--max_grad_norm MAX_GRAD_NORM]
[--push_to_hub]
[--hub_token HUB_TOKEN]
[--hub_model_id HUB_MODEL_ID]
[--logging_dir LOGGING_DIR]
[--log_interval LOG_INTERVAL]
[--save_interval SAVE_INTERVAL]
[--save_min_steps SAVE_MIN_STEPS]
[--mixed_precision {no,fp16,bf16}]
[--not_cache_latents]
[--hflip]
[--local_rank LOCAL_RANK]
[--concepts_list CONCEPTS_LIST]
train_dreambooth.py: error: unrecognized arguments: SD.2-1
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in
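The "unrecognized arguments: SD.2-1" error above is what argparse prints when a token on the command line is not attached to any flag. One plausible cause (an assumption, since the original command is not shown) is an unquoted value containing a space, which the shell splits into two tokens. A small sketch with a stand-in parser that mimics only two of the script's flags:

```python
import argparse

def build_parser():
    # Stand-in for the real train_dreambooth.py parser; only two flags are reproduced.
    parser = argparse.ArgumentParser(prog="train_dreambooth.py")
    parser.add_argument("--pretrained_model_name_or_path", required=True)
    parser.add_argument("--resolution", type=int, default=512)
    return parser

def find_stray_tokens(argv):
    """Return the tokens argparse could not attach to any known flag."""
    _, unknown = build_parser().parse_known_args(argv)
    return unknown

# A stray token after the model name is left over.
argv = [
    "--pretrained_model_name_or_path",
    "stabilityai/stable-diffusion-2-1-base",
    "SD.2-1",
]
print(find_stray_tokens(argv))
```

Running a check like this on the argument list before launching shows which token needs quoting or removing.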
Hi, I was able to train the 512 base 2-1 model on Colab with prior preservation on, 63 concept images, and 1200 class images. The samples from the 1k-step checkpoint come out fantastic, and the concept (a face) is uncannily similar. I stopped the fine-tune at 3k steps as it started to overfit.
However, packaging the model to ckpt, bringing in the model.yaml from the original base 2-1, and running it in automatic1111 completely killed the fine-tune: there is no more resemblance to the concept.
Any idea/suggestion?
Keeping an eye on this, as I would love to retrain my face with 2.1.
Will you be updating this to work with 2.1, @ShivamShrirao?
I kept experimenting. The pipeline/inference inside Shivam's colab works perfectly.
I can't tell whether the ckpt conversion shreds the model, or whether what breaks it is the lack of a proper .yaml file defining the model for Automatic1111.
Update: training a 2.1 model with Dreambooth and using it in the automatic1111 webui works.
It seems the Colab conversion to ckpt produces a bad stable diffusion model, or I'm doing something wrong either at runtime, or within my Google Drive.
Here is how I got a working model in Automatic1111:
Convert the diffusers model to a .ckpt with this script: https://github.com/lawfordp2017/diffusers/blob/main/scripts/convert_diffusers_to_original_stable_diffusion.py
Usage: convert_diffusers_to_original_stable_diffusion.py --model_path d://mypath//in//windows --model_checkpoint d://mypath//to//mymodel.ckpt
Use this .yaml as the model config: https://github.com/Stability-AI/stablediffusion/blob/main/configs/stable-diffusion/v2-inference.yaml
I can load this successfully in Automatic1111 and it works beautifully!
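A rough sketch of the last step, placing the converted checkpoint and the downloaded v2-inference.yaml where the webui can pair them. It assumes the webui picks up a config that sits next to the checkpoint under the same base name (mymodel.ckpt with mymodel.yaml). That naming convention, the function name `install_for_webui`, and every path below are assumptions for illustration, not something stated in this thread.

```python
import shutil
from pathlib import Path

def install_for_webui(ckpt_path, yaml_path, models_dir):
    """Copy a checkpoint and its config into models_dir under a shared base name."""
    ckpt_path = Path(ckpt_path)
    yaml_path = Path(yaml_path)
    models_dir = Path(models_dir)
    models_dir.mkdir(parents=True, exist_ok=True)

    ckpt_dest = models_dir / ckpt_path.name
    # Same base name as the checkpoint, with a .yaml extension.
    yaml_dest = ckpt_dest.with_suffix(".yaml")

    shutil.copy2(ckpt_path, ckpt_dest)
    shutil.copy2(yaml_path, yaml_dest)
    return ckpt_dest, yaml_dest

# Example call with hypothetical paths:
# install_for_webui("mymodel.ckpt", "v2-inference.yaml",
#                   "stable-diffusion-webui/models/Stable-diffusion")
```

Copying rather than moving keeps the original conversion output intact in case the checkpoint has to be reconverted.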
It should work fine, but on Colab use stabilityai/stable-diffusion-2-1-base, not just 2-1. The problem lies with the image resolution. You can't fine-tune the 768-resolution model on Colab, as it is guaranteed to run out of memory, and if you train the 768 model at 512 resolution it gets mixed up. The best bet is to use the 2-1 base model and also change the scheduler to one of the Euler schedulers.
Thank you. I didn't run out of memory, but Dreambooth still wasn't working; switching to the -base model actually produced something.
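The advice above (use the 512 base model and switch to an Euler scheduler) could look roughly like this with the diffusers library. The helper names are hypothetical, and the sketch only covers picking the model ID and swapping the scheduler at inference time. It is not the training script's own logic.

```python
def choose_model_id(resolution):
    """Map a training resolution to the matching SD 2.1 model ID."""
    if resolution == 512:
        return "stabilityai/stable-diffusion-2-1-base"
    if resolution == 768:
        return "stabilityai/stable-diffusion-2-1"
    raise ValueError(f"SD 2.1 is published for 512 and 768 only, got {resolution}")

def load_with_euler(model_path):
    """Load a pipeline and replace its scheduler with an Euler one."""
    # Imported lazily so the helper above can be used without diffusers installed.
    from diffusers import EulerDiscreteScheduler, StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_path)
    # Reuse the existing scheduler config so the noise schedule parameters stay consistent.
    pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
    return pipe

print(choose_model_id(512))
# pipe = load_with_euler("path/to/finetuned/model")  # hypothetical output directory
```

EulerAncestralDiscreteScheduler can be swapped in the same way if the ancestral variant is preferred.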
Adding support for Stable Diffusion V2.1 please