ShivamShrirao / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
https://huggingface.co/docs/diffusers
Apache License 2.0

AssertionError: You can't use same `Accelerator()` instance with multiple models when using DeepSpeed #241

Status: Open. dizhenx opened this issue 1 year ago.

dizhenx commented 1 year ago

Describe the bug

The same configuration works fine for training on an 8 GB GPU, but when I run the script to fine-tune the text encoder together with the UNet (`--train_text_encoder`), an error occurs. The config file is as follows:

    compute_environment: LOCAL_MACHINE
    deepspeed_config:
      gradient_accumulation_steps: 1
      gradient_clipping: 1.0
      offload_optimizer_device: cpu
      offload_param_device: cpu
      zero3_init_flag: true
      zero_stage: 2
    distributed_type: DEEPSPEED
    downcast_bf16: 'no'
    dynamo_config:
      dynamo_backend: FX2TRT
    machine_rank: 0
    main_training_function: main
    mixed_precision: fp16
    num_machines: 1
    num_processes: 4
    rdzv_backend: static
    same_network: true
    tpu_env: []
    tpu_use_cluster: false
    tpu_use_sudo: false
    use_cpu: false

The error message is: You can't use same `Accelerator()` instance with multiple models when using DeepSpeed

Reproduction

    accelerate launch --config_file /mnt/data/huggingface/accelerate/default_config1.yaml train_dreambooth.py \
      --pretrained_model_name_or_path=$MODEL_NAME \
      --train_text_encoder \
      --instance_data_dir=$INSTANCE_DIR \
      --class_data_dir=$CLASS_DIR \
      --output_dir=$OUTPUT_DIR \
      --with_prior_preservation --prior_loss_weight=1.0 \
      --instance_prompt="a photo of sks dog" \
      --class_prompt="a photo of dog" \
      --resolution=512 \
      --train_batch_size=1 \
      --use_8bit_adam \
      --gradient_checkpointing \
      --learning_rate=2e-6 \
      --lr_scheduler="constant" \
      --lr_warmup_steps=0 \
      --num_class_images=200 \
      --max_train_steps=20

Logs

    Traceback (most recent call last)
    /mnt/data/creative/diffusers/examples/dreambooth/train_dreambooth.py:869 in <module>

      866
      867 if __name__ == "__main__":
      868     args = parse_args()
    ❱ 869     main(args)
      870

    /mnt/data/creative/diffusers/examples/dreambooth/train_dreambooth.py:684 in main

      681     )
      682
      683     if args.train_text_encoder:
    ❱ 684         unet, text_encoder, optimizer, train_dataloader, lr_scheduler = accelerator.prep
      685             unet, text_encoder, optimizer, train_dataloader, lr_scheduler
      686         )
      687     else:

    /mnt/data/creative/miniconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/accelerator.py:1148 in prepare

      1145                 if isinstance(obj, torch.nn.Module):
      1146                     model_count += 1
      1147             if model_count > 1:
    ❱ 1148                 raise AssertionError(
      1149                     "You can't use same `Accelerator()` instance with multiple models wh
      1150                 )
      1151
AssertionError: You can't use same `Accelerator()` instance with multiple models when using DeepSpeed
[10:57:19] ERROR    failed (exitcode: 1) local_rank: 0 (pid: 1146) of binary: /mnt/data/creative/miniconda3/envs/diffusers/bin/python3.9
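The traceback shows why only the `--train_text_encoder` run fails: at line 684 the script passes both `unet` and `text_encoder` to `accelerator.prepare`, and the guard at lines 1145–1148 of `accelerate/accelerator.py` counts how many of the passed objects are `torch.nn.Module` instances and raises once that count exceeds one. Below is a minimal, simplified sketch of that counting logic, not the library's actual code. To keep it runnable without PyTorch, a hypothetical `Module` class stands in for `torch.nn.Module`; the function name `check_single_model` and the `using_deepspeed` flag are also hypothetical.

```python
class Module:
    """Hypothetical stand-in for torch.nn.Module so the sketch runs without PyTorch."""


def check_single_model(*objects, using_deepspeed=True):
    """Simplified re-creation of the guard seen in the traceback (hypothetical name).

    Counts how many of the objects passed in are models and raises if more
    than one is present while DeepSpeed is in use. Returns the objects unchanged
    otherwise, loosely mirroring how prepare() hands its inputs back.
    """
    if using_deepspeed:
        model_count = 0
        for obj in objects:
            # Only model objects are counted; optimizers, dataloaders and schedulers are not.
            if isinstance(obj, Module):
                model_count += 1
        if model_count > 1:
            raise AssertionError(
                "You can't use same `Accelerator()` instance with multiple models "
                "when using DeepSpeed"
            )
    return objects


unet, text_encoder, optimizer = Module(), Module(), object()

# Like the branch without --train_text_encoder: one model passes the check.
check_single_model(unet, optimizer)

# Like line 684 with --train_text_encoder: two models trigger the AssertionError.
try:
    check_single_model(unet, text_encoder, optimizer)
except AssertionError as err:
    print(err)
```

In this sketch the optimizer, dataloader and scheduler never contribute to the count, so it is specifically the second model (`text_encoder`) that trips the check.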

System Info