VinAIResearch / Anti-DreamBooth

Anti-DreamBooth: Protecting users from personalized text-to-image synthesis (ICCV 2023)
https://vinairesearch.github.io/Anti-DreamBooth/
GNU Affero General Public License v3.0

Failed to train and output noise-ckpt in Google Colab #3

Closed lbj96347 closed 1 year ago

lbj96347 commented 1 year ago

Background

I am trying to set up a workflow in Google Colab to try out Anti-DreamBooth. Here is my ipynb. However, it fails at Step 3 when running

!bash /content/Anti-DreamBooth/scripts/attack_with_aspl.sh
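For context, the surrounding Colab cells look roughly like the sketch below. The requirements.txt install step is my assumption about how the notebook sets up dependencies, not something documented in this issue:

# Rough sketch of the Colab setup cells (assumption: the repo's requirements.txt covers the Python dependencies)
!git clone https://github.com/VinAIResearch/Anti-DreamBooth.git /content/Anti-DreamBooth
%cd /content/Anti-DreamBooth
!pip install -r requirements.txt
!nvidia-smi   # confirm which GPU the Colab runtime provides
!bash /content/Anti-DreamBooth/scripts/attack_with_aspl.sh   # Step 3 from the notebook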

Google Colab Virtual ENV:


Running attack_with_aspl.sh invokes Anti-DreamBooth/attacks/aspl.py to run the perturbation training, which is expected to write noise-ckpt to the output directory. But it doesn't.

When Anti-DreamBooth/attacks/aspl.py reaches line 715, it throws an error.

/content/Anti-DreamBooth/scripts/attack_with_aspl.sh: line 36:  1381 Killed                  /usr/bin/python3 /content/Anti-DreamBooth/attacks/aspl.py 

--pretrained_model_name_or_path=$MODEL_PATH 
--enable_xformers_memory_efficient_attention 
--instance_data_dir_for_train=$CLEAN_TRAIN_DIR 
--instance_data_dir_for_adversarial=$CLEAN_ADV_DIR 
--instance_prompt="a photo of sks person" 
--class_data_dir=$CLASS_DIR 
--num_class_images=200 
--class_prompt="a photo of person" 
--output_dir=$OUTPUT_DIR --center_crop 
--with_prior_preservation 
--prior_loss_weight=1.0 
--resolution=512 
--train_text_encoder 
--train_batch_size=1 
--max_train_steps=10 
--max_f_train_steps=3 
--max_adv_train_steps=6 
--checkpointing_iterations=10 
--learning_rate=5e-7 
--pgd_alpha=5e-3 
--pgd_eps=5e-2

Even after changing --max_train_steps to 10, the execution was still killed. I guess the execution is being killed because GPU memory is exhausted?
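One way to check that guess from another Colab cell, right after the process dies, is to look at the kernel log and memory usage; these are standard Linux/Colab commands, not part of the Anti-DreamBooth scripts:

# A bare "Killed" from bash is usually the Linux OOM killer reclaiming system RAM;
# a GPU out-of-memory condition would normally surface as a CUDA OOM exception instead.
!dmesg | grep -i -E "killed process|out of memory" | tail -n 5
!free -h       # system RAM usage
!nvidia-smi    # GPU memory usage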

Because noise-ckpt was never written, the program finally crashed with this message:

ValueError: Instance outputs/ASPL/n000050_ADVERSARIAL/noise-ckpt/50 images root doesn't exists.
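The ValueError is just a downstream symptom: a later step looks for the noise-ckpt images that the killed attack never wrote. A quick sanity check before re-running that step, using the path from the error message, is:

# Verify the perturbation stage actually produced its checkpoint images
!ls outputs/ASPL/n000050_ADVERSARIAL/noise-ckpt/50 \
  || echo "noise-ckpt/50 is missing: the ASPL attack did not finish"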

Questions

  1. How much GPU memory or system RAM is required during perturbation training?
  2. Do you have any suggestions, such as parameter changes, that would let the perturbation training complete?
hao-pt commented 1 year ago

Hi, it seems the error you encountered is indeed due to insufficient GPU memory.

  1. To train a model of this size, you would need a GPU with at least 32GB of memory. Our experiments are typically conducted using a 40GB A100 GPU.
  2. Alternatively, you can lower --sample_batch_size (default: 8) to a smaller value (e.g., 2 or 4). You can also use the 8-bit optimizer to train on a 16GB GPU; please refer to this tutorial. A rough sketch combining both is shown after this list.
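--sample_batch_size is the flag mentioned above; --use_8bit_adam is how the tutorial's diffusers DreamBooth script enables the 8-bit optimizer, so please check that aspl.py accepts the same flag before relying on it:

# 8-bit Adam needs bitsandbytes installed first
pip install bitsandbytes

python3 attacks/aspl.py \
  --pretrained_model_name_or_path=$MODEL_PATH \
  --instance_data_dir_for_train=$CLEAN_TRAIN_DIR \
  --instance_data_dir_for_adversarial=$CLEAN_ADV_DIR \
  --output_dir=$OUTPUT_DIR \
  --sample_batch_size=2 \
  --use_8bit_adam   # assumed flag; keep the remaining flags from attack_with_aspl.sh unchanged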
lbj96347 commented 1 year ago

@hao-pt thanks for your reply. Yes, the GPU was the issue. After switching to an 80GB A100 GPU, everything worked fine.