YangLing0818 / ContextDiff

[ICLR 2024] Contextualized Diffusion Models for Text-Guided Image and Video Generation

How to use the Context-Aware Adapter #8

Open jupytera opened 1 month ago

jupytera commented 1 month ago

Hello, I have read the text-guided image generation section of your README. In the second step, 'Finetune Diffusion Model with Context-Aware Adapter', the parameters do not seem to include any option for loading the Context-Aware Adapter pretrained in the first step. These are all the parameters:

CUDA_VISIBLE_DEVICES=0 finetune_diffusion.py --pretrained_model_name_or_path="stabilityai/stable-diffusion-2-1-base" --train_data_dir=./train2017 --use_ema --resolution=512 --center_crop --random_flip --train_batch_size=32 --gradient_accumulation_steps=1 --gradient_checkpointing --max_train_steps=50000 --checkpointing_steps=10000 --learning_rate=2e-05 --max_grad_norm=1 --lr_scheduler="constant" --lr_warmup_steps=0 --output_dir="./output"

So how is the Context-Aware Adapter actually used? Or which of the pretrained models referenced in these parameters could be replaced by the Context-Aware Adapter? Thank you for your help!
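For reference, the parameters in the README command are ordinary command-line flags. The sketch below uses Python's argparse to show how a script could expose an adapter checkpoint alongside a few of them. It is illustrative only: `--context_adapter_path`, `build_parser`, and `choose_adapter` are hypothetical names that are not part of finetune_diffusion.py, and only a subset of the README flags is mirrored.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring a subset of the README flags."""
    parser = argparse.ArgumentParser(description="Fine-tune a diffusion model (sketch)")
    parser.add_argument("--pretrained_model_name_or_path", type=str, required=True)
    parser.add_argument("--train_data_dir", type=str, default=None)
    parser.add_argument("--use_ema", action="store_true")
    parser.add_argument("--resolution", type=int, default=None)
    parser.add_argument("--train_batch_size", type=int, default=None)
    parser.add_argument("--learning_rate", type=float, default=None)
    parser.add_argument("--output_dir", type=str, default=None)
    # Assumed new option (not in the repo): path to a checkpoint from train_adapter.py.
    # When omitted, the script would keep using the built-in MultiCLIP adapter.
    parser.add_argument("--context_adapter_path", type=str, default=None)
    return parser


def choose_adapter(args: argparse.Namespace) -> str:
    """Report which adapter this hypothetical script would load."""
    if args.context_adapter_path is None:
        return "MultiCLIP"
    return f"custom:{args.context_adapter_path}"


if __name__ == "__main__":
    # Parse an explicit list so the demo does not depend on sys.argv.
    args = build_parser().parse_args([
        "--pretrained_model_name_or_path=stabilityai/stable-diffusion-2-1-base",
        "--use_ema",
        "--learning_rate=2e-05",
    ])
    print(choose_adapter(args))  # MultiCLIP
```

Keeping the new flag optional means the existing README command would keep working unchanged.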

BitCodingWalkin commented 1 month ago

Thank you for your interest in our work. In finetune_diffusion.py, for ease of testing, we directly combine multiple CLIP models into a simplified, ready-to-deploy version of the Context-Aware Adapter, which appears in the code as MultiCLIP. If you want to use a Context-Aware Adapter that you trained yourself, only a few minor code changes are needed to replace MultiCLIP with it.
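The swap described here works because the training loop only needs "something that turns prompts into conditioning features". The pure-Python sketch below illustrates that idea under stated assumptions: `ContextEncoder`, `StubCLIP`, `MultiCLIPSketch`, and `condition_batch` are hypothetical names. It also assumes that the ensemble concatenates per-model features, which may not match how the real MultiCLIP combines them.

```python
from typing import List, Protocol, Sequence

Vector = List[float]


class ContextEncoder(Protocol):
    """Assumed interface: one feature vector per prompt."""

    def __call__(self, prompts: Sequence[str]) -> List[Vector]: ...


class StubCLIP:
    """Hypothetical stand-in for one CLIP text encoder; emits toy features."""

    def __init__(self, scale: float) -> None:
        self.scale = scale

    def __call__(self, prompts: Sequence[str]) -> List[Vector]:
        # Toy features: character count and word count, scaled per "model".
        return [[self.scale * len(p), self.scale * len(p.split())] for p in prompts]


class MultiCLIPSketch:
    """Hypothetical analogue of MultiCLIP: concatenates features from several encoders."""

    def __init__(self, encoders: Sequence[ContextEncoder]) -> None:
        self.encoders = list(encoders)

    def __call__(self, prompts: Sequence[str]) -> List[Vector]:
        per_model = [enc(prompts) for enc in self.encoders]
        # For each prompt, join that prompt's vectors from every encoder end to end.
        return [[x for feats in per_model for x in feats[i]] for i in range(len(prompts))]


def condition_batch(encoder: ContextEncoder, prompts: Sequence[str]) -> List[Vector]:
    """Stand-in for the point in the training loop that builds conditioning.

    It relies only on the call signature, so any object with the same
    interface (the ensemble or a self-trained adapter) can be passed in.
    """
    return encoder(prompts)


if __name__ == "__main__":
    ensemble = MultiCLIPSketch([StubCLIP(1.0), StubCLIP(2.0)])
    print(condition_batch(ensemble, ["a cat"]))  # [[5.0, 2.0, 10.0, 4.0]]
```

Replacing the ensemble then amounts to passing a different object with the same call signature to `condition_batch`.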

jupytera commented 1 month ago


Thank you for your reply! It seems that the model trained by train_adapter.py is a ClipPrior model, which is different from the other pretrained models used here. I'm not sure how to adjust the code; would you be willing to give some instructions?
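A common way to plug in a model whose interface differs from what a training script expects is a thin wrapper (the adapter pattern). The pure-Python sketch below is illustrative only. `StubTextEncoder`, `StubClipPrior`, its `forward` method, and `PriorAsContextEncoder` are hypothetical names. The sketch also assumes that the ClipPrior takes CLIP text features as input and returns context-aware features, so check train_adapter.py for its actual inputs and outputs. In a PyTorch version, the wrapper would typically be a torch.nn.Module, and the trained weights would be restored with torch.load followed by load_state_dict before the wrapper is used in place of MultiCLIP.

```python
from typing import List, Sequence

Vector = List[float]


class StubTextEncoder:
    """Hypothetical stand-in for a frozen CLIP text encoder."""

    def __call__(self, prompts: Sequence[str]) -> List[Vector]:
        return [[float(len(p)), float(len(p.split()))] for p in prompts]


class StubClipPrior:
    """Hypothetical stand-in for a ClipPrior checkpoint from train_adapter.py.

    Assumption: it maps text features to context-aware features via forward().
    Here it is a toy element-wise affine map with fixed "learned" weights.
    """

    def __init__(self, weight: float, bias: float) -> None:
        self.weight = weight
        self.bias = bias

    def forward(self, features: List[Vector]) -> List[Vector]:
        return [[self.weight * x + self.bias for x in vec] for vec in features]


class PriorAsContextEncoder:
    """Wraps encoder + prior so callers still see a prompts -> features interface."""

    def __init__(self, text_encoder: StubTextEncoder, prior: StubClipPrior) -> None:
        self.text_encoder = text_encoder
        self.prior = prior

    def __call__(self, prompts: Sequence[str]) -> List[Vector]:
        # Encode the prompts first, then let the prior refine those features.
        return self.prior.forward(self.text_encoder(prompts))


if __name__ == "__main__":
    adapter = PriorAsContextEncoder(StubTextEncoder(), StubClipPrior(2.0, 1.0))
    print(adapter(["a red bus"]))  # [[19.0, 7.0]]
```

Because the wrapper is called with the same arguments as the component it replaces, the rest of the training loop would not need to change.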