A suitable conda environment named AMC can be created and activated with:
conda env create -f environment.yaml
conda activate AMC
First, download the COCO dataset from here. We use COCO2014 in the paper. Then process the data with this script:
python coco_preprocess.py \
--coco_image_path /YOUR/COCO/PATH/train2014 \
--coco_caption_file /YOUR/COCO/PATH/annotations/captions_train2014.json \
--coco_instance_file /YOUR/COCO/PATH/annotations/instances_train2014.json \
--output_dir /YOUR/DATA/PATH
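The internals of coco_preprocess.py are not shown here, so the following is only a minimal sketch of what joining the two annotation files could look like, assuming the standard COCO JSON layout ("images", "categories", and "annotations" keys, with boxes stored as [x_min, y_min, width, height] in pixels). The function name pair_captions_with_boxes and the output record layout are hypothetical and are not taken from the repository.

```python
from collections import defaultdict


def pair_captions_with_boxes(captions, instances):
    """Group captions and normalized boxes by image id (illustrative sketch only).

    `captions` and `instances` are the parsed contents of a COCO captions file
    and a COCO instances file. The returned layout is an assumption, not the
    format that coco_preprocess.py actually writes.
    """
    # Image sizes are needed to rescale pixel boxes into the [0, 1] range.
    sizes = {img["id"]: (img["width"], img["height"]) for img in instances["images"]}
    names = {cat["id"]: cat["name"] for cat in instances["categories"]}

    records = defaultdict(lambda: {"captions": [], "objects": []})

    for ann in captions["annotations"]:
        records[ann["image_id"]]["captions"].append(ann["caption"].strip())

    for ann in instances["annotations"]:
        width, height = sizes[ann["image_id"]]
        x, y, w, h = ann["bbox"]  # COCO stores [x_min, y_min, width, height]
        # Convert to normalized corner format [x_min, y_min, x_max, y_max].
        box = [x / width, y / height, (x + w) / width, (y + h) / height]
        records[ann["image_id"]]["objects"].append(
            {"label": names[ann["category_id"]], "box_xyxy": box}
        )

    return dict(records)
```

In practice the two arguments would come from json.load on captions_train2014.json and instances_train2014.json. Images that have captions but no instance annotations end up with an empty "objects" list in this sketch.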
Before training, edit the configuration in train_boxnet.sh.
You can then train the BoxNet with this script:
sh train_boxnet.sh $NODE_NUM $CURRENT_NODE_RANK $GPUS_PER_NODE
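The three positional arguments describe the multi-node layout. As a rough illustration, and assuming the script follows the common convention of one training process per GPU (the launcher used inside train_boxnet.sh is not shown here), the resulting world size and global ranks can be worked out as below. The helper process_layout is hypothetical and is not part of the repository.

```python
def process_layout(node_num, node_rank, gpus_per_node):
    """Return (world_size, global_ranks_on_this_node) for a one-process-per-GPU job.

    Sketch only: assumes the usual convention
    global_rank = node_rank * gpus_per_node + local_rank.
    """
    if node_num < 1 or gpus_per_node < 1:
        raise ValueError("node_num and gpus_per_node must be positive")
    if not 0 <= node_rank < node_num:
        raise ValueError("node_rank must be in [0, node_num)")

    world_size = node_num * gpus_per_node
    first_rank = node_rank * gpus_per_node
    # One global rank per local GPU index on this node.
    global_ranks = [first_rank + local_rank for local_rank in range(gpus_per_node)]
    return world_size, global_ranks
```

For a single machine with 8 GPUs, this corresponds to NODE_NUM=1, CURRENT_NODE_RANK=0, and GPUS_PER_NODE=8. With several machines, the same command is run on every node with a different CURRENT_NODE_RANK.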
With a trained BoxNet, you can run text-to-image synthesis with:
python test_pipeline_onestage.py \
--stable_model_path /stable-diffusion-v1-5/checkpoint \
--boxnet_model_path /TRAINED/BOXNET/CKPT \
--output_dir /YOUR/SAVE/DIR
All test prompts are saved in the file test_prompts.json.
This implementation is based on the diffusers library, the Fengshenbang-LM codebase, and the DETR codebase.