Clone Repo
git clone https://github.com/clin1223/GenerateU.git
cd GenerateU
Create Conda Environment and Install Dependencies
# create new anaconda env
conda create -n GenerateU python=3.8 -y
conda activate GenerateU
# install python dependencies
pip3 install -e . --user
pip3 install -r requirements.txt
# compile Deformable DETR
cd projects/DDETRS/ddetrs/models/deformable_detr/ops
bash make.sh
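Before running make.sh, it can help to confirm that the environment matches the steps above. The following is a minimal sketch, not part of the repository: the function name `environment_warnings` is hypothetical, and it assumes that the Deformable DETR ops need PyTorch and a CUDA compiler to build, which is typical for such ops but is not stated in this README.

```python
import importlib.util
import os
import shutil
import sys


def environment_warnings(expected_python=(3, 8)):
    """Collect warnings about the current environment; an empty list means nothing was flagged."""
    warnings = []

    # The conda env above is created with python=3.8.
    found = tuple(sys.version_info[:2])
    if found != tuple(expected_python):
        warnings.append(
            f"Python {found[0]}.{found[1]} found, expected "
            f"{expected_python[0]}.{expected_python[1]}"
        )

    # Assumption: building the ops requires PyTorch to be importable.
    if importlib.util.find_spec("torch") is None:
        warnings.append("torch is not importable in this environment")

    # Assumption: the build needs nvcc, found either on PATH or via CUDA_HOME.
    if shutil.which("nvcc") is None and not os.environ.get("CUDA_HOME"):
        warnings.append("nvcc not found on PATH and CUDA_HOME is not set")

    return warnings


if __name__ == "__main__":
    for message in environment_warnings():
        print("WARNING:", message)
```

Run it inside the activated GenerateU env. No output means none of the three checks was triggered; it does not guarantee that the build will succeed.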
Download our pretrained models from here into the weights folder. For training, prepare the Swin-Tiny and Swin-Large backbone weights following the instructions in tools/convert-pretrained-swin-model-to-d2.py.
The directory structure will be arranged as:
weights
|- vg_swinT.pth
|- vg_swinL.pth
|- vg_grit5m_swinT.pth
|- vg_grit5m_swinL.pth
|- swin_tiny_patch4_window7_224.pkl
|- swin_large_patch4_window12_384_22k.pkl
The dataset structure should look like:
|-- datasets
`-- |-- vg
|-- |-- images/
|-- |-- train_from_objects.json
`-- |-- lvis
|-- |-- val2017/
|-- |-- lvis_v1_minival.json
|-- |-- lvis_v1_clip_a+cname_ViT-H.npy
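The two layouts above can be checked before launching a job. The sketch below is illustrative only: the name `missing_paths` is hypothetical, and it assumes that both the weights and datasets folders sit at the repository root and that vg and lvis are subfolders of datasets. Not every file is needed for every run (for example, the Swin-Large files are presumably only needed for Swin-Large configs), so treat the result as a hint rather than an error.

```python
from pathlib import Path

# Paths copied from the listings above; a trailing "/" marks a directory.
EXPECTED_PATHS = [
    "weights/vg_swinT.pth",
    "weights/vg_swinL.pth",
    "weights/vg_grit5m_swinT.pth",
    "weights/vg_grit5m_swinL.pth",
    "weights/swin_tiny_patch4_window7_224.pkl",
    "weights/swin_large_patch4_window12_384_22k.pkl",
    "datasets/vg/images/",
    "datasets/vg/train_from_objects.json",
    "datasets/lvis/val2017/",
    "datasets/lvis/lvis_v1_minival.json",
    "datasets/lvis/lvis_v1_clip_a+cname_ViT-H.npy",
]


def missing_paths(repo_root="."):
    """Return the expected paths that are absent under repo_root, in listing order."""
    root = Path(repo_root)
    missing = []
    for rel in EXPECTED_PATHS:
        path = root / rel
        # Directories and files are checked differently.
        present = path.is_dir() if rel.endswith("/") else path.is_file()
        if not present:
            missing.append(rel)
    return missing


if __name__ == "__main__":
    for rel in missing_paths():
        print("missing:", rel)
```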
By default, we train GenerateU using 16 A100 GPUs. You can also train on a single node, but this might prevent you from reproducing the results presented in the paper.
When pretraining with VG, a single node is enough. On a single node with 8 GPUs, run
python3 launch.py --nn 1 --uni 1 \
--config-file projects/DDETRS/configs/vg_swinT.yaml OUTPUT_DIR outputs/${EXP_NAME}
To train on multiple nodes, for example 2 nodes:
# On node 0, run
python3 launch.py --nn 2 --port <PORT> --worker_rank 0 --master_address <MASTER_ADDRESS> \
--uni 1 --config-file /path/to/config/name.yaml OUTPUT_DIR outputs/${EXP_NAME}
# On node 1, run
python3 launch.py --nn 2 --port <PORT> --worker_rank 1 --master_address <MASTER_ADDRESS> \
--uni 1 --config-file /path/to/config/name.yaml OUTPUT_DIR outputs/${EXP_NAME}
<MASTER_ADDRESS> should be the IP address of node 0. <PORT> should be the same on all nodes. If <PORT> is not specified, the program will generate a random number as <PORT>.
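Since the per-node commands differ only in --worker_rank, they can be generated from one set of values to avoid a mismatched port or address. This is a minimal sketch using only the flags shown above. The helper name `node_commands` is hypothetical, and the address, port, config path, and output directory in the usage example are placeholders.

```python
import shlex


def node_commands(num_nodes, master_address, port, config_file, output_dir):
    """Return one launch command string per node; only --worker_rank differs."""
    commands = []
    for rank in range(num_nodes):
        args = [
            "python3", "launch.py",
            "--nn", str(num_nodes),
            "--port", str(port),
            "--worker_rank", str(rank),
            "--master_address", master_address,
            "--uni", "1",
            "--config-file", config_file,
            "OUTPUT_DIR", output_dir,
        ]
        # shlex.join quotes any argument that needs it for the shell.
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    # Placeholder values; replace them with your own.
    for command in node_commands(2, "10.0.0.1", 29500,
                                 "/path/to/config/name.yaml", "outputs/my_exp"):
        print(command)
```

Passing the port explicitly keeps it identical on every node, which the random default cannot guarantee when each node is launched separately.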
To evaluate a trained or pretrained model, run
python3 launch.py --nn 1 --eval-only --uni 1 --config-file /path/to/config/name.yaml \
OUTPUT_DIR outputs/${EXP_NAME} MODEL.WEIGHTS /path/to/weight.pth
If you find our repo useful for your research, please consider citing our paper:
@inproceedings{lin2024generateu,
title={Generative Region-Language Pretraining for Open-Ended Object Detection},
author={Lin, Chuang and Jiang, Yi and Qu, Lizhen and Yuan, Zehuan and Cai, Jianfei},
booktitle={Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2024}
}
If you have any questions, please feel free to reach out to me at chuang.lin@monash.edu.
This code is based on UNINEXT. Some code is borrowed from FlanT5. Thanks for their awesome work.
Special thanks to Bin Yan and Junfeng Wu for their valuable contributions.