Due to high website traffic, we have created multiple online services. If one link is not working, please use another one. Thank you for your support!
1. Installation
git clone https://github.com/OptimalScale/DetGPT.git
cd DetGPT
conda create -n detgpt python=3.9 -y
conda activate detgpt
pip install -e .
2. Install GroundingDino
python -m pip install -e GroundingDINO
3. Download the pretrained checkpoint and task tuning dataset
Our model is built on pretrained language model checkpoints. In our experiments, we use Robin from the LMFlow team as well as Vicuna, and find that both perform competitively. You can run the following script to download the Robin checkpoint:
cd output_models
bash download.sh all
cd -
Merge the Robin LoRA model with the original LLaMA model and save the merged model to output_models/robin-7b; the corresponding model path is specified in this config file.
To obtain the original llama model, one may refer to this doc. To merge a lora model with a base model, one may refer to PEFT or use the merge script provided by LMFlow.
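Conceptually, merging a LoRA adapter folds the low-rank update into the frozen base weight, after which the adapter files are no longer needed. The numeric sketch below is illustrative only (it is not the PEFT/LMFlow merge script; NumPy arrays stand in for model weights):

```python
# Illustrative sketch of what a LoRA merge does numerically.
# W is the frozen base weight; B @ A is the low-rank adapter update,
# scaled by alpha / r as in standard LoRA.
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 8, 2, 16
W = rng.standard_normal((d, d))   # frozen base weight
A = rng.standard_normal((r, d))   # LoRA down-projection
B = rng.standard_normal((d, r))   # LoRA up-projection
scale = alpha / r

# Merge: fold the scaled low-rank update into the base weight.
W_merged = W + scale * (B @ A)

# The merged layer matches base-plus-adapter applied separately.
x = rng.standard_normal(d)
assert np.allclose(W_merged @ x, W @ x + scale * (B @ (A @ x)))
```

In practice, tools like PEFT perform exactly this fold for every adapted layer and then save a plain checkpoint that loads without any LoRA machinery.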
The dataset for task tuning is named "coco_task_annotation.json". Please modify detgpt/configs/datasets/coco/align.yaml so that "storage" points to the COCO dataset and "file_name" points to the path of the instruction tuning dataset.
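As a sketch, the edited align.yaml might look like the fragment below. The key layout here is illustrative (check the actual file in the repo for the exact schema); the paths are placeholders:

```yaml
# Illustrative fragment only -- the real align.yaml may nest these keys differently.
datasets:
  coco:
    storage: /path/to/dataset/coco                              # root of the COCO images
    file_name: /path/to/dataset/coco/coco_task_annotation.json  # task-tuning annotations
```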
cd dataset
mkdir coco
Download the COCO dataset from COCO home page.
Here is the data structure:
dataset/coco/
├── train2017/
├── val2017/
├── annotations.json
├── coco_task_annotation.json
Note: please move coco_task_annotation.json from output_models/ to dataset/coco/.
Please execute the following command to conduct task tuning:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nproc-per-node 8 train.py --cfg-path path-to-config
Note that we provide two example config files for task tuning under the configs/ directory. You need to replace model/ckpt with the path to the pretrained linear weights from the first stage.
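The key naming below is a guess based on the "model/ckpt" hint above; check the example configs under configs/ for the exact schema before editing:

```yaml
# Illustrative fragment only -- mirror the structure of the shipped example configs.
model:
  ckpt: /path/to/pretrained_linear_weights  # linear projection weights from the first stage
```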
Run the demo by executing the following command, replacing 'path/to/pretrained_linear_weights' in the config file with the real path. We currently release linear weights based on Vicuna-13B-v1.1 and will release other weights later. The demo runs on 2 GPUs by default: one for the language model and one for GroundingDino.
CUDA_VISIBLE_DEVICES=0,1 python demo_detgpt.py --cfg-path configs/detgpt_eval_13b.yaml
The project is built on top of the amazing open-vocabulary detector GroundingDino and the multimodal conversation model MiniGPT-4, which is based on BLIP-2 and LAVIS. Thanks for these great works!
If you're using DetGPT in your research or applications, please cite using this BibTeX:
@misc{pi2023detgpt,
title={DetGPT: Detect What You Need via Reasoning},
author={Renjie Pi and Jiahui Gao and Shizhe Diao and Rui Pan and Hanze Dong and Jipeng Zhang and Lewei Yao and Jianhua Han and Hang Xu and Lingpeng Kong and Tong Zhang},
year={2023},
eprint={2305.14167},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
This repository is released under BSD 3-Clause License.