This repository is the official PyTorch implementation of the ECCV 2024 paper SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding. SegVG transfers the box-level annotation into segmentation signals to provide additional pixel-level supervision for Visual Grounding. In addition, our proposed Triple Alignment module triangularly updates the query, text, and vision tokens to mitigate domain discrepancy. Please cite our paper if the paper or codebase is helpful to you.
@article{kang2024segvg,
  title={Segvg: Transferring object bounding box to segmentation for visual grounding},
  author={Kang, Weitai and Liu, Gaowen and Shah, Mubarak and Yan, Yan},
  journal={arXiv preprint arXiv:2407.03200},
  year={2024}
}
Clone this repository.
git clone https://github.com/WeitaiKang/SegVG.git
Prepare for environment.
Please refer to ReSC for setting up the environment. We use PyTorch 1.12.1+cu116.
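If you prefer to set things up directly, here is a minimal sketch, assuming conda and a CUDA 11.6 machine; the environment name and Python version below are arbitrary choices, and the remaining dependencies still follow ReSC:

```bash
# Create and activate a fresh environment (name/version are illustrative)
conda create -n segvg python=3.8 -y
conda activate segvg

# Install PyTorch 1.12.1 built against CUDA 11.6, with the matching torchvision
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 \
    --extra-index-url https://download.pytorch.org/whl/cu116
```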
Prepare for data.
Please download the COCO train2014 images.
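For reference, one way to fetch and unpack them (the download URL and target directory below are assumptions; any copy of train2014 works):

```bash
# Download and extract the COCO train2014 images; adjust the target path as needed
wget http://images.cocodataset.org/zips/train2014.zip
unzip -q train2014.zip -d /path/to/data/
```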
Please download the referring expression annotations from the 'annotation' directory of SegVG.
Please download the ResNet-101 checkpoints of the vision backbone from TransVG.
You can place them wherever you want; just remember to set the corresponding paths in your train.sh and test.sh (see the sketch below).
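For example, a minimal sketch of one possible layout (the directory names are arbitrary assumptions; the actual variables to edit are whatever train.sh and test.sh define):

```bash
# Illustrative layout only; point the image / annotation / checkpoint
# paths in train.sh and test.sh at wherever you actually put these
mkdir -p /path/to/data/train2014      # COCO train2014 images
mkdir -p /path/to/data/annotations    # referring expression annotations
mkdir -p /path/to/data/checkpoints    # ResNet-101 backbone and SegVG ckpts
```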
Our model checkpoints are available in the 'ckpt' directory of SegVG.
Results

RefCOCO

Model | val | testA | testB |
---|---|---|---|
SegVG | 86.84 | 89.46 | 83.07 |

RefCOCO+

Model | val | testA | testB |
---|---|---|---|
SegVG | 77.18 | 82.63 | 67.59 |

RefCOCOg

Model | val-g | val-u | test-u |
---|---|---|---|
SegVG | 76.01 | 78.35 | 77.42 |
ReferItGame

Model | test |
---|---|
SegVG | 75.59 |
Training
bash train.sh
Please take a look at train.sh to set the parameters.
Evaluation
bash test.sh
Please take a look at test.sh to set the parameters.
Acknowledgement

This codebase is partially based on TransVG.