This is the official implementation of "QueryMatch: A Query-based Contrastive Learning Framework for Weakly Supervised Visual Grounding". In this paper, we propose a novel query-based one-stage framework for weakly supervised visual grounding, namely QueryMatch. Different from previous work, QueryMatch represents candidate objects with a set of query features, which inherently establish accurate one-to-one associations with visual objects. QueryMatch therefore re-formulates weakly supervised visual grounding as a query-text matching problem, which can be optimized via query-based contrastive learning. Based on QueryMatch, we further propose an innovative strategy for effective weakly supervised learning, namely Active Query Selection (AQS). In particular, AQS aims to enhance the effectiveness of query-based contrastive learning by actively selecting high-quality query features.
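To make "query-text matching optimized via contrastive learning" concrete, here is a minimal, framework-free sketch. It is not the QueryMatch implementation. Several parts are assumptions: cosine similarity as the matching score, an InfoNCE loss with a temperature, treating the most text-similar query from the paired image as the pseudo-positive and queries from other images as negatives (weak supervision provides no box labels), and a hypothetical `select_queries` that stands in for AQS by keeping the top-k queries under some externally supplied quality score.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (non-zero norms assumed)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def select_queries(queries, scores, k):
    """Hypothetical stand-in for AQS: keep the k queries with the highest quality score."""
    top = sorted(range(len(queries)), key=lambda i: scores[i], reverse=True)[:k]
    return [queries[i] for i in top]


def query_text_infonce(text, paired_queries, other_queries, temperature=0.1):
    """InfoNCE loss for one referring expression.

    Assumption: the query from the paired image most similar to the text is the
    pseudo-positive; queries from other images in the batch are negatives.
    """
    pos = max(cosine(q, text) for q in paired_queries) / temperature
    negs = [cosine(q, text) / temperature for q in other_queries]
    logits = [pos] + negs
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - pos  # equals -log(softmax(logits)[0])


if __name__ == "__main__":
    text = [1.0, 0.0]  # toy 2-D text embedding
    # Toy query features for the paired image, filtered by made-up quality scores.
    paired = select_queries([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]], [0.9, 0.2, 0.6], k=2)
    loss = query_text_infonce(text, paired, other_queries=[[0.0, 1.0], [-1.0, 0.0]])
    print(round(loss, 6))
```

The loss is small when the best paired query aligns with the text and the negatives do not, and grows when a query from another image matches the text better.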
Clone this repo:

```bash
git clone https://github.com/TensorThinker/QueryMatch.git
cd QueryMatch
```
Create a conda virtual environment and activate it:

```bash
conda create -n querymatch python=3.8 -y
conda activate querymatch
```
Install PyTorch following the official installation instructions.
Install Detectron2 following the official installation instructions, for example from source:

```bash
git clone https://github.com/facebookresearch/detectron2.git
python -m pip install -e detectron2
```
From the repository root, compile the DCN extension:

```bash
cd utils_querymatch/DCN
./make.sh
```
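From the repository root, install the Mask2Former requirements and compile its multi-scale deformable attention ops:

```bash
cd mask2former
pip install -r requirements.txt
cd ./modeling/pixel_decoder/ops
sh make.sh
```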
Download and install the spaCy word vectors (`en_vectors_web_lg`):

```bash
wget https://github.com/explosion/spacy-models/releases/download/en_vectors_web_lg-2.1.0/en_vectors_web_lg-2.1.0.tar.gz -O en_vectors_web_lg-2.1.0.tar.gz
pip install en_vectors_web_lg-2.1.0.tar.gz
```
Install the remaining dependencies:

```bash
pip install albumentations
pip install Pillow==9.5.0
pip install tensorboardX
```
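After the installation steps, a quick sanity check can confirm that the expected packages are importable inside the `querymatch` environment. This is a minimal sketch and not part of the repository; the module names are assumptions derived from the packages installed above (Pillow imports as `PIL`, and the spaCy vectors package is assumed to be importable as `en_vectors_web_lg`). It uses `importlib.util.find_spec`, which locates a module without importing it.

```python
import importlib.util
import sys

# Module names assumed to correspond to the packages installed above.
REQUIRED = ["torch", "detectron2", "spacy", "en_vectors_web_lg",
            "albumentations", "PIL", "tensorboardX"]


def check_environment(modules=REQUIRED):
    """Return {module_name: bool} indicating whether each module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = [name for name, ok in check_environment().items() if not ok]
    print("missing:", missing if missing else "none")
```

Finding a module does not guarantee its compiled CUDA extensions (DCN, deformable attention ops) built correctly; those are only exercised when the model runs.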
Expected directory layout:

```
|-- QueryMatch
    |-- data
    |   |-- anns
    |   |   |-- refcoco.json
    |   |   |-- refcoco+.json
    |   |   |-- refcocog.json
    |   |-- images
    |       |-- train2014
    |           |-- COCO_train2014_000000000072.jpg
    |           |-- ...
    |-- config_querymatch
    |-- configs
    |-- datasets
    |-- datasets_querymatch
    |-- DCNv2_latest
    |-- detectron2
    |-- mask2former
    |-- models_querymatch
    |-- ...
```
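Before training, it can help to confirm that the annotation files and COCO images sit where the directory tree above expects them. The following is a minimal sketch, not a script shipped with the repository; it only checks for the paths listed in the tree and assumes it is run from the repository root (so `./data` is the default root).

```python
from pathlib import Path

# Relative paths under data/, copied from the directory tree above.
EXPECTED = [
    "anns/refcoco.json",
    "anns/refcoco+.json",
    "anns/refcocog.json",
    "images/train2014",
]


def missing_data(root="./data"):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_data()
    print("missing:", missing if missing else "none")
```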
Training:

```bash
python train_querymatch.py --config ./config_querymatch/[DATASET_NAME].yaml --config-file ./configs/coco/instance-segmentation/swin/maskformer2_swin_base_384_bs16_50ep.yaml --eval-only MODEL.WEIGHTS [PATH_TO_MASK2FORMER_WEIGHT]
```

Evaluation:

```bash
python test_querymatch.py --config ./config_querymatch/[DATASET_NAME].yaml --eval-weights [PATH_TO_CHECKPOINT_FILE] --config-file ./configs/coco/instance-segmentation/swin/maskformer2_swin_base_384_bs16_50ep.yaml --eval-only MODEL.WEIGHTS [PATH_TO_MASK2FORMER_WEIGHT]
```
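When running several datasets in a row, the train and test commands above can be assembled programmatically. This is a hypothetical helper, not part of the repository: it only reproduces the argument order shown in the commands, `build_command` and its parameters are illustrative names, and the dataset name and weight paths in the usage line are placeholders (check `config_querymatch/` for the actual config file names).

```python
import shlex

# Mask2Former config path copied from the commands above.
M2F_CONFIG = "./configs/coco/instance-segmentation/swin/maskformer2_swin_base_384_bs16_50ep.yaml"


def build_command(mode, dataset, mask2former_weight, checkpoint=None):
    """Return the train/test command from the README as an argument list."""
    script = {"train": "train_querymatch.py", "test": "test_querymatch.py"}[mode]
    cmd = ["python", script, "--config", f"./config_querymatch/{dataset}.yaml"]
    if mode == "test":
        if checkpoint is None:
            raise ValueError("test mode needs a checkpoint path")
        cmd += ["--eval-weights", checkpoint]
    cmd += ["--config-file", M2F_CONFIG, "--eval-only", "MODEL.WEIGHTS", mask2former_weight]
    return cmd


if __name__ == "__main__":
    # Placeholder dataset name and weight path, for illustration only.
    print(shlex.join(build_command("train", "refcoco", "path/to/mask2former_weight.pkl")))
```

The list form can be passed directly to `subprocess.run`, which avoids shell-quoting issues with paths.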
| Method | RefCOCO val | RefCOCO testA | RefCOCO testB | RefCOCO+ val | RefCOCO+ testA | RefCOCO+ testB | RefCOCOg val-g |
|---|---|---|---|---|---|---|---|
| QueryMatch | 59.10 | 59.08 | 58.82 | 39.87 | 41.44 | 37.22 | 43.06 |
| Method | RefCOCO val | RefCOCO testA | RefCOCO testB | RefCOCO+ val | RefCOCO+ testA | RefCOCO+ testB | RefCOCOg val-g |
|---|---|---|---|---|---|---|---|
| QueryMatch | 66.02 | 66.00 | 65.48 | 44.76 | 46.72 | 41.50 | 48.47 |
This project is compatible with multiple CUDA versions, including but not limited to CUDA 11.3. Relative performance trends remain consistent across hardware environments, but exact numerical results may vary slightly.
We thank the authors of the following repositories for their well-organized code: