Byeonghyun Pak\*, Byeongju Woo\*, Sunghwan Kim\*, Dae-hwan Kim, Hoseong Kim†
Agency for Defense Development
ECCV 2024
[Project Page] [Paper]

The requirements can be installed with:
    conda create -n tqdm python=3.9 numpy=1.26.4
    conda activate tqdm
    conda install pytorch==2.0.1 torchvision==0.15.2 pytorch-cuda=11.8 -c pytorch -c nvidia
    pip install -r requirements.txt
    pip install xformers==0.0.20
    pip install mmcv-full==1.5.3
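As an optional sanity check, you can confirm that the pinned packages import cleanly and that CUDA is visible; this is just a minimal sketch mirroring the install commands above, not part of the official setup:

    # Optional sanity check: the pinned packages should import and report
    # the expected versions, and torch should see at least one CUDA device.
    python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
    python -c "import torchvision; print(torchvision.__version__)"
    python -c "import xformers; print(xformers.__version__)"
    python -c "import mmcv; print(mmcv.__version__)"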
Please download the pre-trained CLIP and EVA02-CLIP weights and save them in the `./pretrained` folder.
| Model | Type | Link |
|---|---|---|
| CLIP | ViT-B-16.pt | official repo |
| EVA02-CLIP | EVA02_CLIP_L_336_psz14_s6B | official repo |
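The resulting layout would look roughly like the sketch below. `ViT-B-16.pt` comes from the table above; the EVA02-CLIP filename is an assumption inferred from its Type entry, so keep whatever name the downloaded file actually has:

    # Sketch of the expected ./pretrained layout.
    mkdir -p pretrained
    # pretrained/ViT-B-16.pt
    # pretrained/EVA02_CLIP_L_336_psz14_s6B.pt   # assumed filename; may differ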
You can download tqdm model checkpoints:
| Model | Pretrained | Trained on | Config | Link |
|---|---|---|---|---|
| tqdm-clip-vit-b-gta | CLIP | GTA5 | config | download link |
| tqdm-eva02-clip-vit-l-gta | EVA02-CLIP | GTA5 | config | download link |
| tqdm-eva02-clip-vit-l-city | EVA02-CLIP | Cityscapes | config | download link |
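For evaluation, a downloaded checkpoint needs to sit under `work_dirs/` so it matches the path passed to `dist_test.sh` (see below). One possible layout, with the directory name taken from the test example later in this README and purely illustrative:

    # Place a downloaded checkpoint where dist_test.sh will look for it.
    mkdir -p work_dirs/tqdm_eve_vit-l_1e-5_20k-g2c-512
    # work_dirs/tqdm_eve_vit-l_1e-5_20k-g2c-512/epoch_last.pth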
After downloading the datasets, edit the `data_root` in the dataset config files to match your environment:
    src_dataset_dict = dict(..., data_root='[YOUR_DATA_FOLDER_ROOT]', ...)
    tgt_dataset_dict = dict(..., data_root='[YOUR_DATA_FOLDER_ROOT]', ...)
Train a model with:

    bash dist_train.sh configs/[TRAIN_CONFIG] [NUM_GPUs]

- `[TRAIN_CONFIG]`: training configuration (e.g., `tqdm/tqdm_eve_vit-l_1e-5_20k-g2c-512.py`)
- `[NUM_GPUs]`: the number of GPUs
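For instance, a run of the GTA5-trained EVA02-CLIP config could look like this; the GPU count is illustrative, not a requirement:

    # Example invocation (assumes a 4-GPU node; adjust NUM_GPUs to your machine).
    bash dist_train.sh configs/tqdm/tqdm_eve_vit-l_1e-5_20k-g2c-512.py 4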
Evaluate a model with:

    bash dist_test.sh configs/[TEST_CONFIG] work_dirs/[MODEL] [NUM_GPUs] --eval mIoU

- `[TEST_CONFIG]`: test configuration (e.g., `tqdm/tqdm_eve_vit-l_1e-5_20k-g2b-512.py`)
- `[MODEL]`: model checkpoint (e.g., `tqdm_eve_vit-l_1e-5_20k-g2c-512/epoch_last.pth`)
- `[NUM_GPUs]`: the number of GPUs
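For example, evaluating the GTA5-trained checkpoint with the test config named above (again with an illustrative GPU count):

    # Example evaluation (assumes 4 GPUs; paths follow the placeholders above).
    bash dist_test.sh configs/tqdm/tqdm_eve_vit-l_1e-5_20k-g2b-512.py work_dirs/tqdm_eve_vit-l_1e-5_20k-g2c-512/epoch_last.pth 4 --eval mIoU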
If you find our code helpful, please cite our paper:

    @article{pak2024textual,
      title = {Textual Query-Driven Mask Transformer for Domain Generalized Segmentation},
      author = {Pak, Byeonghyun and Woo, Byeongju and Kim, Sunghwan and Kim, Dae-hwan and Kim, Hoseong},
      journal = {arXiv preprint arXiv:2407.09033},
      year = {2024}
    }
This project is based on the following open-source projects. We thank the authors for sharing their code.