Domain Generalization through Distilling CLIP with Language Guidance

This repo is the official implementation of our ICCV 2023 paper "A Sentence Speaks a Thousand Images: Domain Generalization through Distilling CLIP with Language Guidance".

Getting Started

Data Preparation

Download PACS dataset from here
Download VLCS dataset from here
Download OfficeHome dataset from here
Download Terra dataset from here

The datasets should be organized as follows:

dataset
├── PACS
│   ├── Domain1
│   ├── Domain2
│   ├── Domain3
│   └── Domain4
├── VLCS
│   ├── ...
├── OfficeHome
│   ├── ...
└── Terra
    ├── ...
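Before training, it can help to confirm that the data path matches the tree above. The following is a minimal sketch, not part of the repo: the helpers `list_domains` and `missing_datasets` are hypothetical, and they assume only the top-level dataset folder names shown in the tree. Domain folder names are whatever your download provides.

```python
from pathlib import Path

# Dataset folder names taken from the tree above. Domain subfolders are
# not checked by name, because they depend on each dataset's download.
EXPECTED_DATASETS = ("PACS", "VLCS", "OfficeHome", "Terra")


def list_domains(data_path):
    """Return {dataset: sorted domain folder names} for datasets found under data_path."""
    root = Path(data_path)
    layout = {}
    for name in EXPECTED_DATASETS:
        ds_dir = root / name
        if ds_dir.is_dir():
            # Every subdirectory of a dataset folder is treated as one domain.
            layout[name] = sorted(p.name for p in ds_dir.iterdir() if p.is_dir())
    return layout


def missing_datasets(data_path):
    """Return the expected dataset folders that are absent under data_path."""
    root = Path(data_path)
    return [name for name in EXPECTED_DATASETS if not (root / name).is_dir()]
```

For example, `print(list_domains("your datasets path"))` should list four domains under `PACS` once that dataset is in place.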

Install

PyTorch 1.7.1 (or later) from here
CLIP from here
Timm: pip install timm

Launch a sweep

python train_rise.py \
       --dataset "PACS" --seed 0 --output_folder "sweep1" --data_path "your datasets path"

Training records are saved in "results/<output_folder>" (for the command above, "results/sweep1").
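To run the same configuration with several seeds, the commands can be generated in a loop. This is a minimal sketch that only reuses the flags shown in the command above; the helper `sweep_commands` and the default seeds `(0, 1, 2)` are assumptions for illustration, and the README does not state how many seeds the authors use.

```python
import shlex


def sweep_commands(dataset, output_folder, data_path, seeds=(0, 1, 2)):
    """Build one train_rise.py argument list per seed (hypothetical helper)."""
    cmds = []
    for seed in seeds:
        cmds.append([
            "python", "train_rise.py",
            "--dataset", dataset,
            "--seed", str(seed),
            "--output_folder", output_folder,
            "--data_path", data_path,
        ])
    return cmds


if __name__ == "__main__":
    # Print shell-ready commands; pass each list to subprocess.run(args, check=True)
    # instead if you want to launch them directly.
    for args in sweep_commands("PACS", "sweep1", "/path/to/datasets"):
        print(shlex.join(args))
```

Keeping the same `--output_folder` for every seed groups the runs under one results folder, which matches how `evaluate_results.py` is pointed at a single `--output_folder`.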

# Train RISE with mix of teachers
CUDA_VISIBLE_DEVICES="0,1,..." python train_rise_mix_teacher.py \
       --dataset "PACS" --seed 0 --output_folder "sweep1" --data_path "your datasets path"

Training with a mix of teachers may require more than one GPU. Adjust the GPUs listed in CUDA_VISIBLE_DEVICES to match your hardware.
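If you launch the mix-of-teachers run from Python rather than from the shell, CUDA_VISIBLE_DEVICES can be derived from a GPU count. This is a sketch under assumptions: `visible_devices`, `mix_teacher_env` and `launch_mix_teacher` are hypothetical helpers, GPUs are assumed to be numbered from 0, and only the script name and flags come from the command above.

```python
import os
import subprocess


def visible_devices(num_gpus):
    """Return a CUDA_VISIBLE_DEVICES value for GPUs 0..num_gpus-1."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return ",".join(str(i) for i in range(num_gpus))


def mix_teacher_env(num_gpus, base_env=None):
    """Copy the base environment (default: os.environ) and restrict visible GPUs."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = visible_devices(num_gpus)
    return env


def launch_mix_teacher(num_gpus, dataset, output_folder, data_path, seed=0):
    """Run train_rise_mix_teacher.py with the flags shown in the README."""
    args = [
        "python", "train_rise_mix_teacher.py",
        "--dataset", dataset,
        "--seed", str(seed),
        "--output_folder", output_folder,
        "--data_path", data_path,
    ]
    return subprocess.run(args, env=mix_teacher_env(num_gpus), check=True)
```

For example, `launch_mix_teacher(2, "PACS", "sweep1", "/path/to/datasets")` would run the script with `CUDA_VISIBLE_DEVICES="0,1"`.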

View the results

python evaluate_results.py \
       --dataset "PACS" --output_folder "sweep1"

Models are selected with the training-domain validation criterion.
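In the DomainBed sense, training-domain validation holds out part of each training (source) domain, picks the run with the best accuracy on those held-out splits, and reports that run's accuracy on the unseen test domain. The sketch below illustrates only this selection rule; the `RunRecord` fields are hypothetical and do not reflect the actual format of the repo's training records.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    # Hypothetical record format; the repo's logged fields may differ.
    hparams_id: int
    train_domain_val_acc: float  # accuracy on held-out splits of the training domains
    test_domain_acc: float       # accuracy on the unseen target domain


def select_by_training_domain_validation(records):
    """Return the run with the highest training-domain validation accuracy.

    The test-domain accuracy is never used for selection; it is only
    reported for the chosen run.
    """
    if not records:
        raise ValueError("no records to select from")
    return max(records, key=lambda r: r.train_domain_val_acc)
```

With runs `(0, 0.91, 0.80)`, `(1, 0.95, 0.78)` and `(2, 0.93, 0.85)`, run 1 is selected and 0.78 is reported, even though run 2 scores higher on the test domain. This is the point of the criterion: the target domain stays unseen during model selection.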

Acknowledgments

The codebase is built upon OoD-Bench, JigenDG and DomainBed.

WisconsinAIVision / RISE
