Hello - Welcome! :beers:
This is the official repository for the paper "Mitigating Spurious Correlations in Multi-modal Models during Fine-tuning" (ICML 2023).
Install the versions of PyTorch and PyG that match your CUDA setup, then install CLIP and WILDS.
Example installation commands for Linux with CUDA >= 11.1:
conda install pytorch==1.8.0 torchvision==0.9.0 cudatoolkit=11.1 -c pytorch -c conda-forge
conda install pyg pytorch-scatter -c pyg
pip install ftfy regex
pip install git+https://github.com/openai/CLIP.git
pip install wilds
pip install opencv-python
Install Matplotlib and Plotly for visualization:
conda install -c conda-forge matplotlib
conda install -c plotly plotly=5.9.0
conda install -c conda-forge python-kaleido
(Optional) Install Wandb for visualization:
conda install -c conda-forge wandb
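After installation, a quick check that every package can be located in the active environment may save a failed run later. The following is a minimal sketch using only the standard library; the import names listed (for example `torch_geometric` for PyG and `cv2` for opencv-python) are the usual ones for these packages, and the helper `missing_packages` is a hypothetical name, not part of this repository.

```python
from importlib.util import find_spec

# Import names for the packages installed above (assumed, not taken from the repo).
REQUIRED = [
    "torch", "torchvision", "torch_geometric", "torch_scatter",
    "ftfy", "regex", "clip", "wilds", "cv2",
    "matplotlib", "plotly", "kaleido",
]
OPTIONAL = ["wandb"]


def missing_packages(names):
    """Return the names that cannot be located in the current environment."""
    missing = []
    for name in names:
        try:
            found = find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or partially initialised modules.
            found = False
        if not found:
            missing.append(name)
    return missing


print("Missing required packages:", missing_packages(REQUIRED))
print("Missing optional packages:", missing_packages(OPTIONAL))
```

An empty list for the required packages suggests the environment is ready; this only checks that the modules can be found, not that their versions are compatible.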
Download the segmentation masks and extract them into the Waterbirds data directory:
cd data
wget https://data.caltech.edu/records/w9d68-gec53/files/segmentations.tgz
tar -xf segmentations.tgz -C ./waterbirds
Install the Hugging Face Datasets library:
conda install -c huggingface -c conda-forge datasets
Download the ImageNet bounding box annotations:
cd data
wget https://image-net.org/data/bboxes_annotations.tar.gz
mkdir -p ./imagenet/bboxes
tar -xf bboxes_annotations.tar.gz -C ./imagenet/bboxes
Download the synsets for attributes and synsets for objects from the Visual Genome dataset:
cd data
wget https://homes.cs.washington.edu/~ranjay/visualgenome/data/dataset/object_synsets.json.zip
wget https://homes.cs.washington.edu/~ranjay/visualgenome/data/dataset/attribute_synsets.json.zip
unzip object_synsets.json.zip
unzip attribute_synsets.json.zip
rm object_synsets.json.zip
rm attribute_synsets.json.zip
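Once the downloads have finished, it can help to confirm that the files landed where the commands above put them. This is a rough sketch, assuming the archives extract to `object_synsets.json` and `attribute_synsets.json` (inferred from the zip names, not verified) and that `data` is the root directory; `missing_paths` is a hypothetical helper.

```python
from pathlib import Path

# Paths implied by the download commands above.
# The extracted JSON file names are an assumption based on the archive names.
EXPECTED = [
    "waterbirds",
    "imagenet/bboxes",
    "object_synsets.json",
    "attribute_synsets.json",
]


def missing_paths(root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


print("Missing under data/:", missing_paths("data"))
```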
Launch the spurious detection for a range of classes with multiple GPUs:
python tools/detection_command_launchers.py --device 0 1 3 4 --class_rank_range 0 100 --output_dir detection_runs
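To illustrate what distributing a class range over several GPUs can look like, here is a minimal sketch that splits a range of class ranks into contiguous chunks, one per device. It is not the logic of `tools/detection_command_launchers.py`, which may partition the work differently; the function name `split_class_ranks` and the half-open interpretation of the range are assumptions.

```python
def split_class_ranks(start, end, devices):
    """Split the half-open range [start, end) into one contiguous chunk per device.

    Earlier devices receive one extra class when the range does not divide evenly.
    """
    if not devices:
        raise ValueError("at least one device is required")
    base, extra = divmod(end - start, len(devices))
    chunks = {}
    cursor = start
    for i, device in enumerate(devices):
        size = base + (1 if i < extra else 0)
        chunks[device] = (cursor, cursor + size)
        cursor += size
    return chunks


# Same devices and range as in the command above.
print(split_class_ranks(0, 100, [0, 1, 3, 4]))
```

With four devices and 100 classes, each device receives 25 consecutive class ranks.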
Evaluate the clip-vit model on the Waterbirds test split:
python run_expt.py --dataset waterbirds --algorithm ERM --model clip-vit --root_dir data --device 0 --seed 11111111 --use_wandb --eval_only --eval_split test --eval_epoch -1
Train clip-rn50 with $L_{lc}+L_{vc}+L_{ls}$:
python run_expt.py --dataset waterbirds --algorithm Multimodal --model clip-rn50 --root_dir data --device 0 --freeze_language --freeze_vision --train_projection --seed 11111111 --batch_size 128 --n_epochs 300 --class_weight 0 --clip_weight 1.0 --image_weight 1.0 --language_weight 1.0 --domain_weight 0.0 --spurious_weight 1.0 --spurious_class_weight 0.0 --spurious_clip_weight 0.0 --crossmodal_weight 0.0 --pos_weight 1.0 --neg_weight 1.0 --weight_decay 1e-5 --lr 1e-4 --use_wandb --download=True
Train clip-rn50 with $L_{lc}+L_{vc}+L_{vs}$:
python run_expt.py --dataset waterbirds --algorithm Multimodal --model clip-rn50 --root_dir data --device 0 --freeze_language --freeze_vision --train_projection --seed 11111111 --batch_size 128 --n_epochs 300 --class_weight 0 --clip_weight 1.0 --image_weight 1.0 --language_weight 1.0 --domain_weight 0.0 --spurious_weight 0.0 --spurious_class_weight 1.0 --spurious_clip_weight 0.0 --crossmodal_weight 0.0 --pos_weight 1.0 --neg_weight 1.0 --weight_decay 1e-5 --lr 1e-4 --use_wandb --download=True
Train clip-vit with $L_{lc}+L_{vc}+L_{ls}$:
python run_expt.py --dataset waterbirds --algorithm Multimodal --model clip-vit --root_dir data --device 0 --freeze_vision --freeze_language --train_projection --seed 11111111 --batch_size 32 --n_epochs 300 --class_weight 0.0 --clip_weight 1.0 --image_weight 1.0 --language_weight 1.0 --domain_weight 0.0 --spurious_weight 1.0 --spurious_class_weight 0.0 --spurious_clip_weight 0.0 --crossmodal_weight 0.0 --pos_weight 1.0 --neg_weight 1.0 --weight_decay 1e-5 --lr 1e-4 --use_wandb --download=True
Train clip-vit with $L_{lc}+L_{vc}+L_{vs}$:
python run_expt.py --dataset waterbirds --algorithm Multimodal --model clip-vit --root_dir data --device 0 --freeze_vision --freeze_language --train_projection --seed 11111111 --batch_size 32 --n_epochs 300 --class_weight 0.0 --clip_weight 1.0 --image_weight 1.0 --language_weight 1.0 --domain_weight 0.0 --spurious_weight 0.0 --spurious_class_weight 1.0 --spurious_clip_weight 0.0 --crossmodal_weight 0.0 --pos_weight 1.0 --neg_weight 1.0 --weight_decay 1e-5 --lr 1e-4 --use_wandb --download=True
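The four training commands above differ only in the model, the batch size, and which of `--spurious_weight` and `--spurious_class_weight` is set to 1.0. The sketch below rebuilds them from those three choices. It uses only flags that appear in the commands; the helper `build_command` and the labels `"ls"` and `"vs"` are hypothetical, and the mapping from loss term to flag is read off the commands rather than taken from the code.

```python
import shlex

# Which flag is switched on for each spurious-correlation loss term,
# as observed in the commands above.
LOSS_FLAGS = {
    "ls": {"--spurious_weight": "1.0", "--spurious_class_weight": "0.0"},
    "vs": {"--spurious_weight": "0.0", "--spurious_class_weight": "1.0"},
}


def build_command(model, batch_size, loss):
    """Return the run_expt.py argument list for one model and loss variant."""
    if loss not in LOSS_FLAGS:
        raise ValueError(f"unknown loss variant: {loss!r}")
    flags = LOSS_FLAGS[loss]
    return [
        "python", "run_expt.py",
        "--dataset", "waterbirds", "--algorithm", "Multimodal",
        "--model", model, "--root_dir", "data", "--device", "0",
        "--freeze_language", "--freeze_vision", "--train_projection",
        "--seed", "11111111", "--batch_size", str(batch_size),
        "--n_epochs", "300", "--class_weight", "0.0",
        "--clip_weight", "1.0", "--image_weight", "1.0",
        "--language_weight", "1.0", "--domain_weight", "0.0",
        "--spurious_weight", flags["--spurious_weight"],
        "--spurious_class_weight", flags["--spurious_class_weight"],
        "--spurious_clip_weight", "0.0", "--crossmodal_weight", "0.0",
        "--pos_weight", "1.0", "--neg_weight", "1.0",
        "--weight_decay", "1e-5", "--lr", "1e-4",
        "--use_wandb", "--download=True",
    ]


# Print all four variants so they can be pasted into a shell.
for model, batch_size in [("clip-rn50", 128), ("clip-vit", 32)]:
    for loss in ("ls", "vs"):
        print(shlex.join(build_command(model, batch_size, loss)))
```

Returning a list rather than a string keeps the arguments safe to pass to `subprocess.run` without shell quoting concerns.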
If you find this repository useful, please cite our paper:
@inproceedings{yang2023mitigating,
title={Mitigating Spurious Correlations in Multi-modal Models during Fine-tuning},
author={Yang, Yu and Nushi, Besmira and Palangi, Hamid and Mirzasoleiman, Baharan},
booktitle={International Conference on Machine Learning},
year={2023}
}