donghao51 / SimMMDG

[NeurIPS 2023] SimMMDG: A Simple and Effective Framework for Multi-modal Domain Generalization

SimMMDG: A Simple and Effective Framework for Multi-modal Domain Generalization

Hao Dong¹, Ismail Nejjar², Han Sun², Eleni Chatzi¹, Olga Fink²
¹ETH Zurich, ²EPFL

NeurIPS 2023

---

Overview of SimMMDG. We split the features of each modality into modality-specific and modality-shared parts. For the modality-shared part, we use supervised contrastive learning to map the features with the same label to be as close as possible. For modality-specific features, we use a distance loss to encourage them to be far from modality-shared features, promoting diversity within each modality. Additionally, we introduce a cross-modal translation module that regularizes features and enhances generalization across missing modalities.
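The two feature-level objectives described above can be illustrated with a small, dependency-free sketch. It is not the repository's implementation: the half-and-half split, the negative squared Euclidean distance, the temperature value, and all function names (`split_features`, `distance_loss`, `supcon_loss`) are assumptions chosen for illustration, and the cross-modal translation module is omitted.

```python
import math


def split_features(embedding):
    """Split one modality's embedding into (shared, specific) halves (assumed 50/50)."""
    half = len(embedding) // 2
    return embedding[:half], embedding[half:]


def l2_normalize(vector):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def distance_loss(shared, specific):
    """Negative squared Euclidean distance (an assumed form of the distance loss).

    Minimizing this value pushes the specific part away from the shared part.
    """
    return -sum((s - p) ** 2 for s, p in zip(shared, specific))


def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss over a batch of shared features.

    Each sample is an anchor; other samples with the same label are positives.
    Anchors without any positive are skipped.
    """
    feats = [l2_normalize(f) for f in features]
    total, anchors = 0.0, 0
    for i, anchor in enumerate(feats):
        positives = [j for j in range(len(feats)) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other sample.
        logits = {
            j: sum(a * b for a, b in zip(anchor, feats[j])) / temperature
            for j in range(len(feats))
            if j != i
        }
        log_denominator = math.log(sum(math.exp(v) for v in logits.values()))
        total -= sum(logits[j] - log_denominator for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


if __name__ == "__main__":
    # Two toy modalities (e.g. video and audio) for one sample with label 0.
    video_shared, video_specific = split_features([1.0, 0.0, 0.2, 0.9])
    audio_shared, audio_specific = split_features([0.9, 0.1, 0.8, 0.1])
    print(supcon_loss([video_shared, audio_shared], labels=[0, 0]))
    print(distance_loss(video_shared, video_specific))
```

In this sketch the shared parts of all modalities are pooled into one batch for the contrastive term, so features with the same label are pulled together regardless of which modality produced them.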

Code

The code was tested with Python 3.10.4 and torch 1.11.0+cu113 on an NVIDIA GeForce RTX 3090.

Environments:

mmcv-full 1.2.7
mmaction2 0.13.0

EPIC-Kitchens Dataset

Prepare

Download Pretrained Weights

  1. Download the audio model (link), rename it to vggsound_avgpool.pth.tar, and place it under the EPIC-rgb-flow-audio/pretrained_models directory.

  2. Download the SlowFast model for the RGB modality (link) and place it under the EPIC-rgb-flow-audio/pretrained_models directory.

  3. Download the SlowOnly model for the Flow modality (link) and place it under the EPIC-rgb-flow-audio/pretrained_models directory.

Download EPIC-Kitchens Dataset

bash download_script.sh 

Download the audio files: EPIC-KITCHENS-audio.zip.

Unzip all files and arrange the directory structure to match:

```
├── MM-SADA_Domain_Adaptation_Splits
├── rgb
|   ├── train
|   |   ├── D1
|   |   |   ├── P08_01.wav
|   |   |   ├── P08_01
|   |   |   |   ├── frame_0000000000.jpg
|   |   |   |   ├── ...
|   |   |   ├── P08_02.wav
|   |   |   ├── P08_02
|   |   |   ├── ...
|   |   ├── D2
|   |   ├── D3
|   ├── test
|   |   ├── D1
|   |   ├── D2
|   |   ├── D3
├── flow
|   ├── train
|   |   ├── D1
|   |   |   ├── P08_01
|   |   |   |   ├── u
|   |   |   |   |   ├── frame_0000000000.jpg
|   |   |   |   |   ├── ...
|   |   |   |   ├── v
|   |   |   ├── P08_02
|   |   |   ├── ...
|   |   ├── D2
|   |   ├── D3
|   ├── test
|   |   ├── D1
|   |   ├── D2
|   |   ├── D3
```
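Before training, it can help to confirm that the unzipped data matches this layout. The following is a minimal sketch, not part of the repository: the helper names `expected_dirs` and `missing_dirs` are hypothetical, and it only checks the top-level modality, split, and domain directories listed in the tree, not individual clips or frames.

```python
from pathlib import Path

# Domain folders that appear in the directory tree.
DOMAINS = ("D1", "D2", "D3")


def expected_dirs():
    """Return the relative directories the layout is expected to contain."""
    dirs = ["MM-SADA_Domain_Adaptation_Splits"]
    for modality in ("rgb", "flow"):
        for split in ("train", "test"):
            for domain in DOMAINS:
                dirs.append(f"{modality}/{split}/{domain}")
    return dirs


def missing_dirs(root):
    """List expected directories that do not exist under the dataset root."""
    root = Path(root)
    return [d for d in expected_dirs() if not (root / d).is_dir()]


if __name__ == "__main__":
    # Replace with the value you intend to pass as --datapath.
    for relative in missing_dirs("/path/to/EPIC-KITCHENS/"):
        print("missing:", relative)
```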

Video and Audio

```
cd EPIC-rgb-flow-audio
```

```
python train_video_flow_audio_EPIC_SimMMDG.py --use_video --use_audio -s D2 D3 -t D1 --lr 1e-4 --bsz 16 --nepochs 20 --datapath /path/to/EPIC-KITCHENS/
```

```
python train_video_flow_audio_EPIC_SimMMDG.py --use_video --use_audio -s D1 D3 -t D2 --lr 1e-4 --bsz 16 --nepochs 20 --datapath /path/to/EPIC-KITCHENS/
```

```
python train_video_flow_audio_EPIC_SimMMDG.py --use_video --use_audio -s D1 D2 -t D3 --lr 1e-4 --bsz 16 --nepochs 25 --datapath /path/to/EPIC-KITCHENS/
```

Video and Flow

```
cd EPIC-rgb-flow-audio
```

```
python train_video_flow_audio_EPIC_SimMMDG.py --use_video --use_flow -s D2 D3 -t D1 --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/EPIC-KITCHENS/
```

```
python train_video_flow_audio_EPIC_SimMMDG.py --use_video --use_flow -s D1 D3 -t D2 --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/EPIC-KITCHENS/
```

```
python train_video_flow_audio_EPIC_SimMMDG.py --use_video --use_flow -s D1 D2 -t D3 --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/EPIC-KITCHENS/
```

Flow and Audio

```
cd EPIC-rgb-flow-audio
```

```
python train_video_flow_audio_EPIC_SimMMDG.py --use_flow --use_audio -s D2 D3 -t D1 --lr 1e-4 --bsz 16 --nepochs 10 --datapath /path/to/EPIC-KITCHENS/
```

```
python train_video_flow_audio_EPIC_SimMMDG.py --use_flow --use_audio -s D1 D3 -t D2 --lr 1e-4 --bsz 16 --nepochs 20 --datapath /path/to/EPIC-KITCHENS/
```

```
python train_video_flow_audio_EPIC_SimMMDG.py --use_flow --use_audio -s D1 D2 -t D3 --lr 1e-4 --bsz 16 --nepochs 20 --datapath /path/to/EPIC-KITCHENS/
```

Video and Flow and Audio

```
cd EPIC-rgb-flow-audio
```

```
python train_video_flow_audio_EPIC_SimMMDG.py --use_video --use_flow --use_audio -s D2 D3 -t D1 --lr 1e-4 --bsz 16 --nepochs 10 --trans_hidden_num 1024 --datapath /path/to/EPIC-KITCHENS/
```

```
python train_video_flow_audio_EPIC_SimMMDG.py --use_video --use_flow --use_audio -s D1 D3 -t D2 --lr 1e-4 --bsz 16 --nepochs 20 --datapath /path/to/EPIC-KITCHENS/
```

```
python train_video_flow_audio_EPIC_SimMMDG.py --use_video --use_flow --use_audio -s D1 D2 -t D3 --lr 1e-4 --bsz 16 --nepochs 15 --alpha_trans 1.0 --datapath /path/to/EPIC-KITCHENS/
```
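All of the EPIC-Kitchens commands above follow the same pattern: train on two source domains (`-s`) and test on the held-out third (`-t`). The sketch below shows one way to generate such command lines programmatically. The helpers `leave_one_out_splits` and `build_command` are hypothetical and not part of the repository; only flags that appear in the commands above are used, and the epoch counts must be supplied per split because they differ between runs.

```python
import shlex


def leave_one_out_splits(domains):
    """Pair each target domain with the remaining domains as sources."""
    return [([d for d in domains if d != target], target) for target in domains]


def build_command(script, modalities, sources, target, nepochs, datapath,
                  lr="1e-4", bsz=16, extra_args=()):
    """Assemble one training command as a shell-safe string."""
    argv = ["python", script]
    argv += [f"--use_{m}" for m in modalities]  # e.g. --use_video --use_audio
    argv += ["-s", *sources, "-t", target]
    argv += ["--lr", lr, "--bsz", str(bsz), "--nepochs", str(nepochs)]
    argv += list(extra_args)  # e.g. ["--trans_hidden_num", "1024"]
    argv += ["--datapath", datapath]
    return shlex.join(argv)


if __name__ == "__main__":
    # Epochs per target domain for the video + audio setting listed above.
    epochs = {"D1": 20, "D2": 20, "D3": 25}
    for sources, target in leave_one_out_splits(["D1", "D2", "D3"]):
        print(build_command(
            "train_video_flow_audio_EPIC_SimMMDG.py",
            ["video", "audio"], sources, target,
            nepochs=epochs[target], datapath="/path/to/EPIC-KITCHENS/",
        ))
```

Printing the commands instead of executing them keeps the sketch side-effect free; the output can be pasted into a shell or a job script after `cd EPIC-rgb-flow-audio`.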

HAC Dataset

This dataset can be downloaded at link.

Unzip all files and arrange the directory structure to match:

```
HAC
├── human
|   ├── videos
|   |   ├── ...
|   ├── flow
|   |   ├── ...
|   ├── audio
|   |   ├── ...
├── animal
|   ├── videos
|   |   ├── ...
|   ├── flow
|   |   ├── ...
|   ├── audio
|   |   ├── ...
├── cartoon
|   ├── videos
|   |   ├── ...
|   ├── flow
|   |   ├── ...
|   ├── audio
|   |   ├── ...
```

Download the pretrained weights as described for the EPIC-Kitchens dataset and place them under the HAC-rgb-flow-audio/pretrained_models directory.

Video and Audio

```
cd HAC-rgb-flow-audio
```

```
python train_video_flow_audio_HAC_SimMMDG.py --use_video --use_audio -s 'animal' 'cartoon' -t 'human' --lr 1e-4 --bsz 16 --nepochs 10 --datapath /path/to/HAC/
```

```
python train_video_flow_audio_HAC_SimMMDG.py --use_video --use_audio -s 'human' 'cartoon' -t 'animal' --lr 1e-4 --bsz 16 --nepochs 10 --datapath /path/to/HAC/
```

```
python train_video_flow_audio_HAC_SimMMDG.py --use_video --use_audio -s 'human' 'animal' -t 'cartoon' --lr 1e-4 --bsz 16 --nepochs 10 --datapath /path/to/HAC/
```

Video and Flow

```
cd HAC-rgb-flow-audio
```

```
python train_video_flow_audio_HAC_SimMMDG.py --use_video --use_flow -s 'animal' 'cartoon' -t 'human' --lr 1e-4 --bsz 16 --nepochs 20 --datapath /path/to/HAC/
```

```
python train_video_flow_audio_HAC_SimMMDG.py --use_video --use_flow -s 'human' 'cartoon' -t 'animal' --lr 1e-4 --bsz 16 --nepochs 20 --datapath /path/to/HAC/
```

```
python train_video_flow_audio_HAC_SimMMDG.py --use_video --use_flow -s 'human' 'animal' -t 'cartoon' --lr 1e-4 --bsz 16 --nepochs 20 --datapath /path/to/HAC/
```

Flow and Audio

```
cd HAC-rgb-flow-audio
```

```
python train_video_flow_audio_HAC_SimMMDG.py --use_flow --use_audio -s 'animal' 'cartoon' -t 'human' --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/HAC/
```

```
python train_video_flow_audio_HAC_SimMMDG.py --use_flow --use_audio -s 'human' 'cartoon' -t 'animal' --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/HAC/
```

```
python train_video_flow_audio_HAC_SimMMDG.py --use_flow --use_audio -s 'human' 'animal' -t 'cartoon' --lr 1e-4 --bsz 16 --nepochs 20 --datapath /path/to/HAC/
```

Video and Flow and Audio

```
cd HAC-rgb-flow-audio
```

```
python train_video_flow_audio_HAC_SimMMDG.py --use_video --use_flow --use_audio -s 'animal' 'cartoon' -t 'human' --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/HAC/
```

```
python train_video_flow_audio_HAC_SimMMDG.py --use_video --use_flow --use_audio -s 'human' 'cartoon' -t 'animal' --lr 1e-4 --bsz 16 --nepochs 10 --datapath /path/to/HAC/
```

```
python train_video_flow_audio_HAC_SimMMDG.py --use_video --use_flow --use_audio -s 'human' 'animal' -t 'cartoon' --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/HAC/
```

Contact

If you have any questions, please send an email to donghaospurs@gmail.com

Citation

If you find our work useful in your research, please consider citing our paper:

@inproceedings{dong2023SimMMDG,
    title={Sim{MMDG}: A Simple and Effective Framework for Multi-modal Domain Generalization},
    author={Dong, Hao and Nejjar, Ismail and Sun, Han and Chatzi, Eleni and Fink, Olga},
    booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
    year={2023}
}

Related Projects

MOOSA: Towards Multimodal Open-Set Domain Generalization and Adaptation through Self-supervision

MultiOOD: Scaling Out-of-Distribution Detection for Multiple Modalities

Acknowledgement

Many thanks to the excellent open-source project DomainAdaptation.