QingyangZhang / QMF

Quality-aware multimodal fusion (ICML 2023)
MIT License

Provable Dynamic Fusion for Low-Quality Multimodal Data

This is the official implementation of Provable Dynamic Fusion for Low-Quality Multimodal Data (ICML 2023) by Qingyang Zhang, Haitao Wu, Changqing Zhang, Qinghua Hu, Huazhu Fu, Joey Tianyi Zhou, and Xi Peng.

Environment setup

pip install -r requirements.txt

Dataset preparation

The datasets food101, MVSA_Single, NYUD2, and SUNRGBD are available via Baidu Netdisk.

Trained Model

We provide the trained models at Baidu Netdisk.

The pretrained BERT model is available at Baidu Netdisk.

We use the official PyTorch pretrained ResNet-18 for the RGB-D classification tasks, which can be downloaded from this link.

Usage Example: Text-Image Classification

Note: shell scripts for reference are provided in the folder shells.

To run our method on benchmark datasets:

To run tmc:

python train_tmc.py --batch_sz 16 --gradient_accumulation_steps 40 \
    --savedir ./saved/$task --name $name --data_path ./datasets/ \
    --task $task --task_type $task_type --model $model --num_image_embeds 3 \
    --freeze_txt 5 --freeze_img 3 --patience 5 --dropout 0.1 --lr 5e-05 \
    --warmup 0.1 --max_epochs 100 --seed $i --df true --noise 0.0
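As a rough illustration of the dynamic-fusion idea behind QMF (a minimal sketch, not the paper's actual confidence estimator), per-modality predictions can be weighted by a simple quality proxy such as negative predictive entropy, so that a noisy modality contributes less to the fused prediction:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_fusion(logits_per_modality):
    """Confidence-weighted late fusion (illustrative only).

    Each modality's prediction is weighted by negative entropy of its
    softmax: lower entropy -> higher confidence -> larger fusion weight.
    """
    probs = [softmax(l) for l in logits_per_modality]
    conf = np.array([-(p * np.log(p + 1e-12)).sum(-1) for p in probs])  # (M, B)
    w = softmax(conf, axis=0)  # normalize weights across modalities
    return sum(wi[:, None] * pi for wi, pi in zip(w, probs))

# Two modalities, batch of 2, 3 classes: each sample is dominated by
# whichever modality is confident for it.
text = np.array([[2.0, 0.1, 0.1], [0.3, 0.3, 0.4]])   # confident / unsure
image = np.array([[0.5, 0.4, 0.1], [0.1, 0.2, 3.0]])  # unsure / confident
fused = dynamic_fusion([text, image])
print(fused.argmax(-1))  # [0 2]
```

Here the first sample follows the confident text modality (class 0) and the second follows the confident image modality (class 2), which is the qualitative behavior the paper's dynamic fusion aims for.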

To run the other methods:

Citation

If our QMF or the idea of dynamic multimodal fusion is helpful in your research, please consider citing our paper:

@inproceedings{zhang2023provable,
  title={Provable Dynamic Fusion for Low-Quality Multimodal Data},
  author={Zhang, Qingyang and Wu, Haitao and Zhang, Changqing and Hu, Qinghua and Fu, Huazhu and Zhou, Joey Tianyi and Peng, Xi},
  booktitle={International Conference on Machine Learning},
  year={2023}
}

Acknowledgement

The code is inspired by TMC: Trusted Multi-View Classification and Confidence-Aware Learning for Deep Neural Networks.

Related works

There are many interesting works related to this paper:

For any additional questions, feel free to email qingyangzhang@tju.edu.cn.