
MMCosine_ICASSP23

This is the code release for the ICASSP 2023 paper "MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning", implemented in PyTorch.

Title: MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning

Authors: Ruize Xu, Ruoxuan Feng, Shi-xiong Zhang, Di Hu

:rocket: Project page here: Project Page

:page_facing_up: Paper here: Paper

:mag: Supplementary material: Supplementary

Overview

Recent studies show that the imbalanced optimization of uni-modal encoders in a joint-learning model is a bottleneck to enhancing the model's performance. We further find that recent imbalance-mitigating methods fail on some audio-visual fine-grained tasks, which place a higher demand on distinguishable feature distributions. Fueled by the success of cosine loss, which builds hyperspherical feature spaces and achieves lower intra-class angular variability, this paper proposes Multi-Modal Cosine loss (MMCosine). It performs a modality-wise $L_2$ normalization of features and weights towards balanced and better multi-modal fine-grained learning.
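The core idea can be sketched in a few lines of PyTorch. This is a minimal illustration of modality-wise cosine logits, not the repo's exact implementation: the function name and the per-modality classifier weight matrices (weight_a, weight_v) are placeholders for illustration.

```python
import torch
import torch.nn.functional as F

def mmcosine_logits(feat_a, feat_v, weight_a, weight_v, scaling=10.0):
    """Sketch of MMCosine: per-modality cosine similarities, summed and scaled.

    feat_a, feat_v:     (batch, dim) audio / visual features
    weight_a, weight_v: (num_classes, dim) per-modality classifier weights
    """
    # Modality-wise L2 normalization of both features and weights,
    # so each modality contributes a bounded cosine score in [-1, 1].
    cos_a = F.normalize(feat_a, dim=1) @ F.normalize(weight_a, dim=1).t()
    cos_v = F.normalize(feat_v, dim=1) @ F.normalize(weight_v, dim=1).t()
    # The scaling factor re-expands the dynamic range of the summed
    # cosine logits before softmax / cross-entropy.
    return scaling * (cos_a + cos_v)
```

Because each cosine term is bounded, the scaling hyperparameter (the --scaling flag below) controls how peaked the softmax over the summed logits can become.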

Data Preparation

Main Dependencies

Run

You can train your model on the provided datasets (e.g., CREMA-D) simply by running:

python main_CD.py --train --fusion_method gated --mmcosine True --scaling 10

Apart from the fusion method and the scaling parameter, you can also adjust settings such as batch_size, lr_decay, epochs, etc.

You can also record intermediate variables through TensorBoard by specifying use_tensorboard and setting tensorboard_path for saving logs.
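Combining the options above, a full training invocation might look like the following. The exact flag syntax for the TensorBoard options (boolean value vs. bare switch) is an assumption based on the flags named above; check the argparse definitions in main_CD.py.

```shell
python main_CD.py --train \
  --fusion_method gated --mmcosine True --scaling 10 \
  --use_tensorboard True --tensorboard_path ./logs
```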

Bibtex

If you find this work useful, please consider citing it:

@inproceedings{xu2023mmcosine,
  title={MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning},
  author={Xu, Ruize and Feng, Ruoxuan and Zhang, Shi-Xiong and Hu, Di},
  booktitle={ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2023},
  organization={IEEE}
}

Acknowledgement

This research was supported by Public Computing Cloud, Renmin University of China.

Contact us

If you have any detailed questions or suggestions, you can email us: xrz0315@ruc.edu.cn