This is the code release for the ICASSP 2023 paper "MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning", implemented in PyTorch.
Title: MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning
Authors: Ruize Xu, Ruoxuan Feng, Shi-xiong Zhang, Di Hu
:rocket: Project page here: Project Page
:page_facing_up: Paper here: Paper
:mag: Supplementary material: Supplementary
Recent studies show that the imbalanced optimization of uni-modal encoders in a joint-learning model is a bottleneck to enhancing the model's performance. We further find that up-to-date imbalance-mitigating methods fail on some audio-visual fine-grained tasks, which demand a more distinguishable feature distribution. Motivated by the success of cosine loss, which builds hyperspherical feature spaces and achieves lower intra-class angular variability, this paper proposes the Multi-Modal Cosine loss (MMCosine). It applies a modality-wise $L_2$ normalization to features and weights for balanced and better multi-modal fine-grained learning.
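The core idea can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the repository's exact implementation: the function and variable names are ours, and `scaling` corresponds to the `--scaling` flag used in the training commands below.

```python
import torch
import torch.nn.functional as F

def mmcosine_logits(feat_a, feat_v, w_a, w_v, scaling=10.0):
    """Fuse audio and visual logits with modality-wise L2 normalization.

    feat_a, feat_v: (batch, dim) uni-modal features
    w_a, w_v:       (num_classes, dim) per-modality classifier weights
    """
    # Normalizing both features and weights makes each logit a cosine
    # similarity, so neither modality can dominate via feature magnitude.
    logit_a = F.linear(F.normalize(feat_a, dim=1), F.normalize(w_a, dim=1))
    logit_v = F.linear(F.normalize(feat_v, dim=1), F.normalize(w_v, dim=1))
    # Sum the per-modality cosine logits and rescale before softmax.
    return scaling * (logit_a + logit_v)

# Example: cross-entropy on the fused cosine logits (random toy tensors).
feat_a, feat_v = torch.randn(4, 512), torch.randn(4, 512)
w_a, w_v = torch.randn(6, 512), torch.randn(6, 512)
logits = mmcosine_logits(feat_a, feat_v, w_a, w_v, scaling=10.0)
loss = F.cross_entropy(logits, torch.tensor([0, 1, 2, 3]))
```

Because each cosine term is bounded in $[-1, 1]$, the scaling factor is needed to give the softmax enough dynamic range.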
Download the original datasets: CREMA-D, SSW60, VoxCeleb1&2, and UCF101 (supplementary).
Preprocessing: place the preprocessed datasets under the `/data` folder. You can then train your model on the provided datasets (e.g. CREMA-D) simply by running:
```shell
python main_CD.py --train --fusion_method gated --mmcosine True --scaling 10
```
Apart from the fusion method and scaling parameter, you can also adjust settings such as `batch_size`, `lr_decay`, `epochs`, etc.
You can also record intermediate variables with TensorBoard by setting `use_tensorboard` and specifying `tensorboard_path` for saving logs.
If you find this work useful, please consider citing it.
```
@inproceedings{xu2023mmcosine,
  title={MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning},
  author={Xu, Ruize and Feng, Ruoxuan and Zhang, Shi-Xiong and Hu, Di},
  booktitle={ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2023},
  organization={IEEE}
}
```
This research was supported by Public Computing Cloud, Renmin University of China.
If you have any questions or suggestions, feel free to email us: xrz0315@ruc.edu.cn