MCG-NJU / JoMoLD

[ECCV 2022] Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing

Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing

Haoyue Cheng, Zhaoyang Liu, Hang Zhou, Chen Qian, Wayne Wu and Limin Wang

Code for the ECCV 2022 paper "Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing".

Paper Overview

Figure: Modality-specific label noise.

Figure: The procedure of modality-specific label denoising.

Figure: Results on the LLP dataset.
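As a rough illustration of what modality-specific label denoising can look like, the snippet below applies a generic small-loss rule to a single modality. For each event class, it drops the video-level positive labels with the largest per-sample loss, and a per-class noise ratio sets how many to drop. This is a sketch under stated assumptions, not the authors' implementation. The function name `denoise_labels`, the array shapes, and the small-loss criterion itself are illustrative choices. The method's name indicates that the paper considers the audio and visual modalities jointly, and this per-modality loop does not capture that interaction.

```python
import numpy as np


def denoise_labels(labels: np.ndarray, losses: np.ndarray, noise_ratios: np.ndarray) -> np.ndarray:
    """Hypothetical small-loss denoising for ONE modality (audio or visual).

    labels:       (N, C) binary video-level labels shared by both modalities
    losses:       (N, C) per-sample, per-class losses for this modality
    noise_ratios: (C,)   assumed fraction of positive labels per class that are noise
    Returns a copy of `labels` with the highest-loss positives of each class set to 0.
    """
    denoised = labels.copy()
    num_classes = labels.shape[1]
    for c in range(num_classes):
        pos = np.flatnonzero(labels[:, c] == 1)      # videos labelled with class c
        k = int(round(noise_ratios[c] * len(pos)))  # number of labels to drop for class c
        if k == 0:
            continue
        # Positives with the largest loss are treated as noisy for this modality.
        worst = pos[np.argsort(losses[pos, c])[::-1][:k]]
        denoised[worst, c] = 0
    return denoised


labels = np.array([[1, 1], [1, 0], [1, 1], [0, 1]])
losses = np.array([[0.1, 2.0], [0.9, 0.0], [0.3, 0.2], [0.0, 0.5]])
ratios = np.array([1 / 3, 0.0])
print(denoise_labels(labels, losses, ratios))
```

Here one of the three class-0 positives (the video with loss 0.9) is removed, while class 1 is left untouched because its assumed noise ratio is zero.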

Get Started

Prepare data

  1. Please download the preprocessed audio and visual features from https://github.com/YapengTian/AVVP-ECCV20.
  2. Put the downloaded features into data/feats/.
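Before training, a quick check that the features are where `main.py` expects them can save a failed run. The sketch below assumes that the AVVP-ECCV20 release unpacks into one subdirectory per feature extractor. The names `vggish`, `res152` and `r2plus1d_18` are an assumption based on that release, so adjust them to match what the download actually contains. The helper `missing_feature_dirs` is hypothetical.

```python
from pathlib import Path

# Assumed subdirectory names from the AVVP-ECCV20 feature release; adjust if yours differ.
EXPECTED_SUBDIRS = ("vggish", "res152", "r2plus1d_18")


def missing_feature_dirs(root: str = "data/feats", expected=EXPECTED_SUBDIRS) -> list:
    """Return the expected feature subdirectories that are absent or empty under `root`."""
    base = Path(root)
    missing = []
    for name in expected:
        sub = base / name
        if not sub.is_dir() or not any(sub.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    gaps = missing_feature_dirs()
    if gaps:
        print("Missing or empty feature folders:", ", ".join(gaps))
    else:
        print("All expected feature folders found.")
```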

Train the model

1. Train the noise estimator:

python main.py --mode train_noise_estimator --save_model true --model_save_dir ckpt --checkpoint noise_estimater.pt

2. Calculate the noise ratios:

python main.py --mode calculate_noise_ratio --model_save_dir ckpt --checkpoint noise_estimater.pt --noise_ratio_file noise_ratios.npz

3. Train the model with label denoising:

python main.py --mode train_label_denoising --save_model true --model_save_dir ckpt --checkpoint JoMoLD.pt --noise_ratio_file noise_ratios.npz
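Step 2 writes its output to `noise_ratios.npz`. The array names stored in that file are not documented here, so the sketch below makes no assumption about them. It loads the archive with NumPy and reports each array's name, shape and value range, which serves as a quick sanity check before you start step 3. The helper name `summarize_npz` is hypothetical, and the path should point to wherever step 2 actually saved the file.

```python
import numpy as np


def summarize_npz(path: str) -> dict:
    """Return {array_name: (shape, min, max)} for every array stored in an .npz file."""
    summary = {}
    with np.load(path) as archive:  # NpzFile supports the context-manager protocol
        for name in archive.files:
            arr = archive[name]
            if arr.size == 0:
                summary[name] = (arr.shape, float("nan"), float("nan"))
                continue
            summary[name] = (arr.shape, float(arr.min()), float(arr.max()))
    return summary


if __name__ == "__main__":
    for name, (shape, lo, hi) in summarize_npz("noise_ratios.npz").items():
        print(f"{name}: shape={shape}, min={lo:.3f}, max={hi:.3f}")
```

If the values are meant to be ratios, anything outside [0, 1] is a sign that something went wrong in step 2.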

Test

We provide a pre-trained JoMoLD checkpoint for evaluation. Download it, place it in the "./ckpt" directory, and run the following command to test:

python main.py --mode test_JoMoLD --model_save_dir ckpt --checkpoint JoMoLD.pt
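The four commands above can be chained from a small driver script. This is only a convenience sketch. It reuses exactly the flags shown in this README, checks that `ckpt/JoMoLD.pt` exists before testing, and by default does a dry run that prints the commands without executing them. The names `STEPS`, `build_command` and `run` are hypothetical and are not part of the repository.

```python
import shlex
import subprocess
import sys
from pathlib import Path

CKPT_DIR = "ckpt"
NOISE_FILE = "noise_ratios.npz"

# Arguments copied from this README; only the python executable is parameterised.
STEPS = {
    "train_noise_estimator": ["--mode", "train_noise_estimator", "--save_model", "true",
                              "--model_save_dir", CKPT_DIR, "--checkpoint", "noise_estimater.pt"],
    "calculate_noise_ratio": ["--mode", "calculate_noise_ratio", "--model_save_dir", CKPT_DIR,
                              "--checkpoint", "noise_estimater.pt", "--noise_ratio_file", NOISE_FILE],
    "train_label_denoising": ["--mode", "train_label_denoising", "--save_model", "true",
                              "--model_save_dir", CKPT_DIR, "--checkpoint", "JoMoLD.pt",
                              "--noise_ratio_file", NOISE_FILE],
    "test_JoMoLD": ["--mode", "test_JoMoLD", "--model_save_dir", CKPT_DIR,
                    "--checkpoint", "JoMoLD.pt"],
}


def build_command(step: str, python: str = "python") -> list:
    """Return the argv list for one README step."""
    return [python, "main.py", *STEPS[step]]


def run(steps, dry_run: bool = True) -> None:
    """Print (and, unless dry_run, execute) the given steps in order."""
    for step in steps:
        cmd = build_command(step, sys.executable)
        print("$", shlex.join(cmd))
        if dry_run:
            continue
        if step == "test_JoMoLD" and not Path(CKPT_DIR, "JoMoLD.pt").exists():
            raise FileNotFoundError(f"{CKPT_DIR}/JoMoLD.pt not found; download or train it first")
        subprocess.run(cmd, check=True)  # stop on the first failing step


if __name__ == "__main__":
    run(STEPS)  # dry run: prints all four commands in README order
```

To evaluate only the downloaded checkpoint, call `run(["test_JoMoLD"], dry_run=False)`.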

Citation

If you find this work useful, please consider citing it.

@inproceedings{cheng2022joint,
  title={Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing},
  author={Cheng, Haoyue and Liu, Zhaoyang and Zhou, Hang and Qian, Chen and Wu, Wayne and Wang, Limin},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2022}
}