ZFTurbo / MVSEP-CDX23-Cinematic-Sound-Demixing

Model for CDX23 (Cinematic Sound Demixing) contest
38 stars 5 forks source link

MVSEP-CDX23-Cinematic-Sound-Demixing

Model for Sound demixing challenge 2023: Cinematic Sound Demixing Track - CDX'23. Model performs separation of music into 3 stems "dialog (speech)", "effect (sfx)", "music". Model was trained on DnR dataset. It based on Demucs4. Test set in CDX23 contest was very different from DnR train data. So we released best models with best metrics for DnR test set.

Usage

    python inference.py --input_audio mixture1.wav mixture2.wav --output_folder ./results/

With this command audios with names "mixture1.wav" and "mixture2.wav" will be processed and results will be stored in ./results/ folder in WAV format.

Quality metrics

Quality were measured on DnR test set

Algorithm SDR dialog SDR effect SDR music SDR mean
Demucs HT 4 (single model) 14.18 7.92 6.75 9.62
Demucs HT 4 (3 checkpoints ensemble) 14.68 8.48 7.30 10.16

Citation

@misc{solovyev2023benchmarks,
      title={Benchmarks and leaderboards for sound demixing tasks}, 
      author={Roman Solovyev and Alexander Stempkovskiy and Tatiana Habruseva},
      year={2023},
      eprint={2305.07489},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}
@article{f2024sound,
    title={The Sound Demixing Challenge 2023 – Cinematic Demixing Track},
    author={Stefan Uhlich, Giorgio Fabbro, Masato Hirano, Shusuke Takahashi, Gordon Wichern, Jonathan Le Roux, Dipam Chakraborty, Sharada Mohanty, Kai Li, Yi Luo, Jianwei Yu, Rongzhi Gu, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Mikhail Sukhovei, Yuki Mitsufuji},
    volume={7},
    number={1},
    pages={44--62},
    year={2024}
}