RamiMatar / Chroma-BSRNN


Music Source Separation with Harmonic Awareness using Band Split RNN

This project builds a music source separation model based on the BSRNN (Band-Split RNN) architecture proposed by Y. Luo et al. (2022), which achieves state-of-the-art results on MusDB on most SDR metrics. The project aims to improve BSRNN by computing an octave-aware chromagram from a high frequency-resolution spectrogram. The chromagram is then used to produce harmonic attention information that augments the main spectrogram RNN path proposed in BSRNN. Pretrained models and a paper will be posted soon.
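As a rough illustration of what "octave-aware chromagram" means, the sketch below folds one magnitude-spectrum frame into an (octave x pitch class) grid. It is not taken from this repository: the function names, the A4 = 440 Hz reference, the octave range, and the nearest-semitone binning are all assumptions made for the example, and plain Python lists are used only to keep it dependency-free.

```python
import math


def octave_and_pitch_class(freq_hz, ref_hz=440.0):
    """Map a frequency to (octave, pitch_class) via the nearest MIDI note.

    Assumes A4 = ref_hz = MIDI note 69; pitch class 0 is C, 9 is A.
    """
    midi = round(69 + 12 * math.log2(freq_hz / ref_hz))
    return midi // 12 - 1, midi % 12


def octave_chroma_frame(magnitudes, sample_rate, n_fft, min_octave=1, max_octave=8):
    """Fold one magnitude-spectrum frame into an (octaves x 12) grid.

    magnitudes[k] is assumed to be the magnitude of FFT bin k, whose
    center frequency is k * sample_rate / n_fft.
    """
    n_octaves = max_octave - min_octave + 1
    grid = [[0.0] * 12 for _ in range(n_octaves)]
    for k, mag in enumerate(magnitudes):
        if k == 0:
            continue  # skip the DC bin, which has no pitch
        octave, pitch_class = octave_and_pitch_class(k * sample_rate / n_fft)
        if min_octave <= octave <= max_octave:
            grid[octave - min_octave][pitch_class] += mag
    return grid


# Hypothetical frame: 10 Hz bin spacing, all energy in the 440 Hz bin.
frame = [0.0] * 2206
frame[44] = 1.0
grid = octave_chroma_frame(frame, sample_rate=44100, n_fft=4410)
```

Semitones are only a few hertz apart in the low octaves, so the FFT bins must be narrow for neighboring notes to land in different cells. This is one reason a high frequency-resolution spectrogram is useful here.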

Primary References

  1. Luo, Yi, and Jianwei Yu. "Music source separation with band-split RNN." IEEE/ACM Transactions on Audio, Speech, and Language Processing (2023).
  2. Luo, Yi, Zhuo Chen, and Takuya Yoshioka. "Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation." ICASSP 2020 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020.

Usage - Training and Evaluation

First, create and activate the conda environment:

conda env create -f environment.yml
conda activate mss

Make sure to download the model checkpoints and the dataset (MusDB18HQ).

Consider using the following notebooks: SourceSepEval.ipynb lets you test any of the three models against the MusDB18HQ test set, and SourceSepDemo.ipynb lets you run any of the models on a song of your choice and visualize the output spectrograms.

To train the model:

 python trainer.py --chroma_version attention --batch_size 1

The command above shows the options you can specify. DDP (DistributedDataParallel) is used automatically if multiple GPUs are available.

Evaluation

To evaluate the separated sources, you can use the eval.py script. Follow these steps:

  1. Make sure you have the trained model checkpoint available.

  2. Run the evaluation script:

python eval.py --model bsrnn --model_path path/to/model/checkpoint --song_path mixture.wav --source_path vocals.wav --offset 0.0 --length 30 --eval False --full_eval_mode False --plot_spectrograms False --force_mono True
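For readers unfamiliar with the SDR metric mentioned above, here is a minimal sketch of a simple signal-to-distortion ratio between a reference source and an estimate. It is not the code in eval.py, which may compute SDR differently (for example, per chunk or with a BSS Eval toolkit). The function name `sdr_db` and the `eps` stabilizer are assumptions for illustration.

```python
import math


def sdr_db(reference, estimate, eps=1e-10):
    """Simple SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    reference and estimate are equal-length sequences of samples;
    eps avoids division by zero and log of zero on silent inputs.
    """
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


# Hypothetical toy signals: the estimate is the reference scaled by 0.9.
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [0.9, -0.9, 0.9, -0.9]
score = sdr_db(reference, estimate)
```

Higher values mean the estimate is closer to the reference. In this toy case the error energy is 1% of the signal energy, which gives roughly 20 dB.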