This project builds a music source separation model based on the BSRNN architecture proposed by Y. Luo, et. al. (2022), which achieves state of the art results on MusDB on most SDR metrics. This project aims to improve the BSRNN architecture by using a high frequency-resolution spectrogram to calculate an octave-aware chromagram which is used to create harmonic attention information to augment the main spectrogram RNN path proposed in BSRNN. Pretrained models and paper will be posted soon.
First, make sure to activate the conda env
conda env create -f environment.yml
conda activate mss
Make sure to download the model checkpoints and dataset (MusDB18HQ)
Consider using the following notebooks : SourceSepEval.ipynb will let you test any of the three models against the musdb18hq test dataset. SourceSepDemo.ipynb will let you test any of the models for any song you want and visualize the output spectrograms.
To train the model:
python trainer.py --chroma_version attention --batch_size 1
This is what you can specify, DDP will be used automatically if multiple GPUs are available.
--epochs
: Total number of epochs to train the model (default: 10).--seed
: Seed for pseudorandom generation (default: 42).--writer_dir
: Directory for TensorBoard logs (default: tb_logs).--load_model
: Path to a checkpoint to load the model from.--save_model
: Path to save checkpoints (default: checkpoint.pt).--batch_size
: Input batch size on each device (default: 4).--num_workers
: Number of worker processes for each dataloader (default: 0).--validation_per_n_epoch
: Number of training epochs before doing a validation dataset iteration (default: 5).--epoch_size
: Number of random segments to sample from the dataset for each epoch (default: 1000).--chroma_version
: Chroma version to use, options are 'attention', 'fc_group', or other (default: attention).To evaluate the separated sources, you can use the eval.py
script. Follow these steps:
Make sure you have the trained model checkpoint available.
Run the evaluation script:
python eval.py --model bsrnn --model_path path/to/model/checkpoint --song_path mixture.wav --source_path vocals.wav --offset 0.0 --length 30 --eval False --full_eval_mode False --plot_spectrograms False --force_mono True
--model
: Model type to use, options are 'bsrnn', 'sa', or 'fc' (default: bsrnn).--model_path
: Path to the model checkpoint.--song_path
: Path to the song to separate.--source_path
: Path to the source to evaluate against.--offset
: Offset in seconds to start separating from (default: 0.0).--length
: Length in seconds to separate (default: 30).--eval
: Whether to evaluate the separated source (default: False).--full_eval_mode
: Whether to evaluate the separated source using museval (default: False).--plot_spectrograms
: Whether to plot the spectrograms of the mixture, source, and estimate (default: False).--force_mono
: Whether to force the input to be mono (default: True).