beantowel / chorus-from-music-structure

chorus detection for pop music

Chorus Detection Using Music Structure Analysis

This repo implements a chorus detection algorithm. The algorithm detects chorus sections in pop music, and the output follows the MIREX (Music Information Retrieval Evaluation eXchange) Structural Segmentation format. A structural view of the detected choruses is shown in the figure below, where the green stripes in subfigure "c" mark the ground-truth and detected chorus sections.

example on 'Dream Magic'

The code also includes evaluations of several other algorithms for comparison.

Prerequisites

The algorithm is implemented in Python 3.7. The requirements are listed in requirements.txt; install them with:

pip install -r ./requirements.txt

The latest version of librosa, which includes some crucial fixes, should be installed from GitHub:

git clone --depth=1 https://github.com/librosa/librosa.git
pip install -e librosa

Download the melody extraction algorithm JDC, and configure its path as described under Configuration:

git clone --depth=1 https://github.com/keums/melodyExtraction_JDC.git

Configuration

A few paths need to be set in configs/configs.py:

The pre-trained model is determined by the USING_DATASET variable in configs/modelConfigs.py. To use the model trained on the RWC dataset, for example, set USING_DATASET = RWC_Popular_Dataset().
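For illustration, the relevant fragment of configs/modelConfigs.py would look like the sketch below; the exact surrounding contents of that file may differ:

```python
# configs/modelConfigs.py (fragment, illustrative only)
# USING_DATASET selects which dataset's pre-trained model is used.
USING_DATASET = RWC_Popular_Dataset()
```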

If you are only interested in using the algorithm, you can skip the following steps.

If you want to evaluate the algorithms, additional configuration is required:

Usage

To detect the chorus sections of a music recording, use predict.py:

Usage: predict.py [OPTIONS] [AUDIOFILES]...

Options:
  --outputdir PATH
  --metaOutputdir PATH
  --algo [multi|single]
  --force BOOLEAN        overwrite cached features.
  --help                 Show this message and exit.

A quick example:

python predict.py ./data/example/starfall.mp3 --force false

By default, the algorithm outputs all detected chorus sections; use the option --algo single to force it to output a single chorus section.

The default directory for MIREX-format output (OUTPUTDIR) is ./data/predict; each output file contains three tab-separated columns:

<onset_time(sec)>\t<offset_time(sec)>\t<label>\n
<onset_time(sec)>\t<offset_time(sec)>\t<label>\n
...
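Each line is therefore an onset time, an offset time, and a label, separated by tabs. A minimal sketch of reading such a file (the label strings shown are hypothetical, not the algorithm's exact output):

```python
# Parse MIREX-format structural segmentation lines into
# (onset_sec, offset_sec, label) tuples. Only the 3-column
# tab-separated layout is taken from the repo's description.
def parse_mirex(lines):
    segments = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        onset, offset, label = line.split("\t")
        segments.append((float(onset), float(offset), label))
    return segments

# hypothetical example content
example = "0.00\t15.30\tverse\n15.30\t45.70\tchorus\n"
segs = parse_mirex(example.splitlines())
# segs == [(0.0, 15.3, 'verse'), (15.3, 45.7, 'chorus')]
```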

The default directory for viewer metadata (METAOUTPUTDIR) is ./data/viewerMetadata; the JSON files there drive a simple HTML player that shows the detection results. To view the output and play the music, open the metadata file with the page ./viewer/index.html.

The metadata generated by the algorithm always links to a local audio file. For a quick example, however, you can open data/example/starfall_meta.json, which links to an online audio file, in the viewer:

Audio player example

To evaluate the algorithms, first compute the features for the audio files, then train the classifier, and finally run the evaluation:

python feature.py build && python feature.py train && python eval_algos.py

Custom dataset

Besides the RWC Pop and SALAMI datasets provided in the code, you can add your own dataset for training and testing. To do so, add a custom dataset class in utility/dataset.py that subclasses BaseStructDataset. The audio files and annotations should be set in the attribute self.pathPairs on initialization, whose type is a list of the namedtuple StructDataPathPair.

Next, implement the loadGT method in the custom class. loadGT accepts the path of an annotation file and returns MIREX-format data, composed of each segment's onset/offset times and its label. You can optionally implement the method semanticLabelDic, which takes no arguments and returns a dictionary mapping the labels used in your dataset to specific numbers; it is used for generating a labeled target self-similarity matrix, but this functionality is not currently used. Note that the labels used for training are generated by string matching: every label from the dataset that starts with the substring "chorus" is considered a target segment.
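The pieces described above can be sketched as follows. This is an illustration under assumptions, not the repo's actual interface: the real BaseStructDataset and StructDataPathPair live in utility/dataset.py, their field names may differ, and the tab-separated annotation format used here is hypothetical.

```python
from collections import namedtuple

# Assumed field names; the repo defines the real StructDataPathPair.
StructDataPathPair = namedtuple("StructDataPathPair", ["title", "wav", "GT"])

class MyDataset:  # in the repo this would subclass BaseStructDataset
    def __init__(self, triples):
        # triples: list of (title, audio path, annotation path)
        self.pathPairs = [StructDataPathPair(*t) for t in triples]

    def loadGT(self, annotation_path):
        """Return MIREX-format data: (onset/offset pairs, labels)."""
        intervals, labels = [], []
        with open(annotation_path) as f:
            for line in f:
                onset, offset, label = line.strip().split("\t")
                intervals.append((float(onset), float(offset)))
                labels.append(label)
        return intervals, labels

    def semanticLabelDic(self):
        # optional: map dataset labels to numbers
        # (generates a labeled target SSM; not currently used)
        return {"chorus": 0, "verse": 1}
```

Labels such as "chorus A" would then be picked up as target segments by the "chorus" prefix match mentioned above.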

How it works

An overview of the algorithm is shown in the image below. First, acoustic features such as pitch chroma, MFCC, chroma, and tempogram are computed from the input music recording. Self-similarity matrices are then generated from these features and fused into one. Low-level patterns are extracted by graph algorithms that assume transitivity of similarity, and are merged to form top-level structures. Finally, a classifier trained on labeled data detects chorus sections from the structural information and melody features of the input sections.
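The self-similarity step can be illustrated with a minimal NumPy sketch: build one SSM per feature via cosine similarity, then fuse them. The averaging fusion here is an assumption for illustration; the repo's actual fusion method may differ.

```python
import numpy as np

def self_similarity(features):
    """Cosine self-similarity matrix from per-frame features.

    features: array of shape (n_frames, dim).
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-9)  # row-normalize
    return unit @ unit.T  # (n_frames, n_frames)

def fuse(ssms):
    # illustrative fusion: element-wise average of the per-feature SSMs
    return np.mean(ssms, axis=0)

rng = np.random.default_rng(0)
chroma = rng.random((8, 12))  # stand-in for chroma features
mfcc = rng.random((8, 20))    # stand-in for MFCC features
fused = fuse([self_similarity(chroma), self_similarity(mfcc)])
# fused is an 8x8 symmetric matrix; each frame is maximally similar
# to itself, so the diagonal entries are 1.
```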

overview