MTG / essentia

C++ library for audio and music analysis, description and synthesis, including Python bindings
http://essentia.upf.edu
GNU Affero General Public License v3.0

RhythmTransform descriptor QA #437

Open dbogdanov opened 8 years ago

dbogdanov commented 8 years ago

The RhythmTransform algorithm computes rhythm descriptors from Mel bands. The original paper specifies the computation of the Mel bands, and this is implemented in the Python example.

Testing this script with simple examples, some periodicities are detected with octave errors that I am not sure should be there, while sub-octave errors make sense.

An example with a 100 BPM metronome click (20th coefficient corresponds to 100 BPM): rhythmtransform_example

An example of a 130 BPM kick drum loop with what seems to be a correct behavior (26th coefficient): rhythmtransform_example_good
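
(For reference, the coefficient-to-BPM mapping follows from the Mel band frame rate; the parameter values in the sketch below are taken from the analysis script further down in this thread, so treat them as an assumption about the settings used for these plots.)

sample_rate = 22050.0  # assumed, matches the script below
hop_size = 1024.0      # assumed, matches the script below
rt_frame_size = 256    # rhythm transform frame size -> 256 // 2 + 1 = 129 output bins

mel_frame_rate = sample_rate / hop_size                                       # ~21.53 Hz
bin_resolution_bpm = mel_frame_rate / 2.0 * 60.0 / (rt_frame_size // 2 + 1)   # ~5.01 BPM per bin

print(20 * bin_resolution_bpm)  # ~100 BPM (metronome click example above)
print(26 * bin_resolution_bpm)  # ~130 BPM (kick drum loop example above)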

dbogdanov commented 8 years ago

Test if the same problem occurs when using:

dbogdanov commented 8 years ago

An example with a 50 BPM metronome click (10th coefficient corresponds to 50 BPM): click_50bpm

An example with a 200 BPM metronome click (40th coefficient corresponds to 200 BPM): click_200bpm

The 200 BPM signal is detected correctly.

dbogdanov commented 8 years ago

100 BPM (= 1.66666 Hz) sine wave (detected as 200 BPM) sin_100bpm

50 BPM sine wave (detected as 100 BPM) sin_50bpm

200 BPM sine wave (detected as 400 BPM) sin_200bpm
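
(A minimal sketch of how such a sine test signal could be generated; the exact files used above are not attached, so the pure low-frequency sine and the 30-second duration below are assumptions.)

import numpy as np
from essentia.standard import MonoWriter

sample_rate = 22050
duration = 30.0      # seconds (assumed)
bpm = 100.0          # the 50 and 200 BPM cases above would use the same recipe
freq = bpm / 60.0    # 100 BPM -> ~1.6667 Hz

t = np.arange(int(duration * sample_rate)) / float(sample_rate)
signal = np.sin(2.0 * np.pi * freq * t).astype(np.float32)
MonoWriter(filename='sin_%dbpm.wav' % int(bpm), sampleRate=sample_rate)(signal)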

dbogdanov commented 7 years ago

@ffont, can we have a simple evaluation using your dataset of loops? We can extract the overall BPM from the rhythm transform by averaging the rhythm transform coefficients over frames.
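
(A sketch of that aggregation idea, assuming rt is the 2D frames-by-coefficients matrix returned by RhythmTransform and using the ~5.01 BPM/bin resolution derived later in this thread.)

import numpy

def overall_bpm(rt, bin_resolution_bpm=5.007721656976744):
    mean_coeffs = numpy.mean(rt, axis=0)   # average coefficients over frames
    return numpy.argmax(mean_coeffs) * bin_resolution_bpm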

ffont commented 7 years ago

To analyze it in my framework, I need a Python function that, given a sound dictionary with the file path inside, computes the BPM and returns it in a dictionary like {'MethodName': {'bpm': 125}} (see https://github.com/ffont/ismir2016/blob/master/analysis_algorithms.py#L11).

The easiest is probably to create a branch in my repository and add the new function. Then I can simply run it locally with the data I have. Would that work?

dbogdanov commented 7 years ago

Sure. I'll send you a sketch for the analysis code.

dbogdanov commented 7 years ago

@ffont Here is the code to use. I've tested it on one music file and it estimated BPM correctly, but... no high expectations yet ;)

import sys
from essentia.standard import *
from essentia import Pool
import numpy

try:
    input_file = sys.argv[1]
except IndexError:
    print("usage:", sys.argv[0], "<input_file>")
    sys.exit()

"""
Explanation of Rhythm Transform: 
- Mel bands are computed on frames of the size 8192 with the frames sample rate = sampleRate/hopSize = 22050/1024 = 21.5Hz
- Rhythm transform frame size is equal to 256 Mel bands frames
- Output vector is of size 256/2 + 1 = 129.
- Therefore it represents periodicities over the interval 0Hz (0th bin) to 22050/1024/2 = 10.75Hz (129th bin),
- Converting to BPM values this corresponds to an interval from 0 BPM to 22050/1024/2 * 60 = 646 BPM
- Each bin roughly covers 5 BPM
- 60-200 BPM interval is covered by only 40-12 = 28 bins
- 120 BPM rougphly corresponds to bin #24
- bin 0 = 0 BPM
- bin 128 = 645.99609375 BPM
"""

sampleRate   = 22050
frameSize    = 8192
hopSize      = 1024
rmsFrameSize = 256
rmsHopSize   = 32

loader = MonoLoader(filename=input_file, sampleRate=sampleRate)
w = Windowing(type='blackmanharris62')
spectrum = Spectrum()
melbands = MelBands(sampleRate=sampleRate, numberBands=40, lowFrequencyBound=0, highFrequencyBound=sampleRate/2)

pool = Pool()

for frame in FrameGenerator(audio=loader(), frameSize=frameSize, hopSize=hopSize, startFromZero=True):
    bands = melbands(spectrum(w(frame)))
    pool.add('melbands', bands)

rhythmtransform = RhythmTransform(frameSize=rmsFrameSize, hopSize=rmsHopSize)
rt = rhythmtransform(pool['melbands'])
rt_mean = numpy.mean(rt, axis=0)
bin_resolution = 5.007721656976744  # BPM per rhythm transform bin (see explanation above)

print(numpy.argmax(rt_mean) * bin_resolution)
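
(For ffont's framework, a minimal sketch of how the script above could be wrapped into the required interface; the 'file_path' key and the 'RhythmTransform' method name are assumptions, since the actual dictionary keys are defined in analysis_algorithms.py.)

from essentia.standard import MonoLoader, Windowing, Spectrum, MelBands, FrameGenerator, RhythmTransform
from essentia import Pool
import numpy

def rhythm_transform_tempo(sound):
    # Same parameters and pipeline as the script above, condensed into a function.
    sample_rate, frame_size, hop_size = 22050, 8192, 1024
    audio = MonoLoader(filename=sound['file_path'], sampleRate=sample_rate)()
    w = Windowing(type='blackmanharris62')
    spectrum = Spectrum()
    melbands = MelBands(sampleRate=sample_rate, numberBands=40,
                        lowFrequencyBound=0, highFrequencyBound=sample_rate / 2)

    pool = Pool()
    for frame in FrameGenerator(audio=audio, frameSize=frame_size,
                                hopSize=hop_size, startFromZero=True):
        pool.add('melbands', melbands(spectrum(w(frame))))

    rt = RhythmTransform(frameSize=256, hopSize=32)(pool['melbands'])
    bpm = numpy.argmax(numpy.mean(rt, axis=0)) * 5.007721656976744
    return {'RhythmTransform': {'bpm': float(bpm)}}
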
ffont commented 7 years ago

Oops, the results do not seem very encouraging, hehe: https://github.com/ffont/ismir2016/blob/rhythm_transform/Tempo%20estimation%20results.ipynb

dbogdanov commented 7 years ago

Yeah... I am sad now :) Let's give it one last chance. As you can notice from the plots above, there are often octave errors. We can test if accuracy improves when dividing the estimated BPM values by 2.

ffont commented 7 years ago

No need to do that, as this is covered by Accuracy 2, which considers octave errors as correct predictions. Note that the RhythmTransform method doubles its accuracy there, but it is still far from the state of the art...
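
(A sketch of an Accuracy 2-style check; the 4% tolerance and the exact set of tempo factors are assumptions based on the common convention in tempo estimation evaluation, not necessarily the exact definition used in the notebook.)

def accuracy2_correct(estimated_bpm, reference_bpm, tolerance=0.04):
    # An estimate counts as correct if it matches the reference tempo, or an
    # octave-related multiple of it, within the given relative tolerance.
    for factor in (1.0, 2.0, 0.5, 3.0, 1.0 / 3.0):
        if abs(estimated_bpm - factor * reference_bpm) <= tolerance * factor * reference_bpm:
            return True
    return False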

dbogdanov commented 7 years ago

Copy-pasting the results here for the record.

General tempo estimation results (ALL DATASETS)
-----------------------------------------------

Method            Accuracy 1e   Accuracy 1   Accuracy 2   Mean accuracy   
--------------------------------------------------------------------------
Percival14        43.90         54.45        70.52        56.29           
Degara12          30.22         52.98        60.46        47.88           
Zapata14          28.54         52.47        60.35        47.12           
Bock15            22.74         37.03        65.04        41.61           
RhythmTransform   13.05         18.17        37.14        22.79