dbogdanov opened this issue 8 years ago
Test if the same problem occurs when using:
An example with a 50 BPM metronome click (10th coefficient corresponds to 50 BPM):
An example with a 200 BPM metronome click (40th coefficient corresponds to 200 BPM):
The 200 BPM signal is detected correctly.
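To sanity-check the coefficient numbers quoted above, here is a minimal sketch that converts between coefficient index and BPM. It assumes the parameters of the analysis script in this thread (22050 Hz sample rate, hop size 1024, rhythm-transform frame of 256 Mel-band frames) and a bin spacing of (frame rate / 256) * 60 BPM. The helper names are hypothetical and not part of Essentia.

```python
# Parameters assumed from the analysis script in this thread.
SAMPLE_RATE = 22050      # Hz
HOP_SIZE = 1024          # samples between Mel-band frames
RT_FRAME_SIZE = 256      # Mel-band frames per rhythm-transform frame

FRAME_RATE = SAMPLE_RATE / HOP_SIZE               # ~21.53 Mel-band frames per second
BPM_PER_BIN = FRAME_RATE / RT_FRAME_SIZE * 60     # ~5.047 BPM per coefficient


def coefficient_to_bpm(k):
    """Periodicity (in BPM) at the centre of rhythm-transform coefficient k."""
    return k * BPM_PER_BIN


def bpm_to_coefficient(bpm):
    """Index of the coefficient closest to a given tempo."""
    return round(bpm / BPM_PER_BIN)


for bpm in (50, 100, 130, 200):
    k = bpm_to_coefficient(bpm)
    print(f"{bpm} BPM -> coefficient {k} ({coefficient_to_bpm(k):.1f} BPM)")
```

Under these assumptions, 50, 100, 130 and 200 BPM map to coefficients 10, 20, 26 and 40, matching the figures quoted in this thread, and the last coefficient (128) lands at about 646 BPM.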
100 BPM (≈ 1.667 Hz) sine wave (detected as 200 BPM)
50 BPM sine wave (detected as 100 BPM)
200 BPM sine wave (detected as 400 BPM)
@ffont Can we have a simple evaluation using your dataset of loops? We can extract an overall BPM from the rhythm transform by averaging the rhythm transform coefficients over frames.
To analyze it in my framework, I need a Python function that, given a sound dictionary with the file path inside, computes the BPM and returns it in a dictionary like {'MethodName': {'bpm': 125}}
(see https://github.com/ffont/ismir2016/blob/master/analysis_algorithms.py#L11)
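A minimal sketch of such a wrapper, assuming the estimator is any callable that takes a file path and returns a BPM value (for example, the RhythmTransform analysis refactored into a function). The key holding the file path (`file_path` here), the helper names and the constant stand-in estimator are all hypothetical; the real conventions are in analysis_algorithms.py.

```python
from typing import Callable, Dict


def make_bpm_method(name: str, estimate_bpm: Callable[[str], float],
                    path_key: str = "file_path") -> Callable[[Dict], Dict]:
    """Wrap a file-based BPM estimator into the {'MethodName': {'bpm': ...}} interface.

    `path_key` is a hypothetical key name for the file path inside the sound
    dictionary; adjust it to whatever the evaluation framework actually uses.
    """
    def analyze(sound: Dict) -> Dict[str, Dict[str, float]]:
        bpm = estimate_bpm(sound[path_key])
        return {name: {"bpm": bpm}}
    return analyze


# Usage with a stand-in estimator (a real one would run the analysis script's logic).
def constant_estimator(path: str) -> float:
    return 125.0


rt_method = make_bpm_method("RhythmTransform", constant_estimator)
print(rt_method({"file_path": "loop.wav"}))  # {'RhythmTransform': {'bpm': 125.0}}
```

Keeping the estimator separate from the dictionary plumbing means the same analysis code can be run both from the command line and from the evaluation framework.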
The easiest is probably to create a branch in my repository and add the new function. Then I can simply run it locally with the data I have. Would that work?
Sure. I'll send you a sketch for the analysis code.
@ffont Here is the code to use. I've tested it on one music file and it estimated BPM correctly, but... no high expectations yet ;)
```python
import sys

import numpy
from essentia import Pool
from essentia.standard import (FrameGenerator, MelBands, MonoLoader,
                               RhythmTransform, Spectrum, Windowing)

try:
    input_file = sys.argv[1]
except IndexError:
    print("usage:", sys.argv[0], "<input_file>")
    sys.exit(1)

"""
Explanation of Rhythm Transform:
- Mel bands are computed on frames of size 8192 with a frame rate of sampleRate/hopSize = 22050/1024 = 21.53 Hz
- The rhythm transform frame size is 256 Mel-band frames
- The output vector has size 256/2 + 1 = 129
- Therefore it represents periodicities over the interval 0 Hz (bin 0) to 22050/1024/2 = 10.77 Hz (bin 128)
- Converted to BPM, this corresponds to an interval from 0 BPM to 22050/1024/2 * 60 = 646 BPM
- Each bin covers roughly 5 BPM (646/128 = 5.05 BPM)
- The 60-200 BPM interval is covered by only 40 - 12 = 28 bins
- 120 BPM roughly corresponds to bin #24
- bin 0 = 0 BPM
- bin 128 = 645.99609375 BPM
"""

sampleRate = 22050
frameSize = 8192
hopSize = 1024
rmsFrameSize = 256
rmsHopSize = 32

loader = MonoLoader(filename=input_file, sampleRate=sampleRate)
w = Windowing(type='blackmanharris62')
spectrum = Spectrum()
# inputSize must match the spectrum size produced from frameSize samples
melbands = MelBands(sampleRate=sampleRate, numberBands=40,
                    inputSize=frameSize // 2 + 1,
                    lowFrequencyBound=0, highFrequencyBound=sampleRate / 2)

# Compute Mel bands frame by frame
pool = Pool()
for frame in FrameGenerator(audio=loader(), frameSize=frameSize, hopSize=hopSize, startFromZero=True):
    bands = melbands(spectrum(w(frame)))
    pool.add('melbands', bands)

rhythmtransform = RhythmTransform(frameSize=rmsFrameSize, hopSize=rmsHopSize)
rt = rhythmtransform(pool['melbands'])

# Average the rhythm transform over frames and pick the strongest periodicity
rt_mean = numpy.mean(rt, axis=0)
# Bins 0..128 span 0..645.996 BPM, so the spacing is sampleRate / hopSize / rmsFrameSize * 60 (~5.047 BPM)
bin_resolution = sampleRate / hopSize / rmsFrameSize * 60
print(numpy.argmax(rt_mean) * bin_resolution)
```
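For readers without Essentia at hand, the idea described in the docstring can be approximated in plain NumPy. This is a simplified stand-in, not Essentia's RhythmTransform code: it assumes the transform takes each Mel band's energy trajectory over 256 frames, differentiates it over time, applies a Hann window, computes a power spectrum and sums the result across bands. The function name and the synthetic input are hypothetical.

```python
import numpy as np

SAMPLE_RATE = 22050
HOP_SIZE = 1024
RT_FRAME_SIZE = 256
FRAME_RATE = SAMPLE_RATE / HOP_SIZE            # Mel-band frames per second
BPM_PER_BIN = FRAME_RATE / RT_FRAME_SIZE * 60  # ~5.047 BPM per output bin


def simplified_rhythm_transform(melbands):
    """Periodicity spectrum of one block of Mel-band frames.

    melbands: array of shape (RT_FRAME_SIZE, n_bands), one row per frame.
    Returns RT_FRAME_SIZE // 2 + 1 values, one per periodicity bin.
    Assumption: differentiate each band over time, Hann-window it,
    take the power spectrum and sum across bands.
    """
    energy = np.asarray(melbands, dtype=float)
    # First-order difference along time; the first row becomes 0.
    novelty = np.diff(energy, axis=0, prepend=energy[:1])
    window = np.hanning(RT_FRAME_SIZE)[:, np.newaxis]
    power = np.abs(np.fft.rfft(novelty * window, axis=0)) ** 2
    return power.sum(axis=1)


# Synthetic input: 40 bands whose energy is modulated at 100 BPM.
t = np.arange(RT_FRAME_SIZE) / FRAME_RATE
modulation = 1.0 + 0.5 * np.sin(2 * np.pi * (100 / 60) * t)
melbands = np.tile(modulation[:, np.newaxis], (1, 40))

rt = simplified_rhythm_transform(melbands)
peak = int(np.argmax(rt))
print(peak, peak * BPM_PER_BIN)
```

With a 100 BPM modulation, the peak should land on coefficient 20 (about 101 BPM), consistent with the bin spacing in the docstring. The output has 129 values, matching the 256/2 + 1 size stated there.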
Oops, the results do not look very encouraging, hehe: https://github.com/ffont/ismir2016/blob/rhythm_transform/Tempo%20estimation%20results.ipynb
Yeah... I am sad now :) Let's give it one last chance. As you can see from the plots above, there are often octave errors. We can test whether accuracy improves when dividing the estimated BPM values by 2.
No need to do that, as this is covered by Accuracy 2, which counts octave errors as correct predictions. Note that the RhythmTransform method roughly doubles its accuracy under Accuracy 2, but it is still far from the state of the art...
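For reference, here is a sketch of the two metrics as they are commonly defined in tempo-estimation evaluations: Accuracy 1 accepts estimates within 4% of the annotated tempo, and Accuracy 2 additionally accepts 2x, 3x, 1/2x and 1/3x the annotation. The tolerance and factors are assumptions based on that common definition and may differ from the ismir2016 evaluation code; Accuracy 1e is not covered, and the example estimates are made up.

```python
def accuracy1(estimated, reference, tolerance=0.04):
    """True if the estimate is within +/- tolerance (relative) of the reference."""
    return abs(estimated - reference) <= tolerance * reference


def accuracy2(estimated, reference, tolerance=0.04):
    """Like accuracy1, but also accepts octave-related tempi (x2, x3, x1/2, x1/3)."""
    return any(accuracy1(estimated, reference * factor, tolerance)
               for factor in (1, 2, 3, 1 / 2, 1 / 3))


def score(pairs, metric):
    """Percentage of (estimated, reference) pairs accepted by `metric`."""
    hits = sum(metric(est, ref) for est, ref in pairs)
    return 100.0 * hits / len(pairs)


# Hypothetical estimates: one correct, one octave error, one unrelated.
pairs = [(121.0, 120.0), (200.0, 100.0), (90.0, 128.0)]
print(score(pairs, accuracy1), score(pairs, accuracy2))
```

The octave error (200 vs 100 BPM) is rejected by Accuracy 1 but accepted by Accuracy 2, which is why dividing the estimates by 2 would not change the Accuracy 2 column.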
Copy-pasting the results here for the record.
General tempo estimation results (ALL DATASETS)
-----------------------------------------------
Method            Accuracy 1e (%)   Accuracy 1 (%)   Accuracy 2 (%)   Mean accuracy (%)
----------------------------------------------------------------------------------------
Percival14        43.90             54.45            70.52            56.29
Degara12          30.22             52.98            60.46            47.88
Zapata14          28.54             52.47            60.35            47.12
Bock15            22.74             37.03            65.04            41.61
RhythmTransform   13.05             18.17            37.14            22.79
The RhythmTransform algorithm computes rhythm descriptors from Mel bands. The original paper specifies how the Mel bands are computed, and this is implemented in the Python example.
When testing this script with simple examples, it detects periodicities corresponding to octave errors (double the tempo), which I am not sure should be there, whereas sub-octave errors (half the tempo) make sense.
An example with a 100 BPM metronome click (20th coefficient corresponds to 100 BPM):
An example of a 130 BPM kick drum loop with what seems to be a correct behavior (26th coefficient):
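One possible reading of the octave peaks, offered as a sketch rather than a diagnosis: the spectrum of a strictly periodic pulse train has energy at every integer multiple of the pulse rate, not only at the rate itself. The toy example below uses an idealised click envelope with a period of 16 Mel-band frames (an assumption chosen so that exactly 16 periods fit the 256-frame block) and a plain FFT, so it does not show how Essentia's RhythmTransform weights these harmonics. The helper name is hypothetical.

```python
import numpy as np

SAMPLE_RATE = 22050
HOP_SIZE = 1024
RT_FRAME_SIZE = 256
BPM_PER_BIN = SAMPLE_RATE / HOP_SIZE / RT_FRAME_SIZE * 60  # ~5.047 BPM


def peak_bins(signal, threshold=1e-6):
    """Indices of non-DC rfft bins whose magnitude exceeds `threshold`."""
    magnitude = np.abs(np.fft.rfft(signal))
    return [k for k in range(1, len(magnitude)) if magnitude[k] > threshold]


# Idealised onset envelope: one click every 16 Mel-band frames (~80.7 BPM).
clicks = np.zeros(RT_FRAME_SIZE)
clicks[::16] = 1.0

bins = peak_bins(clicks)
print(bins)
print([round(k * BPM_PER_BIN, 1) for k in bins])
```

Every multiple of bin 16 carries the same magnitude, so the click rate and its double, triple and so on all appear as peaks, while nothing appears below the click rate. Real onset envelopes are not ideal impulse trains, so the relative peak heights will differ.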