MTG / essentia

C++ library for audio and music analysis, description and synthesis, including Python bindings
http://essentia.upf.edu
GNU Affero General Public License v3.0

windowSize parameter has no effect for ChordsDetection #769

Open · jamiebullock opened this issue 6 years ago

jamiebullock commented 6 years ago

I am using Essentia version 2.1-beta3 via the Python API.

Either I am doing something wrong, or the windowSize parameter for ChordsDetection has no effect.

My code is something like:

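import numpy as np
from essentia.standard import (MonoLoader, FrameGenerator, Windowing,
                               Spectrum, SpectralPeaks, HPCP, ChordsDetection)

audio = MonoLoader(filename='audio.wav')()  # 'audio.wav' is a placeholder path
w = Windowing(type='blackmanharris62')      # window type is an assumption
spectrum, peaks, hpcp = Spectrum(), SpectralPeaks(), HPCP()  # create algorithms once
pcp = []
for frame in FrameGenerator(audio, frameSize=4096, hopSize=2048):
    specPeakF, specPeakM = peaks(spectrum(w(frame)))
    pcp.append(hpcp(specPeakF, specPeakM))
# hopSize should match the framing hop so that windowSize (seconds) maps to frames
chords, strengths = ChordsDetection(windowSize=2, hopSize=2048)(np.array(pcp))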

Regardless of the value I set for windowSize, the chords vector is always the same size, i.e.

len(pcp) == len(chords)

evaluates to True.

dbogdanov commented 6 years ago

This is the correct behavior. We've updated the documentation to clarify this (see 8352aef).

jamiebullock commented 6 years ago

I don't believe the update clarifies the problem. Maybe you misunderstood my point. ChordsDetection takes a windowSize parameter in seconds.

The documentation currently says:

windowSize (real ∈ (0, ∞), default = 2) : the size of the window on which to estimate the chords [s]

This implies that ChordsDetection averages the HPCP vectors over windows of windowSize seconds and outputs one chord per window.

Currently, setting windowSize has no effect AFAICT, unless I'm misunderstanding its purpose...

dbogdanov commented 6 years ago

Then I guess we need to improve the documentation even more. The idea is that the input HPCP frames are averaged over windows of windowSize seconds. However, an output is produced for the time position of every HPCP frame, so the number of estimated chords is equal to the number of HPCP frames.
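A minimal sketch of the averaging described above, assuming a window centered on each frame. The helper `average_hpcp_per_frame` is hypothetical and is not Essentia's implementation, which may place and round the window differently. It only shows why the output count always equals the input count: every frame gets its own averaged vector, and windowSize changes how many neighbours are averaged, not how many rows come out.

```python
import numpy as np


def average_hpcp_per_frame(hpcp, window_size=2.0, sample_rate=44100, hop_size=2048):
    """Return one averaged HPCP vector per input frame.

    Hypothetical helper (not Essentia code): row i of the output is the mean
    of the input frames within about window_size / 2 seconds on either side
    of frame i. The centered window is an assumption.
    """
    hpcp = np.asarray(hpcp, dtype=float)
    n_frames = hpcp.shape[0]
    # Frames covered by half the window, never fewer than one.
    half = max(1, int(round(window_size * sample_rate / hop_size / 2)))
    out = np.empty_like(hpcp)
    for i in range(n_frames):
        start = max(0, i - half)
        end = min(n_frames, i + half + 1)
        out[i] = hpcp[start:end].mean(axis=0)
    return out


# 130 random 12-bin vectors stand in for real HPCP frames.
rng = np.random.default_rng(0)
pcp = rng.random((130, 12))
smoothed = average_hpcp_per_frame(pcp, window_size=2.0)
print(smoothed.shape)  # (130, 12): one averaged vector per input frame
```

Changing `window_size` to 1.0 or 6.0 still yields 130 rows; only the amount of smoothing differs.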

The code comments contain some discussion of whether this is the desired behavior. We may want to review it again.

jamiebullock commented 6 years ago

Ah OK, got it. Yeah, based on the current doc...

If we have a 6-second sample at 44100 Hz, a hopSize of 2048 and windowSize=2, I would expect a vector of ~130 HPCP frames to yield 3 chord values from ChordsDetection: one each for 0..2 s, 2..4 s and 4..6 s.
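The numbers in this example can be checked with a little arithmetic. This is a rough estimate: the exact frame count produced by FrameGenerator may differ by a frame or so depending on how the signal edges are padded.

```python
sample_rate = 44100  # Hz
hop_size = 2048      # samples between successive HPCP frames
duration = 6.0       # seconds of audio
window_size = 2.0    # ChordsDetection windowSize, in seconds

frames = duration * sample_rate / hop_size                # about 129.2 HPCP frames
frames_per_window = window_size * sample_rate / hop_size  # about 43.1 frames per window
windows = duration / window_size                          # 0..2 s, 2..4 s, 4..6 s
print(round(frames, 1), round(frames_per_window, 1), windows)  # 129.2 43.1 3.0
```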

If I understand correctly, I would instead get 130 chord values, but only 3 distinct values, corresponding to the three 2-second windows. Is that right?

dbogdanov commented 6 years ago

Correct, the window size only specifies the averaging scope. Maybe we should update the description of the windowSize parameter to reflect this better.
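If one chord per windowSize block is what you want (as in the 0..2 s, 2..4 s, 4..6 s expectation above), the per-frame output can be post-processed. The helper below, `chords_per_block`, is hypothetical and is not an Essentia algorithm. It assumes frame i sits at i * hopSize / sampleRate seconds, groups the labels into consecutive, non-overlapping blocks of window_size seconds, and picks the most frequent label in each block.

```python
from collections import Counter, defaultdict


def chords_per_block(chords, window_size=2.0, sample_rate=44100, hop_size=2048):
    """Collapse per-frame chord labels into one label per window_size block.

    Hypothetical post-processing (not part of Essentia). Frame i is assumed
    to start at i * hop_size / sample_rate seconds.
    """
    groups = defaultdict(list)
    for i, label in enumerate(chords):
        time = i * hop_size / sample_rate
        groups[int(time // window_size)].append(label)
    # Majority vote per block; Counter.most_common orders ties by first occurrence.
    return [Counter(groups[b]).most_common(1)[0][0] for b in sorted(groups)]


# 130 made-up per-frame labels, standing in for ChordsDetection output on ~6 s of audio.
frame_chords = ["C"] * 43 + ["G"] * 43 + ["Am"] * 44
print(chords_per_block(frame_chords))  # ['C', 'G', 'Am']
```

A majority vote is only one choice; taking the label at each block's midpoint, or summing the `strengths` output per label, would be equally reasonable alternatives.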

(PS. The new doc is not yet online.)