MTG / essentia

C++ library for audio and music analysis, description and synthesis, including Python bindings
http://essentia.upf.edu
GNU Affero General Public License v3.0

WindowLength of 2048 in PredominantPitchMelodia #208

Open steveOram1 opened 9 years ago

steveOram1 commented 9 years ago

Hi,

I have one general question. How did you decide on a window length of 2048? With a window of that length, the closest sinusoids that can be extracted from a frame have to be separated by around 80 Hz at a sampling frequency of 44100 Hz. As I understand it, that minimum separation is not fine enough. I am not sure whether I have missed something here. Could you please shed some light on this?

dbogdanov commented 9 years ago

Are you referring to the computation of a particular descriptor? Where does the 80 Hz figure come from? With a frame size of 2048 at a 44.1 kHz sample rate, you get an FFT bin width of ~21.5 Hz (44100 / 2048).

steveOram1 commented 9 years ago

First of all, thanks for your time. I was not referring to the bin width. This algorithm has a sinusoid extraction module that identifies the frequencies that could be part of the lead melody. For two sinusoids present in a frame to be identified as separate peaks, the window has to be long enough, and the required length depends on the frequency difference between the two sinusoids. For a Hann window of length 2048, the minimum frequency separation at which two sinusoids can be identified is 4 (main-lobe width in bins) × 44100 (fs) / 2048 (window length), which comes to around 86 Hz. The actual separation between sinusoids present in a frame could be less than 86 Hz. This is what confuses me. I hope that makes my point clear.
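
To make the arithmetic above easy to check, here is a minimal Python sketch of the same calculation. It is not taken from the Essentia source; the function name `min_separation_hz` is made up for illustration, and it assumes a Hann window whose main lobe is 4 bins wide (null to null) with no zero padding taken into account.

```python
def min_separation_hz(sample_rate, window_length, main_lobe_bins=4):
    """Approximate minimum spacing (Hz) at which two sinusoids
    show up as separate spectral peaks.

    Assumption: the main lobe spans `main_lobe_bins` bins of width
    sample_rate / window_length (4 bins for a Hann window).
    """
    return main_lobe_bins * sample_rate / window_length


fs = 44100  # sampling frequency in Hz, as in the question

# Compare the window length from the question with longer windows.
for n in (2048, 4096, 8192):
    print(f"N = {n}: ~{min_separation_hz(fs, n):.1f} Hz")
```

For N = 2048 this gives the roughly 86 Hz quoted above, and each doubling of the window length halves the figure.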