MTG / essentia

C++ library for audio and music analysis, description and synthesis, including Python bindings
http://essentia.upf.edu
GNU Affero General Public License v3.0

PitchYinFFT often returns a pitch reading which is 1/2 or 1/3 of the actual frequency #893

Open leonid-s-usov opened 5 years ago

leonid-s-usov commented 5 years ago

In the attached images, the red pitch reading at the top right (with the confidence in blue below it) and the spectrum of the lowest 1/8 of the frequency range (up to ~3 kHz) were produced by the two algorithm chains below. frameSize is 16384 and the sample rate is 48000 Hz.

What could be the reason for the Pitch to be reported incorrectly?

    Algorithm* frameCutter  = factory.create("FrameCutter",
                                             "frameSize", framesize,
                                             "hopSize", hopsize,
                                             "silentFrames", "keep",
                                             "startFromZero", true,
                                             "validFrameThresholdRatio", 1.
                                             );

    Algorithm* window       = factory.create("Windowing",
                                             "type", "blackmanharris62",
                                             "size", framesize);

    Algorithm* spectrum     = factory.create("Spectrum");

    Algorithm* pitchDetect  = factory.create("PitchYinFFT",
                                             "frameSize", framesize,
                                             "sampleRate", sr);

    Algorithm* lp = factory.create("HighPass", "sampleRate", sr, "cutoffFrequency", 50);

    audio->output("signal")                 >>  lp->input("signal");
    lp->output("signal")                    >>  frameCutter->input("signal");
    frameCutter->output("frame")            >>  window->input("frame");
    window->output("frame")                 >>  spectrum->input("frame");
    spectrum->output("spectrum")            >>  pitchDetect->input("spectrum");

    // Second chain (spectrum)
    Algorithm* frameCutter  = factory.create("FrameCutter",
                                             "frameSize", framesize,
                                             "hopSize", hopsize,
                                             "silentFrames", "noise"
                                             //, "startFromZero", true,
                                             //, "validFrameThresholdRatio", 1.
                                             );
    Algorithm* window       = factory.create("Windowing",
                                             "type", "blackmanharris62",
                                             "size", framesize);
    Algorithm* spec  = factory.create("Spectrum"
                                     //,"sampleRate", sr
                                      );

    Algorithm* lp = factory.create("HighPass", "sampleRate", sr, "cutoffFrequency", 50);

    audio->output("signal")                 >>  lp->input("signal");
    lp->output("signal")                 >>  frameCutter->input("signal");
    frameCutter->output("frame")            >>  window->input("frame");
    window->output("frame") >> spec->input("frame");

Apologies for the image quality.

[Attached images: IMG_0135, IMG_0134, IMG_0133, IMG_0132]

leonid-s-usov commented 5 years ago

In this paper describing the original Yin algorithm, step 4 addresses exactly this problem. The authors propose introducing a threshold, which should reduce the number of "too low" errors.
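To make the idea concrete, here is a minimal time-domain sketch of that thresholding step. It is not the Essentia or aubio code, and it is plain YIN rather than the spectral yinfft variant; all function names, the threshold value of 0.1, and the demo signal are assumptions chosen for illustration. The demo signal has a 100-sample period plus a weak component at twice that period, so the deepest dip of the normalised difference function sits at lag 200, one octave too low.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cumulative-mean-normalised difference function d'(tau) for tau in [0, maxLag).
std::vector<double> cmndf(const std::vector<double>& x, std::size_t maxLag) {
    assert(maxLag >= 2 && x.size() > maxLag);
    const std::size_t w = x.size() - maxLag;  // integration window length
    std::vector<double> dn(maxLag, 1.0);      // d'(0) is defined as 1
    double runningSum = 0.0;
    for (std::size_t tau = 1; tau < maxLag; ++tau) {
        double d = 0.0;                       // squared difference at this lag
        for (std::size_t j = 0; j < w; ++j) {
            const double diff = x[j] - x[j + tau];
            d += diff * diff;
        }
        runningSum += d;
        dn[tau] = runningSum > 0.0 ? d * static_cast<double>(tau) / runningSum : 1.0;
    }
    return dn;
}

// Lag of the global minimum: prone to picking a multiple of the true period.
std::size_t lagGlobalMin(const std::vector<double>& dn, std::size_t minLag) {
    std::size_t best = minLag;
    for (std::size_t tau = minLag + 1; tau < dn.size(); ++tau)
        if (dn[tau] < dn[best]) best = tau;
    return best;
}

// Absolute threshold: take the first dip that falls below `threshold`,
// follow it down to its local minimum, and fall back to the global minimum
// if no dip is deep enough.
std::size_t lagAbsoluteThreshold(const std::vector<double>& dn,
                                 std::size_t minLag, double threshold) {
    for (std::size_t tau = minLag; tau < dn.size(); ++tau) {
        if (dn[tau] < threshold) {
            while (tau + 1 < dn.size() && dn[tau + 1] < dn[tau]) ++tau;
            return tau;
        }
    }
    return lagGlobalMin(dn, minLag);
}

// Hypothetical demo signal: period 100 samples plus a weak period-200 component.
std::vector<double> makeDemoSignal() {
    const double pi = std::acos(-1.0);
    std::vector<double> x(1200);
    for (std::size_t n = 0; n < x.size(); ++n) {
        const double t = static_cast<double>(n);
        x[n] = std::sin(2.0 * pi * t / 100.0) + 0.05 * std::sin(2.0 * pi * t / 200.0);
    }
    return x;
}
```

Calling `cmndf(makeDemoSignal(), 400)` and then the two pickers with `minLag = 2` gives lag 200 for the global minimum and lag 100 with a threshold of 0.1. The trade-off is that the threshold value becomes a tuning parameter: too low and the fallback is used most of the time, too high and an early shallow dip can be picked instead.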

In PitchYinFFT I don't see any kind of threshold or "tolerance" parameter.

Running PitchYin I can see that these errors still occur, but at the same time, this vanilla version of the algo is much heavier on the CPU.

dbogdanov commented 5 years ago

How often does this happen? Perhaps you could experiment with the threshold parameter in PitchYin to improve the results. There's no similar parameter for PitchYinFFT, but it does make sense to add one. You can patch PitchYinFFT to add it and see how it works (of course, a pull request is welcome).

piem commented 5 years ago

hi @leonid-s-usov

I'm the author of yinfft; you can read about it in chapter 3 of this document. IIRC, the version I wrote for essentia at the time simply took the minimum of the entire vector, but the version in aubio does use thresholding. yinfft isn't bad at picking the correct frequency in transients, but it suffers from a few drawbacks, as you've experienced.

If cpu usage is a concern, have a look at yinfast in aubio.

best, piem

leonid-s-usov commented 5 years ago

@piem, thanks for the link! Great work. I will definitely review this chapter and the document in general.

@dbogdanov As subjective as it sounds, it happens too often to be ignored IMO. PitchYin is not an option, as it uses 30% of the CPU compared to 10% for PitchYinFFT. Since the original paper on the Yin algorithm explicitly addresses the issue of subharmonic errors, and given @piem's comment above, I have high hopes for the potential introduction of a threshold to PitchYinFFT.

leonid-s-usov commented 5 years ago

In addition, I've noticed that when playing a chord, the pitch detection shows some low subharmonic in 100% of cases. I'm not sure whether this is expected theoretically, but I'm concerned, since the Yin algorithm is positioned as an estimator of "the fundamental frequency (F0) of speech or musical sounds", which in my mind should somehow deal with an expected difference frequency in a chord.

dbogdanov commented 5 years ago

Yin is designed to work with monophonic signals.
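A rough illustration of why a chord tends to produce a low reading: a period detector looks for the shortest lag at which the whole waveform repeats, and for a mix of notes that lag is the common period of all of them. The sketch below uses a plain squared-difference function, not the Essentia implementation, and the chord (400, 500 and 600 Hz at 48000 Hz), the lag range, and the helper names are assumptions picked so the numbers come out as whole samples.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sum of equal-amplitude sinusoids, sampled at sampleRate.
std::vector<double> makeChord(const std::vector<double>& freqsHz,
                              double sampleRate, std::size_t numSamples) {
    const double pi = std::acos(-1.0);
    std::vector<double> x(numSamples, 0.0);
    for (std::size_t n = 0; n < numSamples; ++n)
        for (double f : freqsHz)
            x[n] += std::sin(2.0 * pi * f * static_cast<double>(n) / sampleRate);
    return x;
}

// Lag in [minLag, maxLag) that minimises the squared-difference function,
// i.e. the shift at which the signal best matches itself.
std::size_t bestLag(const std::vector<double>& x,
                    std::size_t minLag, std::size_t maxLag) {
    assert(minLag >= 1 && minLag < maxLag && x.size() > maxLag);
    const std::size_t w = x.size() - maxLag;  // integration window length
    std::size_t best = minLag;
    double bestValue = -1.0;                  // negative means "not set yet"
    for (std::size_t tau = minLag; tau < maxLag; ++tau) {
        double d = 0.0;
        for (std::size_t j = 0; j < w; ++j) {
            const double diff = x[j] - x[j + tau];
            d += diff * diff;
        }
        if (bestValue < 0.0 || d < bestValue) {
            bestValue = d;
            best = tau;
        }
    }
    return best;
}
```

For `makeChord({400.0, 500.0, 600.0}, 48000.0, 2040)` searched over lags 40 to 600, the best lag is 480 samples, which corresponds to 100 Hz: the greatest common divisor of the three note frequencies and lower than any note actually played. Under these assumptions the low reading is the correct answer to the question the detector asks, which is why a monophonic estimator is not a good fit for chords.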