Open leonid-s-usov opened 5 years ago
Step 4 of this paper describing the original Yin algorithm addresses exactly this problem. The authors propose introducing a threshold, which should reduce the number of "too low" errors.
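To make the effect of that step concrete, here is a minimal pure-Python sketch of the cumulative mean normalized difference function and two ways of picking the lag: the global minimum, and the first dip below a threshold. It is not the essentia or aubio code. The function names, the test signal (period of 50.5 samples), the 48000 Hz sample rate, and the 0.1 threshold are illustrative assumptions.

```python
import math

def cmnd(x, tau_max, window):
    """Cumulative mean normalized difference d'(tau) for tau in [0, tau_max)."""
    d = [0.0] * tau_max
    for tau in range(1, tau_max):
        d[tau] = sum((x[n] - x[n + tau]) ** 2 for n in range(window))
    out = [1.0] * tau_max  # d'(0) is defined as 1
    running = 0.0
    for tau in range(1, tau_max):
        running += d[tau]
        out[tau] = d[tau] * tau / running if running > 0 else 1.0
    return out

def pick_global_min(dn):
    """Lag of the deepest dip anywhere in the search range."""
    return min(range(1, len(dn)), key=dn.__getitem__)

def pick_with_threshold(dn, threshold=0.1):
    """Smallest lag whose dip goes below the threshold, else the global minimum."""
    for tau in range(1, len(dn)):
        if dn[tau] < threshold:
            # walk down to the bottom of this dip
            while tau + 1 < len(dn) and dn[tau + 1] < dn[tau]:
                tau += 1
            return tau
    return pick_global_min(dn)

# Hypothetical test tone: true period is 50.5 samples, so no integer lag
# matches one period exactly, but lag 101 matches two periods exactly.
sample_rate = 48000
period = 50.5
x = [math.sin(2 * math.pi * n / period) + 0.5 * math.sin(4 * math.pi * n / period)
     for n in range(550)]
dn = cmnd(x, tau_max=150, window=400)

tau_global = pick_global_min(dn)
tau_thresh = pick_with_threshold(dn, threshold=0.1)
print("global minimum :", tau_global, "samples ->", sample_rate / tau_global, "Hz")
print("with threshold :", tau_thresh, "samples ->", sample_rate / tau_thresh, "Hz")
```

With this signal the global minimum lands on the two-period lag, which is a "too low" reading one octave down, while the thresholded pick stays next to the true period. The sketch leaves out parabolic interpolation, which is what would refine the integer lag towards 50.5.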
In PitchYinFFT I don't see any kind of threshold or "tolerance" parameter.
Running PitchYin I can see that these errors still occur, but at the same time, this vanilla version of the algo is much heavier on the CPU.
How often does this happen? Perhaps you should play with the threshold parameter in PitchYin to improve the results. There's no similar parameter for PitchYinFFT, but it does make sense to add one. You can patch PitchYinFFT to add it and see how it works (of course, a pull request is welcome).
hi @leonid-s-usov
I'm the author of yinfft; you can read about it in chapter 3 of this document. IIRC, the version I wrote for essentia at the time simply took the minimum of the entire vector, but the version in aubio does use thresholding. yinfft isn't bad at picking the correct frequency in transients, but suffers from a few drawbacks, as you've experienced.
If cpu usage is a concern, have a look at yinfast in aubio.
best, piem
@piem, thanks for the link! Great work! I will definitely review this chapter and the document in general.
@dbogdanov As subjective as it sounds, it happens too often to be ignored IMO. PitchYin is not an option as it uses 30% of CPU compared to 10% for PitchYinFFT. Since the original paper about the Yin algorithm explicitly addresses the issue of subharmonics, and given @piem's comment above, I have high hopes for the potential introduction of a threshold to PitchYinFFT.
In addition, I've noticed that when playing a chord the pitch detection shows some low subharmonic in 100% of cases. I'm not sure whether this is expected theoretically, but I'm concerned, since the Yin algorithm is positioned as an estimator of "the fundamental frequency (F0) of speech or musical sounds", which in my mind should somehow deal with the difference frequency one would expect in a chord.
Yin is designed to work with monophonic signals.
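A small worked example of why a chord tends to read low: a sum of tones whose frequencies are all integer multiples of some common value repeats with the period of that common value, so a periodicity detector can lock onto it even though nobody played that note. The chord below (200, 250 and 300 Hz, a 4:5:6 ratio) is a hypothetical, idealized case. Real chords in equal temperament are only approximately commensurable.

```python
from functools import reduce
from math import gcd

def common_fundamental(freqs_hz):
    """Largest frequency (in whole Hz) that every chord tone is a multiple of."""
    return reduce(gcd, freqs_hz)

chord = [200, 250, 300]  # idealized 4:5:6 triad, integer Hz for simplicity
f_common = common_fundamental(chord)
print(f_common)                        # 50
print([f // f_common for f in chord])  # [4, 5, 6]
```

The mixture is periodic at 50 Hz, two octaves below the lowest played note, which is consistent with a low subharmonic showing up on every chord.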
In the attached images, the red pitch reading at the top right (with the confidence in blue below it) and the spectrum of the first 1/8 of the frequencies (up to ~3 kHz) were taken with the following algo chains. frameSize is 16384 and the sample rate is 48000.
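For reference, the figures quoted above work out as follows. This is plain arithmetic on the stated frameSize and sample rate, and the variable names are only illustrative.

```python
sample_rate = 48000   # Hz, as stated above
frame_size = 16384    # samples, as stated above

frame_seconds = frame_size / sample_rate  # duration of one analysis frame
bin_hz = sample_rate / frame_size         # spacing between FFT bins
nyquist_hz = sample_rate / 2              # highest representable frequency
shown_up_to_hz = nyquist_hz / 8           # the "first 1/8" of the spectrum

print(round(frame_seconds, 4))  # 0.3413
print(bin_hz)                   # 2.9296875
print(shown_up_to_hz)           # 3000.0
```

Each frame therefore covers about a third of a second, so any pitch change within that span is averaged into a single reading.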
What could be the reason for the Pitch to be reported incorrectly?
Apologies for the image quality.