ar1st0crat / NWaves

.NET DSP library with a lot of audio processing functions
MIT License
453 stars 71 forks source link

Wrong results from Pitch.FromAutoCorrelation and Pitch.FromCepstrum #37

Closed ToGoOrNotToGo closed 3 years ago

ToGoOrNotToGo commented 3 years ago

Pitch.FromAutoCorrelation returns half the fundamental frequency. The result from Pitch.FromCepstrum is totally different or I dont understand the result. I have tested with a sinusoidal signal.

ar1st0crat commented 3 years ago

The results are correct, however: 1) pitch estimation methods are not ideal themselves; 2) method parameters can be tweaked in particular cases. In most audio tasks the range [80Hz .. 400Hz] is OK. Let's start with this example:


// sinusoid 237 Hz
var signal = new SineBuilder().SetParameter("freq", 237).SampledAt(22050).OfDuration(1).Build();

var pitch1 = Pitch.FromAutoCorrelation(signal);
var pitch2 = Pitch.FromCepstrum(signal, 0, 2048, fftSize: 2048);

// pitch1: 237.0968
// pitch2: 239.6739

For larger frequencies you should increase the default high parameter. Also, depending on the block size, the results may vary in Autocorrelation mode (you can read about it in books/Internet). Anyway, usually we process audiodata blockwise in relatively small blocks, so it's not difficult to find 'good' parameters:


// sinusoid 750 Hz
var signal = new SineBuilder().SetParameter("freq", 750).SampledAt(22050).OfDuration(1).Build();

// process block of size 2048:

var pitch1 = Pitch.FromAutoCorrelation(signal, 0, 2048, high: 1000);
var pitch2 = Pitch.FromCepstrum(signal, 0, 2048, high: 1000, fftSize: 2048);

// pitch1: 760.3448
// pitch2: 760.3448

PS. FromCepstrum method expects the length of a signal to be not greater than the fftSize (which is 1024 by default), so fftSize and blockSize should be equal (I'll later commit the code that will auto-adjust these parameters).

ToGoOrNotToGo commented 3 years ago

Once again: Thank you very much for the explanation!

Maybe it is better if the default setting for low and high is full range (0 to nyquist)? And the user can minimize the range for performance reasons... or am I wrong again ;-)

ar1st0crat commented 3 years ago

No, the range [0 .. Nyquist] doesn't make much sense. In most 'pitch-estimation' cases the frequency is not very high. I don't remember where exactly I took the range [80Hz .. 400Hz], but I'd still prefer not to change it. For speech processing tasks it's just OK. Although, if there're at least couple other feature requests for changing this range, I'll do it )