alesgenova / pitch-detection

A collection of algorithms to determine the pitch of a sound sample.
MIT License
231 stars 27 forks source link

Pitch Detection Algorithm Is Missing a Step from the McLeod Paper #23

Open music-game opened 2 years ago

music-game commented 2 years ago

Let me start by saying this is a great library and works quite well. However, after reviewing the code and working with it in a real application, I believe you are missing this part of the algorithm defined in the McLeod Paper:

From the key maxima we define a threshold which is equal to the value of the highest maximum, nmax, multiplied by a constant k. We then take the first key maximum which is above this threshold and assign its delay, τ , as the pitch period. The constant, k, has to be large enough to avoid choosing peaks caused by strong harmonics, such as those in Figure 2, but low enough to not choose the unwanted beat or sub-harmonic signals. Choosing an incorrect key maximum causes a pitch error, usually a ’wrong octave’.

Your "clarity threshold" somewhat implements this, however you don't do any scaling by the maximum peak. In the two example NSDF plots below, picking the red line (~0.7) as the threshold works, but what if you have a bit more noise in the signal? Perhaps the clarity of all the peaks reduce, by ~0.2 for example. In that scenario, you might incorrectly pick the second peak in the first image, and perhaps pick no peaks in the second image, if you kept your threshold as 0.7.

If instead, you scale the selection threshold by the maximum peak, you would still be able to select the correct peak in either case, because the threshold would move up and down with the clarity of the overall signal. You should still have a minimum clarity threshold below which you don't want to return any result. But when it comes to picking between the various maxima above the clarity threshold, you can get much better octave-selection by applying this third input.

image

I was able to modify your code to accept a third input, which I've called "pick_threshold" that implements the constant from the McLeod paper. With my change, you now will select the first peak that is above both the "clarity threshold" and "maximum_peak * pick_threshold". In my testing, this change led to much fewer octave-errors. I used these settings: power_thresh = 3, clarity_thresh = 0.7, and pick_thresh = 0.98. The pick threshold can be adjusted a bit lower if you are sometimes picking up subharmonics, but I found 0.98 to be a sweet spot to avoid picking up higher order harmonics, which was happening more frequently with the original code.

The only main changes to the code I made were in the pitch_from_peaks function and the choose_peak function. I barely know Rust (I'm compiling this to wasm for a web application), so I'm sure there is a more efficient way to extract the maximum peak than what I did here (duplicating the peaks iterator), but this code did compile and worked a lot better in terms of picking the correct octave.

`pub fn pitch_from_peaks( input: &[T], sample_rate: usize, clarity_threshold: T, pick_threshold: T, correction: PeakCorrection, ) -> Option<Pitch> where T: Float, { let sample_rate = T::from_usize(sample_rate).unwrap(); let peaks = detect_peaks(input); *let peaks2 = detect_peaks(input); let maxpeak = peaks2.max_by(|x, y| x.1.partial_cmp(&y.1).unwrap_or_else(|| Ordering::Equal)); let thresh2 = match maxpeak { None => T::from(0).unwrap(), Some(p) => p.1 pick_threshold, }; choose_peak(peaks, thresh2**, clarity_threshold) .map(|peak| correct_peak(peak, input, correction)) .map(|peak| Pitch { frequency: sample_rate / peak.0, clarity: peak.1 / input[0], }) }

pub fn choose_peak<I: Iterator<Item = (usize, T)>, T: Float>( mut peaks: I, threshold: T, threshold2: T, ) -> Option<(usize, T)> { peaks.find(|p| p.1 > threshold && p.1 > threshold2) }`

alesgenova commented 2 years ago

Thank you for taking the time to look into this and do a few tests!

I don't have any objection to this, just give me a few days to digest the proposed changes to make sure they don't conflict with any other of the detectors implemented in the library.

Also, feel free to open a pull request, if you'd like to retain authorship of this fix.

It's cool to see this little library being used in the real world :)

music-game commented 2 years ago

Hey, no problem.

I'm not familiar enough with the other algorithms to know if they would have a similar parameter, but they might if they also need to pick from multiple peaks corresponding to different octaves. After making this change, it's always possible for a user to just set the pick_threshold to zero if they want to retain the original functionality of picking the first peak above the clarity threshold.

I won't bother with the pull request. As I mentioned, I barely know Rust, so I'd rather let someone more capable with Rust take care of implementing this properly. I'm just happy to be able to use this library in my side projects.

music-game commented 2 years ago

And by the way, in case you are interested, this is my project that is currently running a wasm-compiled version of your library with the changes I mentioned above. It is now much less common to see random jumps to different octaves.

https://imacslive.com/filedata/imacs-live/sing/

magnetophon commented 2 years ago

And by the way, in case you are interested, this is my project that is currently running a wasm-compiled version of your library with the changes I mentioned above. It is now much less common to see random jumps to different octaves.

Cool game! Is the code online? Until someone does a PR that fixes this issue, I'd love to use your version of the fixes for my little project.

music-game commented 2 years ago

I do not have it in a github repository, but you can view it all by inspecting my site:

https://imacslive.com/filedata/imacs-live/sing/

look at these files:

The code to initialize the wasm:

var detector; // make sure you set the scope of this so that you can call it later (globally if necessary)
getwasm();
        async function getwasm() {
        const response = await window.fetch("wasm-audio/wasm_audio_bg.wasm");
        var wasmBytes = await response.arrayBuffer();
        init(WebAssembly.compile(wasmBytes)).then(() => {
            sampleRate = audioContext.sampleRate;
                numAudioSamplesPerAnalysis = 2048;
                       const power_thresh = 3;
                       const clarity_thresh = 0.7;
                       const pick_thresh = 0.95;
                detector = WasmPitchDetector.new(sampleRate, numAudioSamplesPerAnalysis, power_thresh, clarity_thresh, pick_thresh);
        });
    }

Then to use the detector, just pass it an array of samples from your audio stream:

analyserData = new Float32Array(2048); //declare this once and reuse it each time you read more samples
analyser.getFloatTimeDomainData(analyserData); //here I am using an analyzer node to get samples
var pitch = detector.detect_pitch(analyserData); //returns the pitch

Let me know if something is unclear

magnetophon commented 2 years ago

Let me know if something is unclear

Sorry, I'm totally new to this. I was expecting some rust files...

music-game commented 2 years ago

Oh if you are trying to use it in a web application where you will be compiling to wasm anyway, then I suggest just using the wasm I already have compiled from the Rust.

If you aren't using in a web app, and you need the Rust source code, I think your best bet is to just copy the code from this Rust library and make the changes to the code that I mentioned in my original comment at the top of this thread. Depending on how exactly you use it, you might have to adjust some of the objects to accept the new inputs. I barely know Rust, so I just hacked it together for now. Hopefully alesgenova is willing to update his library with the change done properly.

magnetophon commented 2 years ago

make the changes to the code that I mentioned in my original comment at the top of this thread.

Oops, sorry, I totally missed those!

magnetophon commented 2 years ago

@music-game Thanks again for the improvement! I added it to my fork just now.

music-game commented 2 years ago

Nice! Did you notice an improvement with the change?

magnetophon commented 2 years ago

I'm still testing. :)

magnetophon commented 2 years ago

Did you notice an improvement with the change?

Yes! It's massively more likely to pick the right octave. Thanks for catching and implementing this! :partying_face:

music-game commented 2 years ago

Awesome! Glad to hear the change works for you too.

magnetophon commented 1 year ago

Have you seen https://github.com/sevagh/pitch-detection/tree/master/misc/mcleod ? It seems to include this step, judging from a quick glance.

music-game commented 1 year ago

I have not seen that. I did find that repo when I was originally researching pitch detection, but I thought it was just C++ code. I didn't see the misc/mcleod folder. Thanks for pointing it out.