cycfi / q

C++ Library for Audio Digital Signal Processing
MIT License

Explanation of how to best use the pitch detector #21

Closed gunhaxxor closed 1 year ago

gunhaxxor commented 4 years ago

Hello! I've tried out the q library for hardware pitch tracking on a Bela. It works very well! As of now I'm simply calling the operator and subsequently "get_frequency" every time the operator returns true.

I'm curious about the structure of pitch_detector.hpp and its public member functions. Some things are not clear to me. What is the purpose of the different functions? How and when should they be called? If they return something, what is the expected range (for example, periodicity)?

A more general description would be much appreciated. Here are some more specific questions I have:

Grateful for any response <3

djowel commented 4 years ago

I'll reply in more detail as soon as I can. But before that, you might want to check out the dual-pd-v2 branch. What's better than BACF? Well, dual BACFs :-)

The API of the (single) pitch_detector has changed, and so has the behavior of "predict_frequency".

More soon...

djowel commented 4 years ago

BTW, have you read these:

https://www.cycfi.com/2020/07/fast-and-efficient-pitch-detection-revisited/
https://www.cycfi.com/2018/06/fast-and-efficient-pitch-detection-synth-tracking/
https://www.cycfi.com/2018/04/fast-and-efficient-pitch-detection-bliss/
https://www.cycfi.com/2018/03/fast-and-efficient-pitch-detection-bitstream-autocorrelation/

If not, please read up. TL;DR is not an excuse ;-)

gunhaxxor commented 4 years ago

Yes. I have. Veeery interesting indeed! So yes. I have a slight hunch from the articles how the functions in the pd might be used. But I would still feel more comfortable with an explicit explanation. I will also try to plot the output from the functions in my scope, to see how they react to the incoming audio. Nevertheless, a brief explanation of the API would probably be beneficial to most users of the library 👍

djowel commented 4 years ago

Proper docs are needed. I'll write one when things settle down a bit. Right now, it is still in flux.

  • What does the hysteresis parameter in the constructor do?

It's the zero crossing hysteresis. You'd want some hysteresis for noise suppression at the lowest levels. Depends on how noisy your signal is.
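To illustrate what zero-crossing hysteresis buys you, here is a minimal sketch. It is not the Q implementation: `hysteresis_comparator` and `count_rising` are hypothetical names, and the two-threshold (Schmitt trigger) scheme is an assumption about how such hysteresis is typically done.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the Q API: a two-threshold (Schmitt
// trigger) comparator. The state flips to "high" only above +h and back
// to "low" only below -h, so wiggles smaller than h are ignored.
struct hysteresis_comparator
{
    explicit hysteresis_comparator(float h) : _h(h) {}

    // Returns true on the sample where a rising crossing is detected.
    bool operator()(float s)
    {
        if (!_high && s > _h)
        {
            _high = true;
            return true;
        }
        if (_high && s < -_h)
            _high = false;
        return false;
    }

    float _h;
    bool _high = false;
};

// Counts the rising crossings in a buffer for a given hysteresis.
inline std::size_t count_rising(std::vector<float> const& in, float h)
{
    hysteresis_comparator cmp{h};
    std::size_t n = 0;
    for (float s : in)
        n += cmp(s) ? 1 : 0;
    return n;
}
```

With zero hysteresis, low-level noise around zero registers as extra crossings; raising `h` above the noise level leaves only the crossings of the actual signal.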

  • What's the difference between "predict_frequency" and "get_frequency"? What is the init parameter for "predict_frequency"?

Predict frequency does not use BACF. Since BACF requires two full cycles, you want an early prediction at the onset. It turns out that you can do an early prediction based on the zero crossing on some instruments, such as the guitar. Sometimes it gives a false prediction, but it does not matter if it's only for a few milliseconds (it will not be audible, as noted in the articles).
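As a rough illustration of a zero-crossing-based estimate (not the actual predict_frequency code), the sketch below derives a frequency from the distance between two successive rising zero crossings. The function name and the linear interpolation step are assumptions made for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch, not Q's predict_frequency: estimate the frequency
// from the distance between the first two rising zero crossings in a
// buffer. Returns 0 if fewer than two crossings are found.
inline float zero_crossing_frequency(std::vector<float> const& in, float sps)
{
    float first = -1.0f; // position of the first crossing, -1 if none yet
    for (std::size_t i = 1; i < in.size(); ++i)
    {
        float a = in[i - 1], b = in[i];
        if (a < 0.0f && b >= 0.0f)
        {
            // Linear interpolation gives a fractional crossing position.
            float pos = float(i - 1) + (-a / (b - a));
            if (first < 0.0f)
                first = pos;
            else
                return sps / (pos - first); // one period between crossings
        }
    }
    return 0.0f;
}
```

An estimate like this is available after a single cycle, which is why it can arrive before an autocorrelation result that needs two full cycles. It is also easily fooled by strong harmonics, which matches the remark about occasional false predictions.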

  • What does is_note_shift do? I guess it returns true if there is a switch in frequency identified? But how about onset from silence? Does it return true also?

Yes, it means there's a valid note shift. Onset from silence too. But take note that this is conservative. For example, it's a bit vague for glissandi and pitch bends.

gunhaxxor commented 4 years ago

Thanks! Much clearer now! So. My understanding is that one could use the predict_frequency until the BACF says it's ready? Something like this:

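static float trackedFrequency = 0.f;

// Feed one sample; pd() returns true when the BACF has a result ready.
bool pdIsReady = pd(inSample);

// Start from the early zero-crossing prediction...
trackedFrequency = pd.predict_frequency();
if (pdIsReady)
{
  // ...and override it with the BACF result when one is available.
  trackedFrequency = pd.get_frequency();
}

// Use the tracked frequency here.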

Does that make sense?

I still wonder though, what is the purpose of the init parameter for the predict-method?

djowel commented 4 years ago

Yes. Ignore init, it will go away. Actually please check the dual-pd-v2 branch. It's the latest and the greatest :-)

resynth commented 4 years ago

I'm also using the pitch detector with Bela and it's working well on violin, thanks for the good work! Some observations:

When used with a single, monophonic pickup the low and high frequency bounds have to be set to the whole instrument. This means, as the BACF buffer is set long enough for the low frequency, that the buffer is way bigger than needed for the high frequencies. The high notes then cause CPU spikes leading to buffer underruns. I have to set Bela's buffer a lot bigger to process the high notes than the low ones, it's a frequency-dependent CPU overhead!
If I set the low/high freq bounds to the range of a single string the problem goes away so this shouldn't be an issue for polyphonic output instruments (and I'm hoping to get some Nu2 pickups in the long run). I wonder if there is a possible optimisation for monophonic pickups, such as high notes not being required to process as much of the BACF buffer?
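A back-of-the-envelope sketch of why the buffer is oversized for high notes, assuming the analysis window spans two full cycles of the lowest frequency (as stated earlier in this thread for BACF) and a 44.1 kHz sample rate. `two_cycle_window` is a hypothetical helper, not a Q function.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper: number of samples needed to hold two full cycles
// of the given frequency at sample rate sps.
inline std::size_t two_cycle_window(float lowest_hz, float sps)
{
    return static_cast<std::size_t>(std::ceil(2.0f * sps / lowest_hz));
}
```

With the low bound at the violin's open G (196 Hz), the window is 450 samples. A note ten times higher (1960 Hz) needs only 45 samples for two cycles, so the same window contains about 20 of its cycles and correspondingly more zero-crossing edges to process.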

With violin I'm getting quite a few octave up mistakes where the pitch detector decides the 2nd harmonic is the fundamental. I think my violin note onsets possibly don't have much fundamental till the string really gets going (I'll test this theory more). You mention, in one of your articles, that once the PD has detected a note it won't allow itself to change to a harmonic (so a decaying guitar note doesn't retrigger PD to the 2nd harm), makes sense. You also mention that you have done the same for sub-harmonics? I'm not aware of subharmonics existing on monophonic notes from instruments? 2 notes together will make a difference frequency subharmonic but PD is for monophonic input. If a lower frequency is becoming apparent in the note surely, 99% of the time, it's the fundamental. Your PD is so quick it seems to decide on the note before the bowed string fundamental has established itself and then refuses to reconsider its choice in the coming milliseconds? If it chooses the 2nd harm and then, say 10ms later, has found a lower harmonic (the fundamental) I'd think it would be better to switch. This requires a closer look to confirm what I think is happening but maybe you have some insight? For now a LP filter is helping.

I'm not having much luck with predict_frequency. Using it as the above example causes a lot of glitching. Not a huge problem as it's tracking pretty fast without predict_frequency but, again, wondered if you had some insight?

Not huge problems, they all have workarounds. No warbling or out of tune notes.

What is the difference with the dual_pd v2? 2 BACFS run on a single sound source?

djowel commented 4 years ago

Very nice observations, @resynth! I'll have a thorough reply later.

djowel commented 4 years ago

Thinking about this, there's probably a lot to discuss here. As you probably know, the Q pitch detector is highly tuned for the guitar, with lots of tests centered around it. It is also highly tuned for per-string processing. I decided to create a discord forum, so we can talk about this.

Let's move the discussion over there: https://discord.gg/4MymV4EaY5. Later, I can copy the relevant parts of the discussion here.

djowel commented 4 years ago

It seems we're not syncing on discord. Perhaps it's because of the weekend, I'm not sure. I guess I'll just have to move it here again...

When used with a single, monophonic pickup the low and high frequency bounds have to be set to the whole instrument. This means, as the BACF buffer is set long enough for the low frequency, that the buffer is way bigger than needed for the high frequencies. The high notes then cause CPU spikes leading to buffer underruns. I have to set Bela's buffer a lot bigger to process the high notes than the low ones, it's a frequency-dependent CPU overhead! If I set the low/high freq bounds to the range of a single string the problem goes away so this shouldn't be an issue for polyphonic output instruments (and I'm hoping to get some Nu2 pickups in the long run). I wonder if there is a possible optimisation for monophonic pickups, such as high notes not being required to process as much of the BACF buffer?

This is a known limitation. As you know, my original focus is on multichannel guitar, with a two octave range per string. I have a full test suite for that. I have not tested on anything else, besides the multichannel guitar. I did do some rough tests on synthesized (violin-like) note ranges, but that's it. It would help immensely if you can provide me with violin samples similar to the ones I have for guitar, both single notes and phrases.

Yes, it is possible to optimize the implementation.

With violin I'm getting quite a few octave up mistakes where the pitch detector decides the 2nd harmonic is the fundamental. I think my violin note onsets possibly don't have much fundamental till the string really gets going (I'll test this theory more). You mention, in one of your articles, that once the PD has detected a note it won't allow itself to change to a harmonic (so a decaying guitar note doesn't retrigger PD to the 2nd harm), makes sense. You also mention that you have done the same for sub-harmonics? I'm not aware of subharmonics existing on monophonic notes from instruments? 2 notes together will make a difference frequency subharmonic but PD is for monophonic input. If a lower frequency is becoming apparent in the note surely, 99% of the time, it's the fundamental. Your PD is so quick it seems to decide on the note before the bowed string fundamental has established itself and then refuses to reconsider its choice in the coming milliseconds? If it chooses the 2nd harm and then, say 10ms later, has found a lower harmonic (the fundamental) I'd think it would be better to switch. This requires a closer look to confirm what I think is happening but maybe you have some insight? For now a LP filter is helping.

Again, tests can reveal that and guide optimization. An LP filter can definitely help. I think there's a certain awkward part of the implementation where there's a potential n^2 computation, where n is the number of edges. A very busy zero-crossing can wreak havoc. That is one area I need to optimize, but in the meantime, an LP will do great. Are you using the pd_preprocessor, BTW?

I'm not having much luck with predict_frequency. Using it as the above example causes a lot of glitching. Not a huge problem as it's tracking pretty fast without predict_frequency but, again, wondered if you had some insight?

You do not need it. The violin has a high (lowest) fundamental frequency and latency is not an issue. We just need to optimize the implementation for it.

What is the difference with the dual_pd v2? 2 BACFS run on a single sound source?

That is correct. dual_pd uses two PDs run from a single source. The code will choose the one that has better periodicity than the other, and if there's ambiguity, resolve it using heuristics (e.g. choose the one that is closer to the mean). In my tests, the single PD has very good accuracy, with a few false detections after thousands of predictions. The graph below is an example (explanations later):
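The selection rule described above could be sketched roughly as follows. This is not the dual_pd source: the `pd_result` type, the 0..1 periodicity scale, and the `margin` threshold that defines "ambiguity" are all assumptions made for the example.

```cpp
#include <cmath>

// Hypothetical result type for illustration; not a Q type.
struct pd_result
{
    float frequency;    // Hz
    float periodicity;  // assumed 0..1, higher means more periodic
};

// Pick the result with clearly better periodicity. If the two are within
// `margin` of each other (ambiguous), pick the one whose frequency is
// closer to a running mean of recent frequencies.
inline float choose_frequency(
    pd_result a, pd_result b, float mean, float margin = 0.05f)
{
    if (std::fabs(a.periodicity - b.periodicity) > margin)
        return a.periodicity > b.periodicity ? a.frequency : b.frequency;
    return std::fabs(a.frequency - mean) <= std::fabs(b.frequency - mean)
        ? a.frequency : b.frequency;
}
```

For example, if one detector reports an octave-up value with nearly the same periodicity as the other, the mean of recent frequencies breaks the tie in favor of the value that continues the current note.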

[Image: Screen Shot 2020-08-09 at 8 27 12 AM]

Notice the two spikes at the onsets? The onsets are indeed tricky.

I used to use a median filter to eliminate such spikes, but that gives you some delays and as you noticed, can prevent real note-shifts from happening due to its memory of the past. Once I removed the median filter, I was able to identify some bugs that the median filter masked. But then I get a few spikes like that one.
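A minimal sketch of the trade-off, using a 3-point median as a stand-in (the window length that was actually used is not stated here, and `median3` is a hypothetical name):

```cpp
#include <algorithm>

// Hypothetical 3-point median filter over successive frequency readings.
struct median3
{
    // Pre-fill the history with an initial value.
    explicit median3(float init) : _a(init), _b(init), _c(init) {}

    float operator()(float x)
    {
        // Shift the history and insert the new reading.
        _a = _b;
        _b = _c;
        _c = x;
        // Median of three values.
        return std::max(std::min(_a, _b), std::min(std::max(_a, _b), _c));
    }

    float _a, _b, _c;
};
```

A single outlier (say one 880 Hz reading in a run of 440 Hz) never reaches the output. A genuine change from 440 Hz to 660 Hz is also held back until a second 660 Hz reading arrives, which is the delay mentioned above.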

With the dual PD, I no longer use a median filter, and I get perfect scores with my tests. It's very interesting because I just pass the positive crossings to one PD and the negative crossings to another and I get significantly lower probability of false detections.

The downside obviously is twice the processing required. So I'll have it as optional.

resynth commented 4 years ago

I think we live in fairly different time zones, discord sync may not happen! I'll record some violin and get it to you for testing.

I haven't been using pd_preprocessor as it doesn't quite suit my requirements. I have been pre-processing the violin with a HP filter around 150 Hz, an all-pass filter for some phase rotation and the aforementioned LP filter. Bowed string instruments produce asymmetric waveforms whereby one half of the zero crossing contains significantly more energy than the other. When you switch bow direction it changes. This is due to the way the bow acts on the string and has the effect of producing a waveform with all the even harmonics in phase. This is, to a lesser extent, a problem with the human voice and wind instruments too. I suspect it's sub-optimal for pitch detection (it's sub-optimal for most things!) but the all-pass filter solves the problem by rotating the phase of the harmonics.
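For readers unfamiliar with all-pass filters, here is a minimal first-order sketch. The filter order and the corner frequency used on the violin are not stated in this thread, so `allpass1` and its parameters are purely illustrative.

```cpp
#include <cmath>

// Hypothetical first-order all-pass, H(z) = (a + z^-1) / (1 + a z^-1).
// It passes every frequency at unity gain but shifts the phase, by
// -90 degrees at fc.
struct allpass1
{
    // fc: frequency in Hz of the -90 degree point; sps: sample rate in Hz.
    allpass1(float fc, float sps)
    {
        float t = std::tan(3.14159265f * fc / sps);
        _a = (t - 1.0f) / (t + 1.0f);
    }

    float operator()(float x)
    {
        float y = _a * x + _x1 - _a * _y1;
        _x1 = x;
        _y1 = y;
        return y;
    }

    float _a;
    float _x1 = 0.0f, _y1 = 0.0f; // previous input and output
};
```

Because the magnitude response is flat, the harmonic content is unchanged; only the relative phase of the harmonics moves, which changes the waveform's shape and symmetry.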

I'll record violin both from the raw piezo pickup and pre-treated for symmetry. Would be interesting to see if there is a difference for pd.

djowel commented 3 years ago

@resynth did you see the recent replies on Discord? Please email me if you can: djowel-at-gmail-dot-com