Detecting voiced vs unvoiced segments using BACF

xavriley commented 2 years ago

I'm wondering if you had any suggestions for this:

[Image: example]

In the example above the upper graph shows BACF results for this audio file with the "ground truth" annotation on the lower graph.

As you can see, the tracking is pretty good, however I'm having an issue with unvoiced segments. In lots of other pitch tracking literature these are represented by a value of 0 or -1 (see https://craffel.github.io/mir_eval/#module-mir_eval.melody) - for example when an audio region is silent or contains unpitched noise such as plosives or fricatives.

The issue I'm having is that the pitch detector only seems to put out estimates when it is confident that there's a pitch present. It would be nice if we could output a zero when no pitch is present - for example when a whole window has passed with no valid pitch estimates. Is there any way to achieve this with the existing codebase? Otherwise I end up with a situation like the graph above where it linearly interpolates between frequency estimates without ever getting to zero.

One thing that might be relevant: the files in this dataset are fairly poor in terms of recording quality (think people singing into laptop mics). The example track above peaks at around -10 dB and is one of the better ones. As a result, I've not had much success with the signal conditioner, as it seems to bring up the noise floor regardless of which settings I try.

djowel commented 2 years ago

I modified the pitch_detector_ex.cpp test in the develop branch. The algorithm is as follows:

For each sample out[N]:

Assume the frequency is -1 (unpitched): out[N] = -1.
When the pitch detector (pd) is ready, set out[N-W]...out[N] to pd.get_frequency(), where W is the PD window size: pd.bits().size()
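
The two steps above can be sketched roughly as follows. This is plain C++ and does not call Q: `fill_voiced` and its `events` argument are hypothetical stand-ins, where each event is a (sample index, frequency) pair for a moment the detector became ready, and `window` plays the role of `pd.bits().size()`.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper, not part of Q. Each event is (sample index, frequency)
// for a point where the detector reported that an estimate was ready.
std::vector<float> fill_voiced(
    std::size_t num_samples,
    std::vector<std::pair<std::size_t, float>> const& events,
    std::size_t window)
{
    // Step 1: assume every sample is unpitched.
    std::vector<float> out(num_samples, -1.0f);

    for (auto const& ev : events)
    {
        std::size_t const n = ev.first;
        if (n >= num_samples)
            continue; // ignore events that fall outside the output

        // Step 2: back-fill out[n - window] .. out[n] with the estimate,
        // clamping at the start of the buffer.
        std::size_t const first = (n >= window) ? n - window : 0;
        std::fill(out.begin() + first, out.begin() + n + 1, ev.second);
    }
    return out;
}
```

Any stretch longer than one window with no ready event stays at -1, which is what gives the unvoiced gaps.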

Here's an example:

Do take note of the space in between 4th and 5th notes, though: in that case, the PD is detecting a low-level note. You might also want to use a noise gate with a specific threshold to disable such low-level notes.
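
A minimal sketch of that kind of gate, assuming a simple block-wise peak measurement: `gate_by_level`, the block size, and the threshold are all hypothetical and are not taken from Q, so treat the numbers as something to tune per recording.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Q: set the frequency to -1 wherever the
// peak level of the corresponding block of audio is below threshold_db.
// Assumes one frequency value per audio sample and block > 0.
std::vector<float> gate_by_level(
    std::vector<float> const& audio, std::vector<float> freq,
    std::size_t block, float threshold_db)
{
    // Convert the threshold from dB to linear amplitude.
    float const threshold = std::pow(10.0f, threshold_db / 20.0f);
    std::size_t const n = std::min(audio.size(), freq.size());

    for (std::size_t start = 0; start < n; start += block)
    {
        std::size_t const stop = std::min(start + block, n);

        // Peak absolute level of this block.
        float peak = 0.0f;
        for (std::size_t i = start; i != stop; ++i)
            peak = std::max(peak, std::abs(audio[i]));

        // Too quiet: mark the whole block as unpitched.
        if (peak < threshold)
            std::fill(freq.begin() + start, freq.begin() + stop, -1.0f);
    }
    return freq;
}
```

With noisy recordings the threshold has to sit above the noise floor but below the quietest sung note, so it may need to be set per file.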

xavriley commented 2 years ago

Thanks Joel - this is looking much better now:

I'm also thresholding on periodicity, currently keeping only estimates with periodicity above 20%, to filter out the really noisy ones.
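
Roughly, the periodicity check looks like the sketch below. The `estimate` struct and `voiced_or_unpitched` are hypothetical names for illustration, periodicity is assumed to be in the range 0 to 1, and how the value is read from the detector is left out.

```cpp
// Hypothetical types for illustration, not part of Q.
struct estimate
{
    float frequency;   // detected frequency in Hz
    float periodicity; // assumed range: 0.0 (noise) to 1.0 (fully periodic)
};

// Keep the frequency only when periodicity is strictly above the threshold
// (20% here); otherwise report the frame as unpitched (-1).
float voiced_or_unpitched(estimate const& e, float min_periodicity = 0.2f)
{
    return (e.periodicity > min_periodicity) ? e.frequency : -1.0f;
}
```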

This can be closed now.

cycfi / q

Detecting voiced vs unvoiced segments using BACF #42