RobinSchmidt / RS-MET

Codebase for RS-MET products (Robin Schmidt's Music Engineering Tools)

Distinguishing relevant peaks from the irrelevant ones #295

Closed RobinSchmidt closed 2 years ago

RobinSchmidt commented 4 years ago

in the debeater, i'm currently facing the problem that the amplitude envelope may contain spurious peaks which a naive peak-finding algorithm (i.e. one which just picks out all values that are higher than their left and right neighbours) will include in its found peaks. these spurious, irrelevant peaks may mess up my meta-envelopes (envelope-of-envelope). this problem of separating the relevant from the irrelevant peaks is actually quite common:

https://www.google.com/search?&q=finding+relevant+peaks&oq=finding+relevant+peaks

...which is actually totally unsurprising, once one thinks about it. the tail-extender uses the criterion that a peak must be higher than two neighbours to each side (for finding spectral peaks corresponding to partials). i considered generalizing the idea to consider nL neighbours to the left and nR neighbours to the right (in some settings, it may make sense to use asymmetric neighborhoods). that strategy would perhaps help in the debeater - but actually, it seems quite ad-hoc and crude. sifting through the google search results, i came across two ideas that seem to be a bit more principled and justifiable - this:

https://en.wikipedia.org/wiki/Topographic_prominence

and this:

https://books.google.de/books?id=8h-ZDgAAQBAJ&pg=PT788&lpg=PT788&dq=finding+relevant+peaks&source=bl&ots=ZmIkfb9Zl6&sig=ACfU3U1vLoeM7mPahOTov618ijcnCDTh1w&hl=en&sa=X&ved=2ahUKEwja5Myu2bPmAhViQUEAHZjiAqQQ6AEwB3oECAkQAQ#v=onepage&q=finding%20relevant%20peaks&f=false

a robust peak finding algorithm is actually something that may be useful in various other contexts as well - for example, in spectral envelope estimation.

...tbc...

RobinSchmidt commented 4 years ago

at first, i found the description of the peak-prominence algorithm confusing because of the minima and maxima - but now i got it (i hope). it works as follows (in a 2D setting, i.e. a landscape above a plane):

for each peak
  for each direction (say north, north-east, east, etc.)
     walk until you find a point with higher elevation than the current peak
     note the minimum elevation along the walked line
  endfor
  take the maximum of these noted minimum elevations
  the difference of peak-height and maximum-minimum-elevation is the peak's prominence
endfor

with 2D, a practical problem may be that there are actually infinitely many possible directions to consider. but in our case with 1D data, there are only two directions - left and right - so the algorithm should be straightforward to implement. scipy also has it:

https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.peak_prominences.html

i actually think that a relative prominence, i.e. peak-prominence divided by peak-height, might be an even better measure of peak relevance. i will try both
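in 1d, the prominence computation boils down to something like this (a hypothetical sketch, not code from the library; the boundary convention matches scipy's: if no higher sample is found in a direction, the walk extends to the end of the array):

```cpp
#include <vector>
#include <algorithm>
#include <cassert>

// sketch of 1D peak prominence: from a peak at index p, walk left and right
// until a higher sample is found (or the array ends), note the minimum seen
// on each walk, and subtract the higher of the two minima from the peak's
// height.
double peakProminence(const std::vector<double>& x, int p)
{
  double leftMin = x[p], rightMin = x[p];
  for(int i = p-1; i >= 0            && x[i] <= x[p]; --i)
    leftMin  = std::min(leftMin,  x[i]);
  for(int i = p+1; i < (int)x.size() && x[i] <= x[p]; ++i)
    rightMin = std::min(rightMin, x[i]);
  return x[p] - std::max(leftMin, rightMin);
}
```

the relative prominence mentioned above would then just be peakProminence(x, p) / x[p].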

RobinSchmidt commented 4 years ago

now i got it (i hope).

nope! i didn't!

with 2D, a practical problem may be that there are actually infinitely many possible directions to consider

it's actually even much worse! i'm referring to the first definition here:

https://en.wikipedia.org/wiki/Topographic_prominence#Definitions

it's not only infinitely many directions of straight lines that would have to be considered (which would still be manageable in an approximate way with a loop over a bunch of angles - like 0°, 10°, ..., 350°) but infinitely many arbitrarily shaped paths! :-O i have no idea how one would approach that algorithmically (edit: now i have one - but i don't know if that would be practical). ...fortunately, these complications don't apply to the 1D case that we are dealing with. there's no path-shape to be considered here. puuh!

RobinSchmidt commented 4 years ago

as for the second idea: the book says that relevant peaks tend to be present on multiple scales. i think we don't really need to deal with the continuous wavelet transform to arrive at an algorithm based on this idea. my idea is the following: find the peaks, then apply a smoothing filter to the data and figure out which peaks are still peaks after smoothing (i think we need to tolerate the peak-index shifting one position left or right in each iteration for this check) - then iterate that. the number of smoothing passes a peak survives can be used as a measure for that peak's relevance.
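a rough sketch of that idea (hypothetical helpers, not library code - a 3-point moving average stands in for whatever smoother one would actually use):

```cpp
#include <vector>
#include <algorithm>
#include <cassert>

// 3-point moving average with edge samples replicated at the boundaries:
static std::vector<double> movingAverage3(const std::vector<double>& x)
{
  int N = (int)x.size();
  std::vector<double> y(N);
  for(int n = 0; n < N; ++n) {
    double xl = x[std::max(0, n-1)], xr = x[std::min(N-1, n+1)];
    y[n] = (xl + x[n] + xr) / 3.0;
  }
  return y;
}

static bool isPeak(const std::vector<double>& x, int n)
{
  return n > 0 && n < (int)x.size()-1 && x[n] > x[n-1] && x[n] > x[n+1];
}

// count how many smoothing passes a given peak survives, tolerating a drift
// of the peak-index by one position per pass:
int smoothingResilience(std::vector<double> x, int peakIndex, int maxPasses)
{
  int passes = 0;
  for(int k = 0; k < maxPasses; ++k) {
    x = movingAverage3(x);
    int newIndex = -1;
    for(int d = -1; d <= 1; ++d)
      if(isPeak(x, peakIndex + d)) { newIndex = peakIndex + d; break; }
    if(newIndex == -1) break;   // peak got smoothed away
    peakIndex = newIndex;
    ++passes;
  }
  return passes;
}
```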

here, i have plotted an array of random numbers and successively more smoothed versions of it (with a simple 3-point moving average):

[plot: array of random numbers and successively more smoothed versions]

it seems that the visually most significant peaks would indeed survive the smoothing best. or...well...not sure...maybe...i guess using random numbers as input is perhaps not the best test of this idea. in keeping with the analogy with landscapes, i would consider the iterative smoothing as a sort of abrasion due to wind and rain - and the measure would be a sort of peak-resilience against harsh weather conditions. ...or something

RobinSchmidt commented 4 years ago

perhaps a much better test input is a random walk - i.e. the cumulative sum of an array of random numbers. here is one, together with the 8th iteration of the smoothing filter:

[plot: random walk and its 8th smoothing iteration]

those peaks that have corresponding peaks in the smoothed version would be picked; the others would be dismissed as spurious by such an algorithm. i think that looks promising. perhaps a combination of both criteria - prominence and smoothing-resilience - would be best

RobinSchmidt commented 4 years ago

i'm playing around with some ideas. here is what happens when i iteratively apply smoothing and take the elementwise maximum with the original data:

[plots: successive smooth-and-maximum iterations overlaid on the original data]

i think i will call this the ropeway algorithm :-) i think it may be useful as a pre-processing step before finding peaks. it tends to de-peakify a lot of spurious peaks (i.e. turn them into non-peaks)
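the alternation of smoothing and maximum-taking described above can be sketched like this (a hypothetical helper, with a simple 3-point moving average standing in for the actual smoother):

```cpp
#include <vector>
#include <algorithm>
#include <cassert>

// 3-point moving average with edge samples replicated at the boundaries:
static std::vector<double> movingAverage3(const std::vector<double>& x)
{
  int N = (int)x.size();
  std::vector<double> y(N);
  for(int n = 0; n < N; ++n) {
    double xl = x[std::max(0, n-1)], xr = x[std::min(N-1, n+1)];
    y[n] = (xl + x[n] + xr) / 3.0;
  }
  return y;
}

// "ropeway" sketch: alternate a smoothing pass with taking the element-wise
// maximum of the smoothed curve and the original data, so that valleys fill
// up while the peaks stay pinned to their original values.
std::vector<double> ropeway(const std::vector<double>& x, int numPasses)
{
  std::vector<double> y = x;
  for(int k = 0; k < numPasses; ++k) {
    y = movingAverage3(y);
    for(size_t n = 0; n < y.size(); ++n)
      y[n] = std::max(y[n], x[n]);   // peaks can never sink below the data
  }
  return y;
}
```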

RobinSchmidt commented 4 years ago

i think i will call this the ropeway algorithm

...although i'm not sure if the valleys are catenary-shaped. they actually look more like parabolas to me

RobinSchmidt commented 4 years ago

the idea of alternatingly smoothing and taking the element-wise maximum was inspired by the true-envelope algorithm, which i implemented a long time ago in matlab as part of my master's thesis:

https://hal.archives-ouvertes.fr/hal-01161334/document

it uses cepstral smoothing - but the details of how the smoothing is implemented are probably not really essential to the basic idea of the algorithm

gregjazz commented 4 years ago

The envelope estimator examples shown in the paper look good to me--that could be good to try, especially since you've implemented it in the past.

Your "ropeway" algorithm might be promising, too. Is the way the peaks come to points a concern, though? I wonder if the peaks were treated like end points in a bezier curve, you could ensure that the envelope approaches and departs the point smoothly without affecting the point's value (since smoothing/averaging would tend to round off the peaks).

RobinSchmidt commented 4 years ago

Is the way the peaks come to points a concern,

i'm not sure if i understand that question. do you mean whether i'm concerned with the shapes of the curves that run through the peaks? if so... well - eventually, when we use the peaks for estimating an envelope - yes. but in the plot above, i'm actually only concerned with finding the (relevant) peaks. what sort of interpolant we later let pass through them is the next step. in the simplest case, one could just connect them with straight lines. or one could use a spline - for example, the cubic natural splines that i implemented previously.

you could ensure that the envelope approaches and departs the point smoothly without affecting the point's value (since smoothing/averaging would tend to round off the peaks)

yes - on the other hand: although using spline curves would ensure that the curve passes (smoothly) through the points, the spline may overshoot a peak (in the peak's neighborhood). here are some random points connected by straight lines, and by hermite splines of orders 3, 5, 7 - you can clearly see the overshoot:

[plot: random points connected by straight lines and by hermite splines of orders 3, 5, 7]

so, which behavior is better - rounding the peak off or potentially overshooting it - may be a matter of judgement

RobinSchmidt commented 4 years ago

or maybe even better is this plot - these are a couple of random points (x- and y-coordinates both random), interpolated by a natural cubic spline: [plot: natural cubic spline through random points] the connections are smooth (in fact, even 2nd order smooth - i.e. 1st and 2nd derivatives match at the junctions)... but the interpolants tend to oscillate/overshoot - which may be an undesirable feature for interpolating amplitude envelopes. for interpolating audio data, on the other hand, these oscillations are actually desirable - they can be seen as an approximation to sinc interpolation

gregjazz commented 4 years ago

i'm not sure, if i understand that question. do you mean, if i'm concerned with the shapes of the curves that run through the peaks?

Something like that, but I missed that this was in a separate thread with a separate (but related) goal of finding relevant peaks. :P

RobinSchmidt commented 4 years ago

i have implemented a more efficient variation of my "ropeway" idea - the original algo was really cpu-expensive due to the iterative smoothing and maximum-taking (this required many passes through the data). the following is more efficient (it needs just 2 passes through the data): basically, run an envelope follower with zero attack and some adjustable release over the data - once forward, once backward (in (pseudo)parallel) - and then take the element-wise maximum of the two passes. this can be seen here:

black: original envelope/signal/"landscape", blue: right-to-left pass, green: left-to-right pass, red: imagine the blue and green functions to be combined by taking their element-wise maximum and connect the peaks of the result:

[plot: original envelope (black), right-to-left pass (blue), left-to-right pass (green)]

the idea is that larger peaks cast "shadows" and the smaller peaks will be ignored, if they fall under the shadow of a larger peak. i call this "peak-shadowing". the widths of shadows can be asymmetrical, too - it may make sense to have larger rightward shadows (here in green) in the case of audio envelopes
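the two-pass shadowing can be sketched like this (a hypothetical helper, not the actual rsPeakPicker code; the release is modeled as a per-sample exponential decay factor in (0,1), and asymmetric shadows just use different factors for the two passes):

```cpp
#include <vector>
#include <algorithm>
#include <cassert>

// "peak-shadowing" sketch: an envelope follower with zero attack and
// exponential release, run once left-to-right (casting rightward shadows)
// and once right-to-left (leftward shadows), then combined by taking the
// element-wise maximum. a sample can only be a relevant peak if it still
// equals the shadowed envelope, i.e. if it "sticks out" of the shadows.
std::vector<double> shadowedEnvelope(const std::vector<double>& x,
                                     double decayR, double decayL)
{
  int N = (int)x.size();
  std::vector<double> f(N), b(N);
  double env = 0.0;
  for(int n = 0; n < N; ++n) {           // forward pass, rightward shadows
    env = std::max(x[n], env * decayR);  // zero attack, exponential release
    f[n] = env;
  }
  env = 0.0;
  for(int n = N-1; n >= 0; --n) {        // backward pass, leftward shadows
    env = std::max(x[n], env * decayL);
    b[n] = env;
  }
  for(int n = 0; n < N; ++n)
    f[n] = std::max(f[n], b[n]);         // combine the two passes
  return f;
}
```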

RobinSchmidt commented 4 years ago

here are envelopes that result with various settings (green, red, purple: half-widths of the shadows set to 10, 20, 30), blue: "fine" envelope (all peaks of the original data are taken to be relevant), cyan: "coarse" envelope (the only condition is that nothing should "stick out")

[plot: envelopes for shadow half-widths 10, 20, 30, plus "fine" and "coarse" envelopes]

RobinSchmidt commented 4 years ago

here, i have artificially created an envelope that shows some features typical for the rhodes samples - and i have added little spikes in the troughs. these are the irrelevant peaks that i want the peak-picker to ignore (black is the envelope, blue the "shadowed" envelope). as can be seen, in the shadowed envelope the spikes fall under the (blue) shadows and hence will be ignored (the shadows are asymmetrical here; leftward shadows are not visible): [plot: artificial envelope (black) with shadowed envelope (blue)] ...well - not all of them: those more to the right won't, because they actually are higher than their surroundings. and this is what the algo would then finally take as the peak-envelope:

[plot: resulting peak-envelope]

as can be seen, there's this other problem that the decay-section will just be interpolated by a straight line. to solve this, i need some sort of artificial "densification" of the envelope in areas where there aren't any peaks around. this is all well and good - but what i'm currently grappling with is how to set the parameters of the shadowing algorithm automatically... as well as some questions of how exactly the "artificial densification" should work

elanhickler commented 4 years ago

HMMM, this might not be needed, but I'm wondering if this would be fast to implement: what about getting the mid point between positive peaks and negative peaks? Essentially, do what you are doing with the blue line, except have a blue line on the opposite side of the peaks too, then take the middle line between the two.

elanhickler commented 4 years ago

wait... that just sounds like curve fitting. Why didn't curve fitting work for this?

RobinSchmidt commented 4 years ago

hmmm... in the case of beating, some intermediate line between the peaks and the troughs actually makes more sense than a line through the peaks. ...curve fitting... hmmm - you would first have to decide what kind of curve it should be. for example, some polynomial that fits the whole curve globally in a least squares sense? ...or maybe using segments and fitting curves locally - as in splines?

elanhickler commented 4 years ago

this peak algorithm might benefit from smoothing with splines or something; that might be better than straight curve fitting.

Since I've been away from this for a while could you give me a refresher on how I test this? Here's my code:

int fs = audioFile.getSampleRate();
int bd = audioFile.getBitDepth();
auto ptr = audioFile.getWritePointer(0);
int N = audioFile.getNumFrames();

vector<double> x; // input signal
x.reserve(N);
for (int s = 0; s < N; ++s)
    x.push_back(ptr[s]);

RAPT::rsSinusoidalModel<double> sinusoidModel = getSinusoidModel(&x[0], N, fs, f0);
std::vector<double> y = synthesizeSinusoidal(sinusoidModel, fs);

// WRITE DEBEAT OUTPUT UNMODIFIED HERE
if (refPath.isNotEmpty())
{
    AudioBuffer<float> audioBuffer(1, (int)y.size());
    auto writePtr = audioBuffer.getArrayOfWritePointers();
    for (int f = 0; f < int(y.size()); ++f)
        writePtr[0][f] = float(y[f]);
    Render::file(audioBuffer, File(refPath), fs, bd);
}

// sinusoidModel now contains the sinusoidal model for the sound. Now, we set up an
// rsPartialBeatingRemover and apply the de-beating to the model data, then synthesize
// the de-beated signal and write it into a wavefile:
RAPT::rsPartialBeatingRemover<double> deBeater;
deBeater.setPhaseSmoothingParameters(5.0, 1, 4); // cutoff = 10 leaves a vibrato
deBeater.processModel(sinusoidModel);
y = synthesizeSinusoidal(sinusoidModel, fs);

// WRITE DEBEAT FINAL OUTPUT HERE
{
    SimpleAudio audioBuffer(1, (int)y.size(), fs, bd);

    for (int f = 0; f < int(y.size()); ++f)
        audioBuffer.getWritePointer(0)[f] = float(y[f]);

    Render::file(audioBuffer, newPath);
}

return var::undefined;

RobinSchmidt commented 4 years ago

i've been away from this also, but i just revisited it and i think i'm getting better results now. there are a couple of new settings to play with:

deBeater.envExtractor.peakPicker.setShadowWidths(0.0, 0.5);
//deBeater.envExtractor.peakPicker.setWorkOnPeaksOnly(true); // default is false
deBeater.envExtractor.peakPicker.setNumNeighbors(3);

deBeater.envExtractor.setMaxSampleSpacing(0.5);

as for splines or smoothing - there is already the option to use splines, for example like so:

  deBeater.envExtractor.setInterpolationMode(
    rsInterpolatingFunction<double, double>::CUBIC_HERMITE); 

but i have found this to produce bad overshooting artifacts. maybe we could try running a smoothing filter over a linear interpolant instead

RobinSchmidt commented 4 years ago

maybe we could try running a smoothing filter over a linear interpolant

i tried it - i get things like this:

[plots: smoothed linear interpolant through the envelope datapoints]

it's smooth and doesn't suffer from overshoot - but it doesn't pass through the datapoints. not sure if that's desirable. note that the datapoints i took from the tail are purposefully coarsely sampled, so we can still see what the "densification" algo does. in practice, one would choose a higher datapoint density (ideally, something slightly above the beating period) and get a better represented tail section
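the linear-interpolation-plus-smoothing step could be sketched like this (a hypothetical helper, not the actual implementation; a bidirectional one-pole lowpass stands in for whatever smoother is used - running it forward and backward makes it zero-phase and overshoot-free, at the cost of no longer passing exactly through the datapoints):

```cpp
#include <vector>
#include <cassert>

// sketch of "smoothing a linear interpolant": sample a piecewise linear
// envelope through (time, value) datapoints onto a dense grid of N samples,
// then run a one-pole lowpass forward and backward over the result.
// coeff in (0,1] is the one-pole's per-sample update factor; coeff = 1
// means no smoothing at all. assumes t is sorted and t.size() >= 2.
std::vector<double> smoothedLinearEnvelope(const std::vector<double>& t,
  const std::vector<double>& v, int N, double coeff)
{
  std::vector<double> y(N);
  int j = 0;
  for(int n = 0; n < N; ++n) {                // piecewise linear interpolation
    double tn = t.front() + (t.back() - t.front()) * n / (N-1);
    while(j+1 < (int)t.size()-1 && t[j+1] < tn) ++j;
    double a = (tn - t[j]) / (t[j+1] - t[j]);
    y[n] = (1-a) * v[j] + a * v[j+1];
  }
  double s = y[0];                            // forward one-pole pass
  for(int n = 0; n < N; ++n)    { s += coeff * (y[n] - s); y[n] = s; }
  s = y[N-1];                                 // backward one-pole pass
  for(int n = N-1; n >= 0; --n) { s += coeff * (y[n] - s); y[n] = s; }
  return y;
}
```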

elanhickler commented 4 years ago

green looks desirable.

elanhickler commented 4 years ago

Also, when two harmonics are beating, then the peak of that beating is going to be larger in amplitude than either harmonic, so passing through the data points is not necessarily desirable. But maybe not bad either. We just need to hear the end result to decide.

elanhickler commented 4 years ago

How do you stop the GNU plots showing up? I can't run the function because it gets halted by seemingly infinite GNU plots

RobinSchmidt commented 4 years ago

How do you stop the GNU plots showing up? I can't run the function because it gets halted by seemingly infinite GNU plots

oh - yes - this is because i have injected plotting commands into the code to see what's going on during development - this is still under development and not yet ready for general use. just comment them out. to find them, you can set some debug breakpoints in the functions in rapt/Basics/Plotting.h/cpp and track the calls down via the debugger's function callstack. i guess i should provide a global, library-wide switch (with a #define or something) to turn plotting on and off

elanhickler commented 4 years ago

yea, ok, i commented them out. Here's before and after debeating. Here's an example where the harmonic detection algorithm doesn't take into account where a harmonic should end. Also, there's still a gap or two... strange... Still testing more samples to see if we're getting somewhere. The debeat algorithm is slow as s***, at least in debug mode.

[animation: before and after debeating]

elanhickler commented 4 years ago

Ok, the results are getting better. The only issue is the lack of an end point for harmonics.

RobinSchmidt commented 4 years ago

...aaand the amplitude gaps. i see them in your example, too (2nd harmonic, towards the end). the end-point may (hopefully) be a simple post-processing step - i imagine just running through all the partials after analysis and searching for a time-instant after which the partial remains below a certain (relative) threshold or something. but these gaps are a really pesky issue. i'm still struggling with the implementation of the dolph-chebyshev window. it's a bitch. all the formulas i find online or in books just don't seem to work. fortunately, i just found working code for it in the scipy codebase - i have to reverse engineer it....
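that end-point post-processing could be as simple as this (a hypothetical sketch of the idea, not existing code):

```cpp
#include <vector>
#include <algorithm>
#include <cassert>

// sketch of the proposed end-point search: find the last time-index at which
// a partial's amplitude envelope still exceeds a relative threshold (relative
// to the envelope's own maximum); everything after that index could be cut
// off, giving the partial a definite end point.
int findPartialEnd(const std::vector<double>& env, double relThresh)
{
  double peak = *std::max_element(env.begin(), env.end());
  for(int n = (int)env.size()-1; n >= 0; --n)
    if(env[n] >= relThresh * peak)
      return n;
  return 0;
}
```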

elanhickler commented 4 years ago

So, there's a CRAP ton of squiggles. Could we maybe try a function like "if squiggly [i.e. if the frequency is inconsistent between frames] for more than X frames, end the harmonic"? Also, it would be good to smooth out the frequency so it's less squiggly.