RobinSchmidt / RS-MET

Codebase for RS-MET products (Robin Schmidt's Music Engineering Tools)

Need amplitude matching algorithm ASAP #293

Open elanhickler opened 4 years ago

elanhickler commented 4 years ago

Given waveform A and waveform B, what is the negative time offset for B in which A and B share common amplitude?

waveform A and B image

example time offset for match image

Add it to my tab, can you get this done within 7 days? Thank you! Edit: Actually I will pay for this work right away.

RobinSchmidt commented 4 years ago

so, in your second pic, you have shifted the green signal such that its decay section of the envelope best matches the decay of the other signal?

elanhickler commented 4 years ago

Yes

RobinSchmidt commented 4 years ago

can i assume both signals to have the same overall amplitude and decay rate, like in your pic?

elanhickler commented 4 years ago

uhhhhh.... well both signals will be similar, like a loud hit of an instrument vs a quiet hit. I think signals will always be the decaying type.

RobinSchmidt commented 4 years ago

hmm...ok - i have an idea: use the log of the envelope (should be roughly linear, assuming exponential decay) -> fit a line to both log-envelopes by linear regression -> from the difference in the y-intercept of both lines, compute an x-shift to make the y-intercepts coincide - we'll see, if this works out....
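A minimal sketch of the idea above, not the library code: take the log of both envelopes, fit a line to each by least squares, and derive the shift from the difference of the intercepts. The names `fitLine` and `shiftFromLogEnvelopes` are hypothetical, the envelopes are assumed to be strictly positive, and the sign convention (positive = delay the second signal) is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Line y = a + b*t with intercept a and slope b.
struct Line { double a, b; };

// Least-squares line through the points (t, y[t]) for t = 0, 1, ..., N-1.
Line fitLine(const std::vector<double>& y)
{
  assert(y.size() >= 2);
  const double N = double(y.size());
  double st = 0, sy = 0, stt = 0, sty = 0;
  for (std::size_t i = 0; i < y.size(); ++i) {
    const double t = double(i);
    st += t; sy += y[i]; stt += t * t; sty += t * y[i];
  }
  const double b = (N * sty - st * sy) / (N * stt - st * st);
  const double a = (sy - b * st) / N;
  return { a, b };
}

// Delay (in samples) to apply to env so that the intercept of its fitted
// log-line coincides with that of refEnv. Assumes all values are > 0.
double shiftFromLogEnvelopes(const std::vector<double>& refEnv,
                             const std::vector<double>& env)
{
  std::vector<double> logRef, logEnv;
  for (double v : refEnv) logRef.push_back(std::log(v));
  for (double v : env)    logEnv.push_back(std::log(v));
  const Line r = fitLine(logRef);
  const Line s = fitLine(logEnv);
  // Delaying by d turns s.a + s.b*t into s.a + s.b*(t - d); its intercept
  // s.a - s.b*d equals r.a when d = (s.a - r.a) / s.b.
  return (s.a - r.a) / s.b;
}
```

For two decays with the same time constant, where the second one starts at half the amplitude, this gives a delay of about 0.69 time constants (tau times ln 2).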

RobinSchmidt commented 4 years ago

hmm - ok - this is what my algo does to two exponential decays with different initial amplitudes and matching decay times:

image

blue: reference signal, green: to-be-shifted signal, red: shifted version of green signal. ...so this works - it has to because it was constructed to work like that. ....next step: figure out, what it does when it encounters situations that do not match the idealized scenario for which it was constructed...

RobinSchmidt commented 4 years ago

this is what happens when the green decays twice as fast as the blue - it gets right-shifted more:

image

dunno what would be the most reasonable behavior in cases where the decay-times don't match. in such a case, the notion of fitting the shape becomes ambiguous because the two signals simply do not have the same shape

RobinSchmidt commented 4 years ago

when green decays half as fast as blue, it even gets left-shifted: image

RobinSchmidt commented 4 years ago

.....but i guess, if it's the same instrument hit at different strengths, the decay time actually will match because it's a property of the instrument and not of the way in which it was excited

RobinSchmidt commented 4 years ago

but - by the way - what is actually the purpose of such a match? it doesn't really strike me as useful to shift the quiet hit forward in time to match the decay of the loud hit. i mean, if you prepare a sample-set for a sample based instrument, you want the sound to start when you hit the key - not some time later,...

RobinSchmidt commented 4 years ago

i just had another idea: as it stands, the algorithm will be based purely on the envelope - maybe that's good enough, but maybe you also want the phase of the shifted signal to be aligned with the reference signal? in this case, we may use this algorithm above to find a rough ballpark value for the shift and then refine it by cross-correlation of the actual signals (not just look at their envelopes) - this refinement should change the shift-value by at most half of a cycle-length

RobinSchmidt commented 4 years ago

ah - it's now clear to me, why the non-matching decays are shifted the way they are: the algorithm contains a user adjustable parameter, the "match-level", which gives the level (in decibels) at which the two envelopes are forced to match. i have set this to -20 dB in the experiment which is why the blue and red curves meet at 0.1. so they meet at the 0.1 level, because i said so. and if the decays in fact do match, the match at one point implies that the whole curve must match

RobinSchmidt commented 4 years ago

if the envelope is plotted in the db domain, the exponential decays become lines. if the decay times are the same, the lines have the same slope and the algorithm brings them into coincidence along the whole line. if the two lines have different slopes, it brings them into coincidence at one point which is determined by the match-level. that's the only thing you can do with a shift with two lines that aren't parallel, so that's probably a reasonable behavior in the non-ideal case
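The geometry described above can be written out. In this sketch the symbols are introduced only for illustration: $a_i$ and $b_i$ are intercept and slope of the fitted dB-lines, $M$ is the match level, and $d$ is the delay applied to the second signal.

```latex
\begin{align}
L_1(t) &= a_1 + b_1 t, \qquad L_2(t) = a_2 + b_2 t
  && \text{(levels in dB, } b_1, b_2 < 0\text{)} \\
t_1 &= \frac{M - a_1}{b_1}, \qquad t_2 = \frac{M - a_2}{b_2}
  && \text{(times at which each line crosses } M\text{)} \\
d &= t_1 - t_2
  && \text{(delay that makes } L_2(t - d) \text{ cross } M \text{ at } t_1\text{)} \\
b_1 = b_2 = b \;&\Rightarrow\; d = \frac{a_2 - a_1}{b}
  && \text{(independent of } M\text{: the lines coincide everywhere)}
\end{align}
```

With unequal slopes, $d$ depends on $M$, which is why the curves meet only at the chosen match level.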

RobinSchmidt commented 4 years ago

...that's the only thing you can do with a shift with two lines that aren't parallel,...

assuming, shifting is the only thing, we can do. of course, we could also change the slope, if we wanted to. that would amount to modifying the decay time of the 2nd signal to match the decay time of the first. ....of course, this goes beyond what you have ordered but if you consider that a useful thing to do, i could figure out the details

RobinSchmidt commented 4 years ago

here is a more realistic scenario where the two decays are not too wildly different:

image

elanhickler commented 4 years ago

That looks good. I've not encountered a situation that would require modifying the decay for a better match. You can add phase alignment but I already have that capability and intend to use it. If you added it into this algorithm that would be one less step for me, either way it's fine.

On Tue, Oct 1, 2019, 9:54 AM Robin Schmidt notifications@github.com wrote:

here is a more realistic scenario where the two decays are not too wildly different:

[image: image] https://user-images.githubusercontent.com/1833099/65982912-a8c3e980-e47c-11e9-9cec-f46220ebb544.png


elanhickler commented 4 years ago

The purpose of this is to record one full length hit sample, glue the tail of that on to short recordings of hits. You can get many round robins / repetitions without having to record full length samples for every hit.

RobinSchmidt commented 4 years ago

so, this is meant as a time saver for recording sessions? and/or as a space saver for sample based instruments - as in splicing together a single full decay sample with various different attack samples?

elanhickler commented 4 years ago

Yes

RobinSchmidt commented 4 years ago

nice! i like the idea of sample-based instruments that don't eat up gigabytes of data. i'm actually also quite fond of the idea of combining attack transient samples with a synthesized "body" - as roland did in their LA synthesizer series ("linear arithmetic" - for whatever reason). ...and i actually think, that my modal synthesis filters are perfectly suited for synthesizing such a body - at least for decaying sounds

RobinSchmidt commented 4 years ago

ok - there's a new class in rapt, called rsExponentialEnvelopeMatcher. it's not yet finished, though. i'm planning to incorporate a few more user parameters that let you set it up such that it may ignore the initial and final portion of the sample for computing the regression coefficients - assuming they contain the attack and noisefloor respectively and would potentially distort our regression measurements. ah - and yes, i'm planning to implement the phase match as well

RobinSchmidt commented 4 years ago

i'm planning to incorporate a few more user parameters that let you set it up such that it may ignore the initial and final portion

done. i've also included the option to ignore samples below a given level threshold.

edit: oh - and to use it, you'll have to have an instantiation of the template somewhere or you'll get linker errors - but you already know that game. i don't instantiate it myself in rosic at the moment
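For readers who do not know "that game": when a class template's member functions are defined in a .cpp file rather than in the header, some translation unit has to request the concrete type explicitly, otherwise the linker finds no definitions. The sketch below uses a hypothetical stand-in class, `EnvelopeMatcherSketch`, because the layout of the real class is not shown in this thread. For the library class the corresponding line would presumably be `template class RAPT::rsExponentialEnvelopeMatcher<double>;`, which is an assumption.

```cpp
#include <cassert>

// --- header part: declarations only -------------------------------------
// Hypothetical stand-in, not the real RAPT class.
template<class T>
class EnvelopeMatcherSketch
{
public:
  void setMatchLevel(T newLevelDb);
  T getMatchLevel() const;
private:
  T matchLevel = T(-20);
};

// --- .cpp part: member definitions ----------------------------------------
template<class T>
void EnvelopeMatcherSketch<T>::setMatchLevel(T newLevelDb)
{
  assert(newLevelDb <= T(0));   // this sketch expects a level at or below 0 dB
  matchLevel = newLevelDb;
}

template<class T>
T EnvelopeMatcherSketch<T>::getMatchLevel() const { return matchLevel; }

// Explicit instantiation definition: emits the code for T = double in this
// translation unit so that other files can link against it.
template class EnvelopeMatcherSketch<double>;
```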

elanhickler commented 4 years ago

image

Hmm, this is generally not working as well for real world signals. I think we need more of an overall measure of the amplitude over time and to do something like an amplitude curve match, similar to how the waveform matching / cross-correlation thing works.

this is what i get: image

this is what I expect: image

You put the "shorter" signal in the first argument, right?

elanhickler commented 4 years ago

(0096)DPan_MutedMalletsEastn=D2tail=3.zip

(has the two audio files 44100 16b mono)

RobinSchmidt commented 4 years ago

You put the "shorter" signal in the first argument, right?

you mean, here?

  T getMatchOffset(
    const T* referenceEnv, int referenceNumSamples, 
    const T* shifteeEnv,   int shifteeNumSamples);

well, no - the longer signal is the reference signal (which stays fixed), and the shorter one is the "shiftee". i'll have a look tomorrow within my test codebase. by the way - which envelope extractor do you use? one thing, i see is that the shape of the envelope does not actually look very exponential. it looks more linear. maybe that could be the reason and we need an algorithm that makes less assumptions about the shape.

cross-correlating envelopes...yaaa - that was the first thing that came to my mind, too. i think, i didn't follow that because i would expect the cross-correlation to be more or less like a plateau for the entire length of the envelope (the "shape" matches everywhere just the same)...just the overall level is different. a normalized cross correlation should be invariant with regard to overall signal level ...or maybe i have that wrong? ...i need to try.... but anyway, we do actually have the overall level and can use that info for matching as well - which is in fact - the most obvious thing to do. ...i was also thinking about using a similarity measure based on the time-averaged difference of the envelopes (maybe also suitably normalized with respect to overall level).

RobinSchmidt commented 4 years ago

oh - wait - you mean, because in my code comment i say:

  // compute, by how much we must shift x1 to match x2 at the given match level:

oh - i guess, that comment may be wrong then - will verify....

RobinSchmidt commented 4 years ago

hmmm - here is what i am getting:

image

with the following settings:

  RAPT::rsExponentialEnvelopeMatcher<double> em;
  em.setMatchLevel(-50); 
  em.setInitialIgnoreSection1(60000); 
  em.setInitialIgnoreSection2( 6000);
  em.setIgnoreThreshold1(-70);
  em.setIgnoreThreshold2(-70);

i used large "initial ignore sections", especially for the 1st (reference) signal in order to skip the initial silence section and the two initial decay sections. it seems to have a 3-stage decay...or some sort of decay1-decay2-sustain-decay3 shape - and moreover, the decay-sections look more linear rather than exponential. ....my result actually looks like the shape matches but the blue signal should just be shifted downward, i.e. be some 5 or 6 dB quieter. are both signals actually of the overall same level? (i've plotted the envelope on the dB scale)

RobinSchmidt commented 4 years ago

...wait! i just found a stupid bug....

RobinSchmidt commented 4 years ago

i just found a stupid bug

the user specified the ignore-levels in dB and i was checking against linear amplitudes without converting (rendering the ignore-threshold settings ineffective) - this is fixed now. it's getting better - here, i have cut out a section from the tail of the original sample and tried to re-match it with the full signal:

image

settings: ignore1: 60000, ignore2: 1000, threshold: -65, match-level: -55. the jagginess is because i'm using a simple envelope follower with zero attack (and the plots use naively decimated data - gnuplot doesn't like big datasets)...hmmm...it's still a bit too far to the left...
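Two things from this comment can be illustrated together: an envelope follower with zero attack, and the dB-versus-linear mix-up behind the bug. This is only a sketch under assumptions. The names `dbToAmp`, `followEnvelope` and `isBelowThreshold` are hypothetical, the release is modelled as a one-pole decay, and the actual follower and threshold check in the library are not shown in this thread.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Convert a level in decibels to a linear amplitude, e.g. -20 dB -> 0.1.
double dbToAmp(double dB) { return std::pow(10.0, dB / 20.0); }

// Peak follower with zero attack: rises instantly to |x[n]| and otherwise
// decays exponentially toward it with a time constant of releaseSamples.
std::vector<double> followEnvelope(const std::vector<double>& x,
                                   double releaseSamples)
{
  assert(releaseSamples > 0.0);
  const double coeff = std::exp(-1.0 / releaseSamples);
  std::vector<double> env(x.size());
  double e = 0.0;
  for (std::size_t n = 0; n < x.size(); ++n) {
    const double a = std::fabs(x[n]);
    e = (a > e) ? a : a + coeff * (e - a);
    env[n] = e;
  }
  return env;
}

// True if a linear envelope value lies below a threshold given in dB.
// Comparing envAmp directly with thresholdDb (a negative number) would never
// be true for a non-negative envelope, so the threshold would have no effect.
bool isBelowThreshold(double envAmp, double thresholdDb)
{
  return envAmp < dbToAmp(thresholdDb);
}
```

The instant jumps on every new peak are what produce the jagged look in plots of such an envelope.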

RobinSchmidt commented 4 years ago

this is what happens now (with the same settings) with the other tail section that doesn't come from the original full signal: image

elanhickler commented 4 years ago

The last two results look good.

I'll be dealing with some pretty quiet to loud signals. It won't be practical for me to set match level (maybe I'll set ignore to -90). Match level should be dictated by the "short" signal which needs a tail. The short signal will be a piece of audio and it won't require ignoring sections.

I'll see how far I get with this.

RobinSchmidt commented 4 years ago

Match level should be dictated by the "short" signal which needs a tail

wait - so far, the short signal was the tail and the long signal was the full length signal with attack and tail. are you now talking about another scenario where there's only a short attack sample to which a (longer) tail should be matched?

elanhickler commented 4 years ago

GOOD image

and I'm getting 90% bad results.

before processing: image

after processing: image

wtffffffffffffff?

RobinSchmidt commented 4 years ago

I'm getting 90% bad results.

what about the other 10%? can you identify signal features that they have in common which the bad 90% don't have?

elanhickler commented 4 years ago

The first issue is that I'm getting insane time offsets, larger than the given files.

a 1 second file compared to an 8.5 second file gave an offset of -31 seconds. I confirmed that my sample rate was 96000 as well and all is working in that regard. As a sanity check, I get +31 seconds when swapping the inputs (reference vs shift)

elanhickler commented 4 years ago

I'm supposed to be using the same rsExponentialEnvelopeMatcher?

made sure i got the latest update image

RobinSchmidt commented 4 years ago

yes - that code looks right. that was my stupid bug

hmm...well...i could sanity-check and clip the results - but that wouldn't solve the underlying problem (but maybe it's a good idea anyway? dunno). should i try these files myself?

elanhickler commented 4 years ago

All samples (tail=1) need to be matched to the single wav file (tail=3)

PanSamplesForTailMatch.zip

RobinSchmidt commented 4 years ago

which are the ones where you get this 31 sec shift?

RobinSchmidt commented 4 years ago

with this pair:

  testEnvelopeMatching(
    "MutedMallets/(0042)DPan_MutedMalletsNorth`n=D2`tail=3",
    "MutedMallets/(0034)DPan_MutedMalletsNorth`n=D2`tail=1");

i get this result (using the exact same settings as i said above)

image

...which looks quite good. did i pick the wrong pair? btw: the sample-rate of the files is 44100

RobinSchmidt commented 4 years ago

ok - this here:

  testEnvelopeMatching(
    "MutedMallets/(0042)DPan_MutedMalletsNorth`n=D2`tail=3",
    "MutedMallets/(0031)DPan_MutedMalletsNorth`n=D2`tail=1");

is weird: image (still "in range" though). but here, we are not really dealing with a decaying tail type of envelope. that could easily confuse the algo

RobinSchmidt commented 4 years ago

but, of course, if your tail envelope is not really of a decaying shape at all - such as this one - you can get anything. imagine an almost flat envelope that's flat somewhere around -50dB....and you try to find the point where it crosses -20 dB or something - that could be faaar off and out of range.
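A worked example with made-up numbers shows how far off such a crossing can land. Suppose the line fitted to the tail envelope sits around $a = -50$ dB and falls by only $1$ dB per second, and the match level is $M = -20$ dB. The values and symbols here are purely illustrative and are not measurements from the files in this thread.

```latex
\begin{align}
t_{\text{tail}} &= \frac{M - a}{b} = \frac{-20 - (-50)}{-1\ \text{dB/s}} = -30\ \text{s} \\
t_{\text{ref}}  &= \frac{-20 - 0}{-10\ \text{dB/s}} = 2\ \text{s}
  && \text{(reference starting at } 0 \text{ dB, decaying at } 10 \text{ dB/s)} \\
d &= t_{\text{ref}} - t_{\text{tail}} = 32\ \text{s}
\end{align}
```

The flatter the fitted line, the larger the extrapolated crossing time, so a shift far longer than either file is possible.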

RobinSchmidt commented 4 years ago

what is your match-level setting?

RobinSchmidt commented 4 years ago

ok - i think, 0031 is really a good example for a bad sample and 0034 a good example for a good sample. with 31, you are trying to match an envelope that is actually quite flat on the average (due to the 2nd bump) to the decaying envelope of the reference signal. the algo expects a situation like the one with 34, where you have a clearly decaying nature in the tail env

elanhickler commented 4 years ago

Having to set match level is impractical.

It looks like we cannot at all expect a decaying envelope. Instead we have to match two random shapes. The match is wherever they most cancel out I think, like cross-correlation.

The 31 second result is from 31/42, but at the new sample rate 44100 the results are different but equally bad. None of those samples give satisfactory results.

RobinSchmidt commented 4 years ago

Having to set match level is impractical.

well, we could perhaps let the algo automatically pick it based on the average level of the tail signal. ...but if

we have to match two random shapes.

that's probably not worth pursuing.

I think, like cross-correlation.

yes - but bog standard cross-correlation would probably not be suitable, due to its dependence on overall level. for the two exponential decays that i used as my first example, i would expect the cross-correlation function to have a maximum at zero shift. it's just the sum of the products of signal values for a given amount of shift - and that sum of products would be largest for a zero shift with this example - just because the reference signal has higher overall level at the beginning. ...sooo, it would need some suitable normalization with respect to overall level (edit: but even with normalization, the cross-correlation would probably have a plateau. ).....and the normalization may rule out a fast, FFT-based cross-correlation algo. ....but maybe not - i have to check the math.

but anyway - i think, it's overkill to use envelopes at the full sample-rate anyway. so, if we have to resort to a slow O(n*m) algorithm (n, m being the lengths of the envelopes), we can probably make it practical by decimating the envelopes. anything more than 1 datapoint per cycle does not give us any useful additional information about the envelope anyway. it didn't really matter much with my current algo because it's O(n+m) and the decimation itself would have the same complexity, so i just used full sample rate envelopes. at the moment, i tend to think, the sum of the absolute value (or maybe the square) of the difference of both envelopes (as function of the shift) might be a suitable similarity measure ....i'll have to do some experiments....
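One possible way to decimate an envelope before running a slow O(n*m) search is to keep the maximum of each block of samples. This is a sketch under assumptions: the function name `decimateByBlockMax` is hypothetical, and taking the block maximum (rather than an average or every k-th sample) is just one choice that is not confirmed by the thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Reduce an envelope to one value per block of `factor` samples by keeping
// the block maximum. The last block may be shorter than the others.
std::vector<double> decimateByBlockMax(const std::vector<double>& env,
                                       std::size_t factor)
{
  assert(factor >= 1);
  std::vector<double> out;
  out.reserve((env.size() + factor - 1) / factor);
  for (std::size_t start = 0; start < env.size(); start += factor) {
    const std::size_t end = std::min(start + factor, env.size());
    out.push_back(*std::max_element(env.begin() + start, env.begin() + end));
  }
  return out;
}
```

An offset found on the decimated envelopes has to be multiplied by `factor` to get back to samples, and its resolution is limited to one block.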

RobinSchmidt commented 4 years ago

here are similarity measures based on sum of absolute and squared differences. the minimum is clearly identifiable (especially for the absolute value) and exactly at the position of the best match: image (the input was my pair of decaying exponentials again - but the algo doesn't rely on that). sooo - that looks promising. i must leave now - i will add the code to the library tomorrow
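A minimal sketch of a search based on the sum of absolute differences, not the library's `getBestMatchOffset`. The name `bestOffsetBySumOfAbsDiffs` is hypothetical, and the sketch assumes the shorter envelope must lie entirely inside the reference, so that every candidate shift compares the same number of samples.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Returns the shift s (in samples) that minimises sum_k |ref[k + s] - x[k]|,
// searching all shifts for which x fits completely inside ref.
std::size_t bestOffsetBySumOfAbsDiffs(const std::vector<double>& ref,
                                      const std::vector<double>& x)
{
  assert(!x.empty() && x.size() <= ref.size());
  std::size_t best = 0;
  double bestErr = std::numeric_limits<double>::infinity();
  for (std::size_t s = 0; s + x.size() <= ref.size(); ++s) {
    double err = 0.0;
    for (std::size_t k = 0; k < x.size(); ++k)
      err += std::fabs(ref[k + s] - x[k]);
    if (err < bestErr) { bestErr = err; best = s; }
  }
  return best;
}
```

Unlike the line fit, this makes no assumption about the envelope shape, at the cost of O(n*m) work.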

elanhickler commented 4 years ago

Ah, so once you add this to the library, I can try again to see if I get some good results?

RobinSchmidt commented 4 years ago

once you add this to the library, I can try again to see if I get some good results?

yes - there's a new function getBestMatchOffset in MiscUnfinished.h - but it's still preliminary. it should work but it still lacks the decimation feature, so it's probably very slow, unless you give it decimated envelopes yourself. it expects the two envelopes and their lengths as inputs and returns the best shift (it's the position of the minimum/cusp of the black curve above) in samples. i'll add decimation tomorrow. but just so you can try, if the results are reasonable. ...i should also probably wrap all the miscellaneous free functions that are floating around in the same file into some class. i will also need to do some more experiments myself with real world signals

elanhickler commented 4 years ago

how do I get the two envelopes for the inputs? Can I leave that blank?