elanhickler opened 4 years ago
so, in your second pic, you have shifted the green signal such that its decay section of the envelope best matches the decay of the other signal?
Yes
can i assume both signals to have the same overall amplitude and decay rate, like in your pic?
uhhhhh.... well both signals will be similar, like a loud hit of an instrument vs a quiet hit. I think signals will always be the decaying type.
hmm...ok - i have an idea: use the log of the envelope (should be roughly linear, assuming exponential decay) -> fit a line to both log-envelopes by linear regression -> from the difference in the y-intercepts of the two lines, compute an x-shift that makes the y-intercepts coincide - we'll see if this works out....
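A minimal self-contained sketch of that pipeline (not the library's actual code - the function names and the implicit sample-index x-axis are my own assumptions):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Fit a line y = a + b*x to data by ordinary least squares, where x is
// implicitly the sample index 0, 1, 2, ...
static void fitLine(const std::vector<double>& y, double& a, double& b)
{
    double n = (double)y.size();
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (std::size_t i = 0; i < y.size(); ++i) {
        double x = (double)i;
        sx += x; sy += y[i]; sxx += x*x; sxy += x*y[i];
    }
    b = (n*sxy - sx*sy) / (n*sxx - sx*sx);
    a = (sy - b*sx) / n;
}

// Estimate the shift (in samples) that aligns env2 to env1, assuming both
// are exponentially decaying envelopes, so their logs are straight lines
// with (ideally) the same slope. A positive result means env2 must be
// shifted to the right (delayed).
double matchShiftFromLogEnvelopes(const std::vector<double>& env1,
                                  const std::vector<double>& env2)
{
    std::vector<double> le1(env1.size()), le2(env2.size());
    for (std::size_t i = 0; i < env1.size(); ++i) le1[i] = std::log(env1[i]);
    for (std::size_t i = 0; i < env2.size(); ++i) le2[i] = std::log(env2[i]);
    double a1, b1, a2, b2;
    fitLine(le1, a1, b1);
    fitLine(le2, a2, b2);
    // Lines a1 + b*x and a2 + b*(x - s) coincide for s = (a2 - a1) / b;
    // average the two slopes, since they should (nearly) agree.
    double b = 0.5*(b1 + b2);
    return (a2 - a1) / b;
}
```

With a decay (b < 0) and a quieter second envelope (a2 < a1), the result comes out positive, i.e. the quiet hit is shifted right onto the later, quieter part of the reference decay.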
hmm - ok - this is what my algo does to two exponential decays with different initial amplitudes and matching decay times:
blue: reference signal, green: to-be-shifted signal, red: shifted version of green signal. ...so this works - it has to, because it was constructed to work like that. ....next step: figure out what it does when it encounters situations that do not match the idealized scenario for which it was constructed...
this is what happens when the green decays twice as fast as the blue - it gets right-shifted more: dunno what would be the most reasonable behavior in cases where the decay times don't match. in such a case, the notion of fitting the shape becomes ambiguous, because the two signals simply do not have the same shape
when green decays half as fast as blue, it even gets left-shifted:
.....but i guess, if it's the same instrument hit at different strengths, the decay time actually will match, because it's a property of the instrument and not of the way in which it was excited
but - by the way - what is actually the purpose of such a match? it doesn't really strike me as useful to shift the quiet hit forward in time to match the decay of the loud hit. i mean, if you prepare a sample-set for a sample based instrument, you want the sound to start when you hit the key - not some time later,...
i just had another idea: as it stands, the algorithm will be based purely on the envelope - maybe that's good enough, but maybe you also want the phase of the shifted signal to be aligned with the reference signal? in that case, we could use the algorithm above to find a rough ballpark value for the shift and then refine it by cross-correlation of the actual signals (not just their envelopes) - this refinement should change the shift value by at most half a cycle length
ah - it's now clear to me why the non-matching decays are shifted the way they are: the algorithm contains a user-adjustable parameter, the "match-level", which gives the level (in decibels) at which the two envelopes are forced to match. i had set this to -20 dB in the experiment, which is why the blue and red curves meet at 0.1 - they meet at the 0.1 level because i said so. and if the decays do in fact match, the match at one point implies that the whole curves must match
if the envelope is plotted in the db domain, the exponential decays become lines. if the decay times are the same, the lines have the same slope and the algorithm brings them into coincidence along the whole line. if the two lines have different slopes, it brings them into coincidence at one point which is determined by the match-level. that's the only thing you can do with a shift with two lines that aren't parallel, so that's probably a reasonable behavior in the non-ideal case
...that's the only thing you can do with a shift with two lines that aren't parallel,...
assuming shifting is the only thing we can do. of course, we could also change the slope if we wanted to. that would amount to modifying the decay time of the 2nd signal to match the decay time of the first. ....of course, this goes beyond what you have ordered, but if you consider that a useful thing to do, i could figure out the details
here is a more realistic scenario where the two decays are not too wildly different:
That looks good. I've not encountered a situation that would require modifying the decay for a better match. You can add phase alignment but I already have that capability and intend to use it. If you added it into this algorithm that would be one less step for me, either way it's fine.
The purpose of this is to record one full length hit sample, glue the tail of that on to short recordings of hits. You can get many round robins / repetitions without having to record full length samples for every hit.
so, this is meant as a time saver for recording sessions? and/or as a space saver for sample based instruments - as in splicing together a single full decay sample with various different attack samples?
Yes
nice! i like the idea of sample-based instruments that don't eat up gigabytes of data. i'm actually also quite fond of the idea of combining attack-transient samples with a synthesized "body" - as roland did in their LA synthesizer series ("linear arithmetic" - for whatever reason). ...and i actually think that my modal synthesis filters are perfectly suited for synthesizing such a body - at least for decaying sounds
ok - there's a new class in rapt, called rsExponentialEnvelopeMatcher. it's not yet finished, though. i'm planning to incorporate a few more user parameters that let you set it up such that it may ignore the initial and final portion of the sample for computing the regression coefficients - assuming they contain the attack and noise floor respectively, which would otherwise distort the regression. ah - and yes, i'm planning to implement the phase match as well
i'm planning to incorporate a few more user parameters that let you set it up such that it may ignore the initial and final portion
done. i've also included the option to ignore samples below a given level threshold.
edit: oh - and to use it, you'll have to have an instantiation of the template somewhere or you'll get linker errors - but you already know that game. i don't instantiate it myself in rosic at the moment
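For reference, the "instantiation game": member functions of a class template only get compiled for types that are actually instantiated, so client code can hit "undefined reference" linker errors unless an explicit instantiation exists somewhere. A toy stand-in (the real class lives in rapt; this body is made up for illustration):

```cpp
template<class T>
class rsExponentialEnvelopeMatcher   // toy stand-in, not the real class
{
public:
    void setMatchLevel(T levelDb) { matchLevel = levelDb; }
    T getMatchLevel() const { return matchLevel; }
private:
    T matchLevel = T(-20);
};

// The one line that avoids the linker errors when the member definitions
// live in a separate translation unit (normally placed in some .cpp file
// of the library or the client project):
template class rsExponentialEnvelopeMatcher<double>;
```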
Hmm, this is generally not working well for real-world signals. I think we need more of an overall measure of the amplitude over time - something like an amplitude-curve match, similar to how the waveform-matching / cross-correlation thing works.
this is what i get:
this is what I expect:
You put the "shorter" signal in the first argument, right?
(0096)DPan_MutedMalletsEastn=D2
tail=3.zip
(has the two audio files 44100 16b mono)
You put the "shorter" signal in the first argument, right?
you mean, here?
T getMatchOffset(
const T* referenceEnv, int referenceNumSamples,
const T* shifteeEnv, int shifteeNumSamples);
well, no - the longer signal is the reference signal (which stays fixed), and the shorter one is the "shiftee". i'll have a look tomorrow within my test codebase. by the way - which envelope extractor do you use? one thing i see is that the shape of the envelope does not actually look very exponential. it looks more linear. maybe that could be the reason, and maybe we need an algorithm that makes fewer assumptions about the shape.
cross-correlating envelopes...yaaa - that was the first thing that came to my mind, too. i think i didn't pursue it because i would expect the cross-correlation to be more or less a plateau for the entire length of the envelope (the "shape" matches everywhere just the same)...just the overall level is different. a normalized cross-correlation should be invariant with regard to overall signal level ...or maybe i have that wrong? ...i need to try.... but anyway, we do actually have the overall level and can use that info for matching as well - which is in fact the most obvious thing to do. ...i was also thinking about a similarity measure based on the time-averaged difference of the envelopes (maybe also suitably normalized with respect to overall level).
oh - wait - you mean, because in my code comment i say:
// compute, by how much we must shift x1 to match x2 at the given match level:
oh - i guess, that comment may be wrong then - will verify....
hmmm - here is what i am getting:
with the following settings:
RAPT::rsExponentialEnvelopeMatcher<double> em;
em.setMatchLevel(-50);
em.setInitialIgnoreSection1(60000);
em.setInitialIgnoreSection2( 6000);
em.setIgnoreThreshold1(-70);
em.setIgnoreThreshold2(-70);
i used large "initial ignore sections", especially for the 1st (reference) signal, in order to skip the initial silence section and the two initial decay sections. it seems to have a 3-stage decay...or some sort of decay1-decay2-sustain-decay3 shape - and moreover, the decay sections look more linear than exponential. ....my result actually looks like the shape matches, but the blue signal should just be shifted downward, i.e. be some 5 or 6 dB quieter. are both signals actually of the same overall level? (i've plotted the envelope on the dB scale)
...wait! i just found a stupid bug....
i just found a stupid bug
the user specified the ignore-levels in dB and i was checking against linear amplitudes without converting (rendering the ignore-threshold settings ineffective) - this is fixed now. it's getting better - here, i have cut out a section from the tail of the original sample and tried to re-match it with the full signal:
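The bug pattern - comparing linear amplitudes against a threshold given in dB without converting - looks roughly like this (helper names are mine, not the library's):

```cpp
#include <cmath>

double dbToAmp(double db) { return std::pow(10.0, db / 20.0); }

// Buggy: treats a dB threshold as if it were a linear amplitude. A
// threshold of -70 dB is negative, so no (nonnegative) amplitude is ever
// below it - the ignore-threshold setting silently does nothing.
bool isBelowThresholdBuggy(double amp, double thresholdDb)
{
    return amp < thresholdDb;
}

// Fixed: convert the threshold to a linear amplitude first (equivalently,
// one could convert the amplitude to dB and compare in the dB domain).
bool isBelowThreshold(double amp, double thresholdDb)
{
    return amp < dbToAmp(thresholdDb);
}
```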
settings: ignore1: 60000, ignore2: 1000, threshold: -65, match-level: -55. the jagginess is because i'm using a simple envelope follower with zero attack (and the plots use naively decimated data - gnuplot doesn't like big datasets)...hmmm...it's still a bit too far to the left...
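The "simple envelope follower with zero attack" plus naive decimation could look like this (a generic sketch, not necessarily the code actually used):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One-pole peak follower with zero attack: jumps up to |x| instantly and
// decays exponentially with the given release time constant otherwise.
// The instant attack is what produces the jagged envelope mentioned above.
std::vector<double> envelopeZeroAttack(const std::vector<double>& x,
                                       double releaseSamples)
{
    double c = std::exp(-1.0 / releaseSamples);  // per-sample decay coeff
    std::vector<double> env(x.size());
    double y = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        double a = std::fabs(x[i]);
        y = (a > y) ? a : c * y;     // instant attack, exponential release
        env[i] = y;
    }
    return env;
}

// Naive decimation: keep every k-th envelope sample.
std::vector<double> decimate(const std::vector<double>& env, int k)
{
    std::vector<double> out;
    for (std::size_t i = 0; i < env.size(); i += k)
        out.push_back(env[i]);
    return out;
}
```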
this is what happens now (with the same settings) with the other tail section that doesn't come from the original full signal:
The last two results look good.
I'll be dealing with signals ranging from pretty quiet to loud. It won't be practical for me to set match level (maybe I'll set ignore to -90). Match level should be dictated by the "short" signal which needs a tail. The short signal will be a piece of audio, and it won't require ignoring sections.
I'll see how far I get with this.
Match level should be dictated by the "short" signal which needs a tail
wait - so far, the short signal was the tail and the long signal was the full length signal with attack and tail. are you now talking about another scenario where there's only a short attack sample to which a (longer) tail should be matched?
GOOD
and I'm getting 90% bad results.
before processing:
after processing:
wtffffffffffffff?
I'm getting 90% bad results.
what about the other 10%? can you identify signal features that they have in common which the bad 90% don't have?
The first issue is that I'm getting insane time offsets, larger than the given files.
A 1-second file compared to an 8.5-second file gave an offset of -31 seconds. I confirmed that my samplerate was 96000 as well and all is working in that regard. As a sanity check, I get +31 seconds when swapping the inputs (reference vs. shiftee).
I'm supposed to be using the same rsExponentialEnvelopeMatcher?
made sure i got the latest update
yes - that code looks right. that was my stupid bug
hmm...well...i could sanity-check and clip the results - but that wouldn't solve the underlying problem (but maybe it's a good idea anyway? dunno). should i try these files myself?
All samples (tail=1) need to be matched to the single wav file (tail=3)
which are the ones where you get this 31 sec shift?
with this pair:
testEnvelopeMatching(
"MutedMallets/(0042)DPan_MutedMalletsNorth`n=D2`tail=3",
"MutedMallets/(0034)DPan_MutedMalletsNorth`n=D2`tail=1");
i get this result (using the exact same settings as i said above)
...which looks quite good. did i pick the wrong pair? btw: the sample-rate of the files is 44100
ok - this here:
testEnvelopeMatching(
"MutedMallets/(0042)DPan_MutedMalletsNorth`n=D2`tail=3",
"MutedMallets/(0031)DPan_MutedMalletsNorth`n=D2`tail=1");
is weird: (still "in range", though). but here, we are not really dealing with a decaying-tail type of envelope. that could easily confuse the algo
but, of course, if your tail envelope is not really of a decaying shape at all - such as this one - you can get anything. imagine an almost flat envelope hovering somewhere around -50 dB....and you try to find the point where it crosses -20 dB or something - that could be faaar off and out of range.
what is your match-level setting?
ok - i think, 0031 is really a good example for a bad sample and 0034 a good example for a good sample. with 31, you are trying to match an envelope that is actually quite flat on the average (due to the 2nd bump) to the decaying envelope of the reference signal. the algo expects a situation like the one with 34, where you have a clearly decaying nature in the tail env
Having to set match level is impractical.
It looks like we cannot at all expect a decaying envelope. Instead we have to match two random shapes. The match is wherever they most cancel out I think, like cross-correlation.
The 31-seconds result is from the 31/42 pair, but at the new sample rate 44100 the results are different, and equally bad. None of those samples give satisfactory results.
Having to set match level is impractical.
well, we could perhaps let the algo automatically pick it based on the average level of the tail signal. ...but if
we have to match two random shapes.
that's probably not worth pursuing.
I think, like cross-correlation.
yes - but bog-standard cross-correlation would probably not be suitable, due to its dependence on overall level. for the two exponential decays that i used as my first example, i would expect the cross-correlation function to have its maximum at zero shift. it's just the sum of the products of signal values for a given amount of shift - and that sum of products would be largest for zero shift in this example, just because the reference signal has a higher overall level at the beginning. ...sooo, it would need some suitable normalization with respect to overall level (edit: but even with normalization, the cross-correlation would probably have a plateau).....and the normalization may rule out a fast, FFT-based cross-correlation algo. ....but maybe not - i have to check the math.
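A level-normalized cross-correlation at one lag could look like this (my own sketch, not library code). It actually confirms the plateau worry: for two pure exponentials with the same decay rate, the overlapping sections are proportional at every lag, so the normalized correlation is 1 everywhere and does not localize the match by itself:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Normalized cross-correlation of env2 against env1 at a nonnegative lag.
// Dividing by the norms of the overlapping sections makes the measure
// invariant to overall level. For nonnegative envelopes the result is
// in [0, 1].
double normalizedCrossCorrelation(const std::vector<double>& env1,
                                  const std::vector<double>& env2,
                                  std::size_t lag)
{
    std::size_t n = 0;
    if (env1.size() > lag)
        n = std::min(env1.size() - lag, env2.size());
    double sxy = 0, sxx = 0, syy = 0;
    for (std::size_t i = 0; i < n; ++i) {
        double x = env1[lag + i], y = env2[i];
        sxy += x * y; sxx += x * x; syy += y * y;
    }
    if (sxx == 0 || syy == 0) return 0;
    return sxy / std::sqrt(sxx * syy);
}
```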
but anyway - i think it's overkill to use envelopes at the full sample-rate. so, if we have to resort to a slow O(n*m) algorithm (n, m being the lengths of the envelopes), we can probably make it practical by decimating the envelopes. anything more than 1 datapoint per cycle does not give us any useful additional information about the envelope anyway. it didn't really matter much with my current algo, because it's O(n+m) and the decimation itself would have the same complexity, so i just used full sample-rate envelopes. at the moment, i tend to think the sum of the absolute values (or maybe the squares) of the differences of both envelopes (as a function of the shift) might be a suitable similarity measure ....i'll have to do some experiments....
here are the similarity measures based on the sums of absolute and squared differences. the minimum is clearly identifiable (especially for the absolute value) and sits exactly at the position of the best match (the input was my pair of decaying exponentials again - but the algo doesn't rely on that). sooo - that looks promising. i must leave now - i will add the code to the library tomorrow
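The sum-of-absolute-differences measure described above, as a brute-force O(n*m) search (a sketch under the assumption that the shiftee fits entirely inside the reference; the actual library code may differ and should run on decimated envelopes):

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Find the shift of env2 against env1 that minimizes the sum of absolute
// differences over the overlap, sliding env2 fully inside env1.
int bestMatchShift(const std::vector<double>& env1,
                   const std::vector<double>& env2)
{
    int n = (int)env1.size(), m = (int)env2.size();
    int bestShift = 0;
    double bestErr = std::numeric_limits<double>::infinity();
    for (int s = 0; s <= n - m; ++s) {
        double err = 0;
        for (int i = 0; i < m; ++i)
            err += std::fabs(env1[s + i] - env2[i]);
        if (err < bestErr) { bestErr = err; bestShift = s; }
    }
    return bestShift;
}
```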
Ah, so once you add this to the library, I can try again to see if I get some good results?
once you add this to the library, I can try again to see if I get some good results?
yes - there's a new function getBestMatchOffset
in MiscUnfinished.h - but it's still preliminary. it should work, but it still lacks the decimation feature, so it's probably very slow, unless you give it decimated envelopes yourself. it expects the two envelopes and their lengths as inputs and returns the best shift (it's the position of the minimum/cusp of the black curve above) in samples. i'll add decimation tomorrow - but just so you can try if the results are reasonable. ...i should probably also wrap all the miscellaneous free functions that are floating around in the same file into some class. i will also need to do some more experiments myself with real-world signals
how do I get the two envelopes for the inputs? Can I leave that blank?
Given waveform A and waveform B, what is the negative time offset for B in which A and B share common amplitude?
waveform A and B
example time offset for match
Add it to my tab, can you get this done within 7 days? Thank you! Edit: Actually I will pay for this work right away.