RobinSchmidt / RS-MET

Codebase for RS-MET products (Robin Schmidt's Music Engineering Tools)

OrangeTreeSamples needs harmonic extractor and custom resynthesis #225

Open elanhickler opened 6 years ago

elanhickler commented 6 years ago

SampleTailExtender.zip

Explore the code base, you'll easily find the harmonic analysis and extraction stuff. We need you to repurpose this for extracting individual harmonics.

Right now the goal is to extract harmonics and measure the amplitude decay to then resynthesize it, except that the amplitude decay needs to be smoothed out because there's unwanted harmonic beating or distortion that is causing unwanted modulation.

elanhickler commented 6 years ago

Here is an example extracted harmonic that has unwanted amplitude modulation and I created a desired envelope by hand:

[image: extracted harmonic showing unwanted amplitude modulation, with the desired envelope drawn by hand]

RobinSchmidt commented 6 years ago

so, the goal is to extract the amplitude envelope and remove the beating artifacts? can we assume that the envelope is an exponential decay, i.e. can we just model the decay of the mode with a decaying sinusoid? in this case, the task would just boil down to estimating the decay time constant which can easily be done by comparing the (average) amplitudes of a chunk at the start with a chunk at the end
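The chunk-comparison idea can be sketched in a few lines: with an exponential model A(t) = A0 * exp(-t/tau), comparing the average amplitudes a1 and a2 of two chunks centered at times t1 and t2 gives tau = (t2 - t1) / ln(a1/a2). This is an illustrative sketch of that estimator, not code from RAPT:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mean absolute amplitude of the chunk x[start .. start+length-1].
double chunkAmplitude(const std::vector<double>& x, size_t start, size_t length)
{
    double sum = 0.0;
    for(size_t i = start; i < start + length; i++)
        sum += std::fabs(x[i]);
    return sum / (double) length;
}

// Estimates the decay time constant tau (in seconds) of an exponentially
// decaying signal by comparing the average amplitudes of a chunk at the
// start with a chunk at the end:
//   A(t) = A0 * exp(-t/tau)  =>  tau = (t2 - t1) / ln(a1/a2)
double estimateDecayTime(const std::vector<double>& x, size_t chunkLength,
                         double sampleRate)
{
    size_t start2 = x.size() - chunkLength;
    double a1 = chunkAmplitude(x, 0,      chunkLength);
    double a2 = chunkAmplitude(x, start2, chunkLength);
    double t1 = 0.5 * chunkLength / sampleRate;             // chunk centers
    double t2 = (start2 + 0.5 * chunkLength) / sampleRate;
    return (t2 - t1) / std::log(a1 / a2);
}
```

The sinusoidal factor averages out of the ratio a1/a2 as long as both chunks span several cycles, so only the envelope decay remains.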

elanhickler commented 6 years ago

You can't assume exponential decay. It will be some kind of curve, so you'd have to do some kind of exponential-decay curve fitting, if anything. You could try taking the average amplitude per window of time to create an amplitude envelope and fitting a curve between the points.

RobinSchmidt commented 6 years ago

ok - i created myself an experimental setup with two attack/decay sinusoids with slightly different frequencies (and, if i want, also different amplitudes and decay times). if i know the center frequency, i can extract a pretty good envelope (in blue): [image: test signal with extracted envelope in blue] ...now on to try to remove the beating from that envelope...

RobinSchmidt commented 6 years ago

a simple approach would be to just sort of take the envelope of the envelope, like this: [image: env-env points connected by straight lines in green] currently, i just connect the env-env points by straight lines (green), but i think i'll write a nice general class for interpolating (x,y) datapoints by various methods (linear, splines, etc.). then we can pick the most suitable method. i guess a cubic spline in log-amplitude will probably give nice, natural looking envelopes. i mean: take the log - fit a spline - use exp to undo the log ...or something

RobinSchmidt commented 6 years ago

to take the first envelope, i just use the peaks of the rectified signal (if the signal is more or less sinusoidal, this should be good enough - for more complex waveforms, one would have to be more careful) and then i use peak-picking on the result of the first peak-picking, where a "peak" is defined to be any value which is larger than its left and right neighbours
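The peak-picking rule Robin describes (a "peak" is any value larger than its left and right neighbours) can be sketched as a small standalone function; illustrative code, not the actual RAPT implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the indices of all local maxima in v, where a "peak" is any value
// that is strictly larger than both its left and right neighbour. Endpoints
// are never counted because they have only one neighbour.
std::vector<size_t> findPeakIndices(const std::vector<double>& v)
{
    std::vector<size_t> peaks;
    for(size_t i = 1; i + 1 < v.size(); i++)
        if(v[i] > v[i-1] && v[i] > v[i+1])
            peaks.push_back(i);
    return peaks;
}
```

Applying this once to the rectified signal |x| gives the raw envelope samples; applying it a second time to those picked peak values gives the envelope of the envelope.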

gregjazz commented 6 years ago

Here are some problematic samples that we would be using the harmonic resynthesis for:

https://www.dropbox.com/s/0w6pwm193m42ho9/Harmonic%20Reconstruction%20-%20Problematic%20Examples.zip?dl=0

I included a screenshot so you can see the problematic frequency area as well as a text file explaining the issue. Separate examples are included for the Rhodes instrument as well as the Strat samples.

RobinSchmidt commented 6 years ago

ok - here is what my envelope de-beating algorithm does so far: [image: raw envelope vs. de-beated envelope] i have an idea how i may reduce the bump at the beginning...

gregjazz commented 6 years ago

Looks promising so far! Any update on progress with this? And don't forget to bill me for all this work. :)

RobinSchmidt commented 6 years ago

i'll wrap the code into a convenient class or function tomorrow, so you can try it yourself...

RobinSchmidt commented 6 years ago

ok, i added a class rsEnvelopeExtractor to my RAPT library. it has a static function sineEnvelopeWithDeBeating which you can call like:

RAPT::rsEnvelopeExtractor<double>::sineEnvelopeWithDeBeating(x, N, env);

where x and env are arrays of doubles of length N, x is the input signal, the envelope will be written into env. if you give it the black signal in the pic above as x, it will write the blue signal into env (the green signal is an intermediate non-de-beated envelope). if you want to work with std::vector, you could call it like:

env.resize(x.size()); // make sure that env has same length as x
RAPT::rsEnvelopeExtractor<double>::sineEnvelopeWithDeBeating(&x[0], (int) x.size(), &env[0]);

i'm still not finished with it - i need to add comments on how to use the class, and i'm also not yet happy with how the endpoints are handled. i think it would be better if the envelope starts and ends at 0. more work to do for tomorrow...

edit: maybe better wait until tomorrow before you test it - i may have to explain in some more detail how to use my libraries, etc. but must leave now

gregjazz commented 6 years ago

Awesome--this will be so useful!

RobinSchmidt commented 6 years ago

i'm trying to implement "natural" cubic splines: http://www.maths.nuigalway.ie/~niall/teaching/Archive/1617/MA378/2-2-CubicSplines.pdf my current cubic polynomial is defined a little bit differently. instead of requiring the second derivative to be continuous at the datapoints, i prescribe values for the first (which i set from the data using a finite-difference approximation of the 1st derivative). i think, the natural spline will be better not only because it also has a continuous 2nd derivative (which my spline has not) but also because it will (probably) deal with the endpoint problem in a natural way...we'll see
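The natural-spline condition (continuous second derivative, with the second derivative set to zero at both endpoints) leads to a tridiagonal system that can be solved in one forward sweep plus back-substitution. A minimal sketch of the standard textbook algorithm, not Robin's actual rsInterpolatingFunction code:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Computes the second derivatives y2[i] of the natural cubic spline through
// the points (x[i], y[i]) by solving the tridiagonal system with natural
// boundary conditions y2[0] = y2[n-1] = 0 (forward elimination followed by
// back-substitution).
std::vector<double> naturalSplineSecondDerivatives(const std::vector<double>& x,
                                                   const std::vector<double>& y)
{
    size_t n = x.size();
    std::vector<double> y2(n, 0.0), u(n, 0.0); // y2[0] = u[0] = 0: natural end
    for(size_t i = 1; i + 1 < n; i++)          // forward elimination
    {
        double sig = (x[i] - x[i-1]) / (x[i+1] - x[i-1]);
        double p   = sig * y2[i-1] + 2.0;
        y2[i] = (sig - 1.0) / p;
        u[i]  = (y[i+1] - y[i]) / (x[i+1] - x[i])
              - (y[i]   - y[i-1]) / (x[i]   - x[i-1]);
        u[i]  = (6.0 * u[i] / (x[i+1] - x[i-1]) - sig * u[i-1]) / p;
    }
    for(size_t i = n - 1; i-- > 1; )           // back-substitution
        y2[i] = y2[i] * y2[i+1] + u[i];        // y2[n-1] stays 0
    return y2;
}
```

With the y2 values in hand, evaluating the spline between two datapoints is just the usual cubic Hermite-style formula.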

RobinSchmidt commented 6 years ago

ok - it seems the basic natural spline code works (it's asymmetrical because on the left there are fewer datapoints): [image: natural spline through test datapoints] ...but when i try to apply it to the envelope, i run into a singular matrix. more tests needed...

RobinSchmidt commented 6 years ago

OK - it seems to work (mostly). there's a class rsEnvelopeExtractor in my library now. to use it, you would create an object of that class, set it up and then call a function sineEnvelopeWithDeBeating. example code could look like this:

  typedef RAPT::rsEnvelopeExtractor<double> EE;
  typedef RAPT::rsInterpolatingFunction<double, double> IF;
  EE envExtractor;
  envExtractor.setInterpolationMode(IF::CUBIC_NATURAL);
  //envExtractor.setInterpolationMode(IF::LINEAR);
  envExtractor.setSampleRate(fs);
  //envExtractor.setSmoothing(20.0, 4);
  envExtractor.setStartMode(EE::FREE_END);
  envExtractor.setEndMode(  EE::FREE_END);
  //envExtractor.setStartMode(EE::ZERO_END);
  //envExtractor.setEndMode(  EE::ZERO_END);
  //envExtractor.setStartMode(EE::EXTRAPOLATE_END);
  //envExtractor.setEndMode(  EE::EXTRAPOLATE_END);
  envExtractor.sineEnvelopeWithDeBeating(&x[0], N, &env[0]);

you see some commented code there, just to conveniently switch between certain options by un/commenting lines of code. the last line actually extracts the envelope. as said, x and env are both arrays (or vectors) of length N. for general use, i think i would recommend the natural spline setting with free ends. clamping the ends to zero does not seem to work well in general: if the first (and/or last) peak in the envelope is very close to the start (or end), the spline tends to overshoot a lot. with a linearly interpolated envelope, this is not an issue (because it cannot overshoot by nature) - however, then the envelope is not very smooth.

i'm considering some refinements such as smoothing the result by a bidirectional filter and/or add an adjustable fade-in/out and there may still be pathological cases that i need to catch and treat specially (for example, when there is no actual beating present in the "raw" envelope).

elanhickler commented 6 years ago

Great! We need the harmonic extractor as well. And that can also be used to improve on phaselocking. The results with a proper FFT harmonic extraction are much better than the bidirectional filter solution you created a while ago. Remember, we have working code for the harmonic extractor.

RobinSchmidt commented 6 years ago

Explore the code base, you'll easily find the harmonic analysis and extraction stuff. We need you to repurpose this for extracting individual harmonics.

we have working code for the harmonic extractor.

soo - what should i do with that code? if it is already working as you need it to, it would seem that there's nothing to "repurpose" and can be used as-is? but apparently that's not the case? i see the HarmonicAnalyzer class - what exactly does it do in this context and in which way does it fall short? and what about copyright issues? should i include this code into my codebase (in a special third-party section)?

elanhickler commented 6 years ago

We need a function to extract harmonics and replace harmonics. FFT does this perfectly in my experience. And again, your bidirectional filters are bad at it; it's not worth investing any more time in the bidirectional filter idea for extracting harmonics.

Here we have a harmonic detector function that takes a "magnitude spectrum" which is created via an FFT function. I think it works by looking at each magnitude spectrum index and seeing if the -1,-2,+1,+2 indexes are quieter than the reference index 0.

std::vector<int> SampleTailExtender::getIndicesOfHarmonics (const std::vector<double>& magnitudeSpectrum, double sampleRate)
{
    std::vector<int> harmonics;

    double maxFrequency = 10000.;
    double binWidth = sampleRate / double (audioFrameSize);
    int maxBin = (int) (maxFrequency / binWidth); // cast after the division

    // a bin counts as a harmonic peak if it is louder than its two
    // neighbours on each side (guard against reading past the end)
    for (int i = 2; i < maxBin && i + 2 < (int) magnitudeSpectrum.size(); i++)
    {
        bool condition1 = magnitudeSpectrum[i] > magnitudeSpectrum[i - 1];
        bool condition2 = magnitudeSpectrum[i] > magnitudeSpectrum[i - 2];
        bool condition3 = magnitudeSpectrum[i] > magnitudeSpectrum[i + 1];
        bool condition4 = magnitudeSpectrum[i] > magnitudeSpectrum[i + 2];

        if (condition1 && condition2 && condition3 && condition4)
            harmonics.push_back (i);
    }

    return harmonics;
}

Here's an example of creating the magnitude spectrum:

std::vector<double> audioFrame; // one frame of audio data

FFT fft (audioFrameSize);
fft.performFFT (audioFrame);

std::vector<double> magnitudeSpectrum = fft.getMagnitude();

Here's an example of comparing a magnitude-spectrum value against a decibel threshold:

double decibelThreshold = -60.0;
double threshold = pow (10.0, decibelThreshold / 20.0);
bool condition = magnitudeSpectrum[i] > threshold;

I could probably figure out how to make my own set of FFT functions for harmonic extraction, resynthesis, and re-adding the new harmonics to the original audio, but I'd like you to include this in your library so you can maintain the functions and add things we will want in the future.

RobinSchmidt commented 6 years ago

I think it works by looking at each magnitude spectrum index and seeing if the -1,-2,+1,+2 indexes are quieter than the reference index 0.

yes. it selects local maxima from the magnitude spectrum, where a maximum is defined by being louder than its two neighbours to each side. i'm just a bit confused about what to do with the code. should i take it as inspiration, copy it into my codebase for tweaking, or both? if so, i would suggest i place it into the ThirdParty folder (in Libraries) and set up a little testbed in my test code section, so we can experiment with it. you say any copyright issues are cleared, right?

elanhickler commented 6 years ago

We own the code outright.

This code is for you to give us a working harmonic extractor in RS-MET library and continue from there.

RobinSchmidt commented 5 years ago

ok, i have integrated the code into my codebase and have set up an experiment that tests the SampleTailExtender. if you run the test project, you can try it yourself. you will get two wavefiles in the project directory - one is a sort of synthesized pluck sound (via my modal filter bank) and the other one is the same sound with the tail extension taking over at 1.5 seconds.

www.rs-met.com/temp/TestPluck.wav
www.rs-met.com/temp/TestPluckExtended.wav

you can clearly hear the sound becoming more "static" as the extender takes over, due to the fact that from that instant on, it's only getting more quiet over time but not duller (as it initially does and should continue to do).

there are two (related) things that i would suggest to improve:

1. don't let the user set the decay rate - instead, measure it from the sample data before the splice point
2. don't use the same decay rate for all harmonics - each harmonic should have its own decay rate

(they are related because it would probably be very impractical to let the user set dozens or even hundreds of decay rates)
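Measuring a per-harmonic decay rate from the material before the splice point could work by comparing each harmonic's magnitude in two analysis frames: with an exponential model m(t) = m(t1) * exp(-(t - t1)/tau), each harmonic's time constant is tau = (t2 - t1) / ln(m1/m2). A hypothetical helper sketching this, not part of the SampleTailExtender code:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-harmonic decay-time estimate: compares the magnitude of each harmonic
// in two analysis frames taken at times t1 and t2 (both before the splice
// point) and solves the exponential model for the time constant:
//   tau = (t2 - t1) / ln(m1/m2)
std::vector<double> estimatePerHarmonicDecayTimes(
    const std::vector<double>& mags1,  // harmonic magnitudes at time t1
    const std::vector<double>& mags2,  // harmonic magnitudes at time t2 > t1
    double t1, double t2)
{
    std::vector<double> taus(mags1.size());
    for(size_t h = 0; h < mags1.size(); h++)
        taus[h] = (t2 - t1) / std::log(mags1[h] / mags2[h]);
    return taus;
}
```

Each harmonic's decay envelope would then be synthesized from its own tau instead of one global decay rate.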

gregjazz commented 5 years ago

Yes, each harmonic having its own decay based on a per-harmonic estimated decay rate is ideal, and that's what will keep the audio from sounding static once it reaches the extended portion.

elanhickler commented 5 years ago

I thought the code already did that based on a decay rate adjustment. The decay rate is a modifier, not a fixed decision.

gregjazz commented 5 years ago

I thought so, too, but it doesn't sound like it's doing that. Maybe something is going on in the original code to prevent that from working properly?

RobinSchmidt commented 5 years ago

hmm...maybe i have an older version of the code? it certainly looks like the amplitude envelope is generated only once (before the loop over the harmonics) and then applied to all harmonics

    // stuff...
    std::vector<double> decayEnvelope = makeDecayEnvelope (splicePointPeak, sampleRate, decayRate);
    // stuff...
    for (auto& h : harmonics)
    {
        // ... more stuff - synthesize sinusoid without envelope...
        for (int i = 0; i < synthesisedSignal.size(); i++)
        {
            synthesisedSignal[i] += sinusoid[i] * (decayEnvelope[i] * mag);
        }
    }

there is actually some second envelope that is applied to each harmonic separately but that second envelope seems to account only for the beating not for the decay. ...if i interpret the code correctly. (the code is from SampleTailExtender::extendSample)

RobinSchmidt commented 5 years ago

that second envelope seems to account only for the beating not for the decay

so it looks like the code was written to explicitly reproduce the beating

gregjazz commented 5 years ago

Ahhh, that explains it. I looked through our conversation, and the original programmer mentioned making the decay rate different per harmonic, but then we focused on the harmonic beating effect, so I think that was forgotten about.

RobinSchmidt commented 5 years ago

i have started implementing a sinusoidal modeling framework. it's based on a data structure that has for each partial an array of data points, each of which consists of a time-instant, instantaneous frequency, instantaneous phase, and instantaneous amplitude. i wanted the data of the model to be independent of any arbitrary technical parameters like a sample-rate or frame size or anything like that. the synthesis side of it seems to work already - but that was the easy part - although it was a bit tricky to combine the instantaneous freq and phase data. next comes the analysis part...
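The data structure Robin describes (per partial, an array of data points in physical units, independent of sample rate and frame size) can be sketched like this; the names are illustrative, not the actual RAPT classes:

```cpp
#include <cassert>
#include <vector>

// One data point in the trajectory of a single partial. All values are in
// physical units (seconds, Hz, radians), so the model carries no arbitrary
// technical parameters like a sample rate or frame size.
struct PartialDataPoint
{
    double time;   // time instant in seconds
    double freq;   // instantaneous frequency in Hz
    double phase;  // instantaneous phase in radians
    double amp;    // instantaneous amplitude (linear)
};

// A partial is just its time-ordered array of data points.
struct Partial
{
    std::vector<PartialDataPoint> points;
};

// The whole model is an array of partials.
struct SinusoidalModel
{
    std::vector<Partial> partials;
};
```

Synthesis then interpolates each partial's trajectory at the output sample rate and sums the resulting sinusoids, which is where reconciling the instantaneous frequency and phase data gets tricky.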

RobinSchmidt commented 5 years ago

oookay - i have a basic(!) sinusoidal analysis algorithm up and running. now is testing and tweaking time

elanhickler commented 5 years ago

Maybe your first real-world test could be separating noise from harmonics, atonal from tonal, and spitting out two files.

RobinSchmidt commented 5 years ago

tonal in the sense of "stable frequency sinusoid", regardless of whether it's harmonically related to some fundamental frequency or not? and "atonal" would be transients + noise? generally, yes - i'm planning to use it for splitting audio data into complementary parts (actually, i want to split into harmonic + inharmonic + transient + noise).

elanhickler commented 5 years ago

Having 4 parts is nice too. Remember that plucked instruments like guitar do not have harmonics that are perfect multiples of the fundamental. I wonder if electric piano is like that as well. A partial counts as a harmonic in plucked instruments if it more or less decays steadily and does not have a bunch of disappearances and reappearances. There will be some due to imperfections of the real world.

I need harmonic extraction for my phaselocking stuff as well. Basically we throw out phase and decide a frequency and starting phase for each harmonic.

elanhickler commented 5 years ago

can you get it done a week from now?

RobinSchmidt commented 5 years ago

You just implemented fft harmonic extraction. I'm assuming now everything is ready for phaselocking. Extract harmonics then rebuild them with fixed frequency while using original amplitude.

what i have just implemented is a very basic sinusoidal analysis algorithm, i.e. an algorithm that looks at a spectrogram and from that tries to extract frequency-, amplitude- and phase-trajectories of the component sinusoids. i was assuming a quite general setting, i.e. not necessarily harmonic frequency ratios, possibly time-varying frequencies. but it's still very unrefined. if the sound can be assumed to be (more or less) harmonic, then things are much simpler and i think i can make a phase-locker in a week from that. for inharmonic sounds, i wouldn't know what phase-locking is supposed to mean anyway. so you just want to obtain amplitude trajectories from an STFT/spectrogram analysis and use these for resynthesis with fixed harmonic frequencies (and hence fixed phase relationships)? btw. i just discovered this:

https://www.coursera.org/lecture/audio-signal-processing/sinusoidal-model-1-gjiP7

and enrolled for the course. seems to be exactly spot on for what you currently need

elanhickler commented 5 years ago

The content I need it for currently is content that we can assume has multiples of the fundamental, and the harmonics will be rebuilt with perfect multiples, perfectly flat frequency, unchanging phase. The only thing that changes is the amplitude, based on the original.

We may need to deal with inharmonic content at some point as well.
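The requested phase-locked resynthesis (perfect multiples of the fundamental, flat frequency, fixed starting phase, only the per-harmonic amplitude taken from the analysis) can be sketched as a plain oscillator bank. This is an illustration of the idea under those assumptions, not the RS-MET implementation:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Rebuilds each harmonic h at the exact multiple (h+1)*f0 with a fixed
// radian frequency and start phase 0, applying a per-sample amplitude
// envelope amp[h][n] taken from the analysis. Each amp[h] must have at
// least numSamples entries.
std::vector<double> resynthesizePhaseLocked(
    const std::vector<std::vector<double>>& amp, // amp[h][n]: envelope of harmonic h
    double f0, double sampleRate, size_t numSamples)
{
    const double pi2 = 6.283185307179586;        // 2*pi
    std::vector<double> y(numSamples, 0.0);
    for(size_t h = 0; h < amp.size(); h++)
    {
        double w = pi2 * (h + 1) * f0 / sampleRate; // fixed radian frequency
        for(size_t n = 0; n < numSamples; n++)
            y[n] += amp[h][n] * std::sin(w * (double) n); // start phase 0
    }
    return y;
}
```

Because every harmonic's frequency and start phase are fixed, the phase relationships between harmonics stay constant for the whole sample, which is exactly the "phaselocked" behavior.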

elanhickler commented 5 years ago

I can give you another week on getting your library updated with the latest JUCE.

RobinSchmidt commented 5 years ago

yes - i had to revise the interface of my spectrogram class to make it fit better with the parametrization used in Serra's course and also be more convenient to use. my original code was a direct translation from my very non-object-oriented python code, with functions taking loads of parameters. that was messy. ...now everything works with obj.setThis, obj.setThat, obj.setSomethingElse, obj.setYetAnotherParameter, obj.doStuff instead of doStuff(this, that, somethingElse, yetAnotherParameter, ...)

elanhickler commented 5 years ago

Ok just let me know when your juce is updated so we can get back in sync

RobinSchmidt commented 5 years ago

i'm having some strange issues with the spectrogram resynthesis - but i think i'll ignore them for the time being and just continue with the sinusoidal model (we don't really need spectrogram resynthesis for that)

elanhickler commented 5 years ago

what is spectrogram resynthesis? edit: i mean the thing I really need is the phaselock system start to finish as soon as possible. (update your juce!)

  1. extract harmonics
  2. resynthesize harmonics at perfect multiples flat frequency keeping amplitude
  3. paste the new harmonics back into the original audio (minus original harmonics)
RobinSchmidt commented 5 years ago

by spectrogram resynthesis, i mean: take the spectrogram of a signal -> resynthesize the signal from the spectrogram. it does work in some sense - i do get an identity analysis-resynthesis roundtrip (up to roundoff error) - but not yet exactly the way, i want it to

elanhickler commented 5 years ago

isn't that a crucial part of phaselocking, the round trip? or no?

RobinSchmidt commented 5 years ago

no - that's why i said, i may ignore it for the time being. i'm planning to do the (re)synthesis for the sinusoidal model with an oscillator bank (sort of) - so we really only need the analysis side of the spectrogram processor at the moment

elanhickler commented 5 years ago

but we need the original with harmonics removed based on the analysis in order to place new harmonics, so you would need to get the new audio based on the new spectrogram

RobinSchmidt commented 5 years ago

i'm planning to remove the resynthesized harmonics from the original by doing time-domain subtraction. so the whole system would work like:

analysis: original signal -> spectrogram -> sinusoidal model
resynthesis: sinusoidal model -> original harmonic signal
residual: original signal minus original harmonic signal
synthesis: transformed sinusoidal model -> transformed harmonic signal
result: residual + transformed harmonic signal

there is no spectrogram synthesis step involved - only a synthesis directly from the sinusoidal model (via oscillator bank)

the "transformed sinusoidal model" is where you can apply phase-locking or whatever
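Since all of the combining happens in the time domain, the final assembly is just per-sample arithmetic on three equal-length signals. A minimal sketch (hypothetical helper name, assuming the three signals are already aligned and equal in length):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Combines the three signals of the pipeline in the time domain:
//   output = (original - resynthesized) + transformed
// where "resynthesized" is the unmodified sinusoidal-model output (so that
// original - resynthesized is the residual) and "transformed" is the model
// output after e.g. phase-locking.
std::vector<double> applySinusoidalTransform(
    const std::vector<double>& original,
    const std::vector<double>& resynthesized,
    const std::vector<double>& transformed)
{
    std::vector<double> y(original.size());
    for(size_t i = 0; i < original.size(); i++)
        y[i] = original[i] - resynthesized[i] + transformed[i];
    return y;
}
```

If the transformed model equals the unmodified one, the output reduces to the original signal, which is the identity-roundtrip sanity check.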

RobinSchmidt commented 5 years ago

but at some point, we may want to get the spectrogram synthesis part straight anyway - to be able to "draw" in spectrograms. i'm rather curious how image processing algorithms (blur, sharpen, contrast, edge-detection, whatever) sound when applied to spectrograms. ...i know, there are already softwares that do that kind of stuff - but i want to have such things in my library, too

elanhickler commented 5 years ago

Time domain subtraction with bidirectional filters? That's what I want to stop using. But wait, your bidirectional filters wouldn't even be able to remove the harmonics based on the analysis due to the frequency-deviation inherent in acoustic samples and in the analysis, so actually I don't know what you're planning... With the resynthesis code I gave you for the SampleTailExtender, the round trip is satisfactory, dare I say perfect.

Edit: I'm pretty sure there's the spectrogram roundtrip...

RobinSchmidt commented 5 years ago

Time domain subtraction with bidirectional filters?

no. time domain subtraction of the output of the unmodified sinusoidal model

bidirectional filters wouldn't even be able to remove the harmonics based on the analysis due to the frequency-deviation inherent in acoustic samples

indeed. i think this is the shortcoming of the bidirectional filtering approach over spectrogram analysis. it can't deal with partials that have a time varying frequency.

With the resynthesis code I gave you for the SampleTailExtender, the round trip is satisfactory, dare I say perfect.

unless i'm missing something big, the sample tail extender has no spectrogram resynthesis at all. it estimates the decay times and beating frequencies and -amounts of the partials and at some time instant in the sample switches/crossfades between the original signal and a synthesized signal that consists of sinusoids with the estimated decay envelope and beating modulation applied (where the synthesis is done via an oscillator bank).

elanhickler commented 5 years ago

ok, carry on.

elanhickler commented 5 years ago

pleeez Robin do you have an ETA? This is the most important thing you will do for me till the end of time!