RobinSchmidt / RS-MET

Codebase for RS-MET products (Robin Schmidt's Music Engineering Tools)
Other
56 stars 6 forks source link

Spectrogram processing #288

Open RobinSchmidt opened 5 years ago

RobinSchmidt commented 5 years ago

i open a new thread to continue the discussion which started here:

https://github.com/RobinSchmidt/RS-MET/issues/280#issuecomment-525162273

but doesn't really belong to the main topic of this thread. ...so there's a class rsSpectrogram for converting an array of audio-samples into a spectrogram, represented as matrix of complex numbers and/or (re)synthesize an audio signal from such a spectrogram. in between analysis and resynthesis, one can apply arbitrary transformations to the spectrogram data. one of the simplemost things to do is filtering by zeroing out the bins above or below a certain cutoff point. the function spectrogramFilter in the TestsRosicAndRapt project demonstrates how this can be done.

...by writing this test function, i discovered a flaw in the underlying matrix class which unfortunately requires some inconvenient additional copying workaround (there are comments about this in the code). i think, i will soon replace this matrix class - which i wanted to do since some time anyway. i have now other ideas, how a proper implementation of a matrix class should look like (probably next week - i'm a bit sick at the moment)

elanhickler commented 4 years ago

What's an easy way to exponentially reduce the volume of a bin to 0? What equation should I use?

Well I should probably work in terms of time rather than bin id. I'd need to convert a time range to a bin index range.

Also, should I use a sin function to feather the frequency vs bin volume? So you choose a center frequency, then above and below the volume is reduced a bit until 0 as well, so the filter isn't so flat. I might just need like 1 or 2 bins of feathering above and below the target frequency.

edit: I guess time would be numBins / seconds and I can use linear for now. edit: linear fade sounds good.

RobinSchmidt commented 4 years ago

I'd need to convert a time range to a bin index range.

i don't understand that. time and frequency are independent dimensions

elanhickler commented 4 years ago

I got it working. Maybe I should have said numFrames / seconds. Basic linear fade.

RobinSchmidt commented 4 years ago

yes - numFrames/seconds makes much more sense

elanhickler commented 4 years ago

I'll explain what I'm up to. I'm fading out/removing all audio content except multiples of a frequency. That isolates harmonics. It drastically reduces noise and clicks. The problem is that you hear a buzzing sound because there's still noise in the upper frequencies where the instrument's harmonics decay quickly. To solve this I put a sweeping-down lowpass filter so the spectrum decays in a way that sounds natural. It's just guesswork at that point, by ear. But here's the kicker: you could not use a filter on a normal signal because it would actually sound like a sweeping lowpass filter because of broadband noise. It would sound like "shhhhoooowww". But in this case, since everything but harmonics is isolated, using a filter is viable and sounds correct.

RobinSchmidt commented 4 years ago
    // workaround to create the deep copies
    Matrix sl(numFrames, numBins);
    sl.copyDataFrom(s);

meanwhile, i have implemented my new matrix class and i have updated my spectrogram stuff to use it. now you can simply write:

    Matrix sl = s;

the new class is temporarily named rsMatrixNew, so you may have to update the typedef that is not shown your code. eventually, i want to retire the old implementation and then the new class will be just named rsMatrix again. but during transition (while the old class is still in use), i need this temporary name for distinction

edit: just look at spectrogramFilter() in PhaseVocoderExperiments.cpp how to update the code (your code seemed to be partially copied from there). it's much cleaner now.

elanhickler commented 4 years ago

I don't know wtf is going on, but this is really bad. I can't extract harmonics at this crucial time. See code: https://github.com/RobinSchmidt/RS-MET/issues/288#issuecomment-532298095

edit: figured it out, I was resynthesizing the wrong fft object.......................... I don't understand this code too well.

RobinSchmidt commented 4 years ago

I was resynthesizing the wrong fft object.......................... I don't understand this code too well

what is an fft object? do you mean the spectrogram matrix? if you think, the code is hard to use and have suggestions for the improvement of the API, let me know. in fact, i'm contemplating a general overhaul of the API - after watching this (very entertaining) video:

https://www.youtube.com/watch?v=5tg1ONG18H8

my code grew organically over almost two decades now and was not at all consistently "designed" in any way (aside from loosely following some conventions that i made up), so it definitely has its quirks - such that time variables are sometimes in seconds, sometimes in milliseconds (i usually opted for whatever i would display to the user on the gui - which was good enough for personal use but is quirky for client code)

elanhickler commented 4 years ago
Matrix s = sp.complexSpectrogram(origAudio->audio->getReadPointer(ch), samples);

// workaround to create the deep copies
int numFrames = s.getNumRows();
int numBins = sp.getNumNonRedundantBins(); // == s.getNumColumns()        

Matrix sl = s;

this is what messed me up, because I don't understand why we throw away the first matrix.

RobinSchmidt commented 4 years ago

oh - you mean, you want to work directly on the matrix s instead of on a copy sl as in "in-place" processing? sure, if you don't need the unprocessed matrix anymore for anything else, you can totally do that. the code from which you copy-and-pasted this, still needs the original matrix later for plotting - but you may not

edit: oh - by the way - the comment is outdated as well - there is no workaround anymore. the deep copy is now done by the assignment with the new matrix class - which is the solution i wanted

elanhickler commented 4 years ago

I don't get it. Why would you need an unprocessed matrix for plotting? Wouldn't you want to plot the processed version to see the results? And yes, I don't need to plot results. Can I get rid of the 2nd matrix? Also, I want to move numFrames/numBins and vector<int> binsToZero outside of the loop because that only needs to be calculated once.

Here's my code again:

const int channels = origAudio->getNumChannels();
const int sampleRate = origAudio->getSampleRate();
const int samples = origAudio->getNumFrames();
const int bitDepth = origAudio->getBitDepth();
const double maxf = std::min<double>(20000, sampleRate * 0.5);
const int nHarmonics = int(maxf / f);

for (int ch = 0; ch < channels; ++ch)
{
    // compute the complex spectrogram:
    WindowType W = WindowType::hanningZN;
    Spectrogram sp;
    sp.setBlockAndTrafoSize(blockSize, trafoSize);
    sp.setHopSize(hopSize);
    sp.setAnalysisWindowType(W);
    sp.setSynthesisWindowType(W);
    sp.setOutputDemodulation(true);
    Matrix s = sp.complexSpectrogram(origAudio->audio->getReadPointer(ch), samples);

    // workaround to create the deep copies
    int numFrames = s.getNumRows();
    int numBins = sp.getNumNonRedundantBins(); // == s.getNumColumns()        

    Matrix sl = s;

    vector<int> binsToZero;
    for (double cf = f; cf < sampleRate * 0.5;)
    {
        int lo = sp.frequencyToBinIndex(cf - fRange * 0.5, sampleRate);
        int hi = sp.frequencyToBinIndex(cf + fRange * 0.5, sampleRate);

        for (int b = lo; b < hi; ++b)
            binsToZero.push_back(b);

        cf += f;
    }

    // zero out harmonic bandwiths to get only noise
    for (int i = 0; i < numFrames; i++)
        for (int b : binsToZero)
            sl(i, b) = 0;

    // subtract noise only from orignal to get harmonics only
    vec x = sp.synthesize(sl);
    auto ptr = harmAudio->audio->getWritePointer(ch);
    for (int s = 0; s < samples; ++s)
        ptr[s] -= x[s];

    // transfer noise only
    ptr = noiseAudio->audio->getWritePointer(ch);
    for (int s = 0; s < samples; ++s)
        ptr[s] = x[s];
}

if (noiseOutPath.isNotEmpty())
    noiseAudio->renderFile(noiseOutPath);
if (harmOutPath.isNotEmpty())
    harmAudio->renderFile(harmOutPath);

return var::undefined;
RobinSchmidt commented 4 years ago

Why would you need an unprocessed matrix for plotting? Wouldn't you want to plot the processed version to see the results? And yes, I don't need to plot results. Can I get rid of the 2nd matrix?

because in my test, i plot all three spectrograms - original, lowpassed and highpassed. sure, i could have interleaved the processing and plotting code to get rid of at least of one of the 3 matrices - but this is not production code, so i'd rather focus on readability than efficiency.

Also, I want to move numFrames/numBins and vector binsToZero outside of the loop because that only needs to be calculated once.

i don't see any obvious problems with that - numFrames/numBins seem to be used only once, so you may as well replace their usage directly with the function call. although i think, looping up to numFrames is more readable than to s.getNumRows(), though. matter of taste. i wonder, why you bother with the binsToZero vector at all and not just loop directly for(b = lo; b <= hi; b++)

RobinSchmidt commented 4 years ago

i just renamed rsSpectrogram to rsSpectrogramProcessor - and the member function complexSpectrogram to getComplexSpectrogram. i'm planning to introduce a new class rsComplexSpectrogram which should be a subclass of rsMatrix<std::complex<T>> and will have member function getNumFrames, getNumBins that simply delegate to the getNumRows, getNumColumns function of the matrix baseclass (and the baseclass functions will become unavailable via inheriting protected or private). then you would have something like rsSpectrogramProcessor::getComplexSpectrogram that returns an object of class rsComplexSpectrogram rather than rsSpectrogram::complexSpectrogram returning rsMatrix. that makes much more sense, API-wise

elanhickler commented 4 years ago

good, push changes!

elanhickler commented 4 years ago

code with your suggested changes:

for (int ch = 0; ch < channels; ++ch)
{
    // compute the complex spectrogram:
    WindowType W = WindowType::hanningZN;
    Spectrogram sp;
    sp.setBlockAndTrafoSize(blockSize, trafoSize);
    sp.setHopSize(hopSize);
    sp.setAnalysisWindowType(W);
    sp.setSynthesisWindowType(W);
    sp.setOutputDemodulation(true);
    Matrix mat = sp.complexSpectrogram(origAudio->audio->getReadPointer(ch), samples);

    // zero out harmonic bandwiths to get only noise
    for (int i = 0; i < mat.getNumRows(); i++)
    {
        for (double cf = f; cf < sampleRate * 0.5;)
        {
            int lo = sp.frequencyToBinIndex(cf - fRange * 0.5, sampleRate);
            int hi = sp.frequencyToBinIndex(cf + fRange * 0.5, sampleRate);

            for (int b = lo; b < hi; ++b)
                mat(i, b) = 0;

            cf += f;
        }                
    }

    // subtract noise only from orignal to get harmonics only
    vec x = sp.synthesize(mat);
    auto ptr = harmAudio->audio->getWritePointer(ch);
    for (int s = 0; s < samples; ++s)
        ptr[s] -= x[s];

    // transfer noise only
    ptr = noiseAudio->audio->getWritePointer(ch);
    for (int s = 0; s < samples; ++s)
        ptr[s] = x[s];
}
RobinSchmidt commented 4 years ago

i already pushed them. i would also drag out all the setup-work for sp out of the loop over the channels - execept for the last line mat = ... (which is not setup call anyway) - because nothing really changes - you just do all the exact same setup-operations numChannels times when doing it in the loop

elanhickler commented 4 years ago

build error, i fixed it like this... not sure if it's correct.

image

RobinSchmidt commented 4 years ago

oh - ok, yes - thanks. i fixed it. i didn't hit this on "build" - only now on "rebuild". apparently, some minimum-rebuild-cache in VS was messed up.