Open RobinSchmidt opened 4 years ago
Thanks for doing this.
Can you explain how blocksize, hopsize, trafosize would affect the filter?
Edit: Looking at this I can't see where the waveform output is... oh you renamed vector to vec.
for filtering, the trafoSize should probably stay fixed at being equal to the blockSize. i think using zero-padding (i.e. trafoSize > blockSize) makes sense only for higher resolution analysis and visualization. it's actually a faux increase in frequency resolution - the spectrum is still washed out, you just get more spectral samples of it. hopSize = blockSize/2 is probably also best left as is. you can get a similar faux time-resolution increase for visualization by choosing smaller (power-of-two) fractions here. the only parameter that really matters is the blockSize - it dials in a trade-off between time-resolution and frequency-resolution of the filter. it basically means that longer block-sizes lead to longer ringing filters that give better frequency separation. so the parameter here is the blockSize - the other two are dependent as: trafoSize=blockSize, hopSize=blockSize/2. ...i mean, you can experiment with other settings (i didn't, so far), but i would not really expect much improvement or difference from doing so
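To put rough numbers on that trade-off, here is a minimal sketch that assumes only the standard STFT relations (bin spacing = sampleRate / trafoSize, block duration = blockSize / sampleRate). The helper names are made up for illustration and are not part of rsSpectrogram.

```cpp
#include <cassert>

// Hypothetical helpers (not part of rsSpectrogram) illustrating the
// time/frequency trade-off that blockSize dials in.

// Spacing between adjacent FFT bins in Hz.
double binSpacingHz(double sampleRate, int trafoSize)
{
    assert(trafoSize > 0);
    return sampleRate / trafoSize;
}

// Duration of one analysis block in milliseconds.
double blockDurationMs(double sampleRate, int blockSize)
{
    assert(sampleRate > 0.0);
    return 1000.0 * blockSize / sampleRate;
}
```

For example, at 44.1 kHz a block of 4096 samples gives roughly 10.8 Hz between bins and spans about 93 ms. Halving the block size doubles the bin spacing and halves the time span.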
You have lowpass and highpass here. Wouldn't it be just as good to only do one and then subtract the result from the original?
yes, that should work, too. i'm currently just not sure, if the subtraction will work correctly on the spectrogram level due to the flaws and/or bugs in the matrix class (i mean, if you just subtract spectrogram matrices using the "-" operator of the matrix class). but subtracting time-domain signals should work
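A minimal sketch of the time-domain variant, assuming the lowpassed signal is time-aligned with the original (same length, no latency). The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: derives the complementary band in the time domain.
// highpassed[n] = original[n] - lowpassed[n], so the two bands add back up
// to the original by construction.
std::vector<double> complementBySubtraction(const std::vector<double>& original,
                                            const std::vector<double>& lowpassed)
{
    assert(original.size() == lowpassed.size());
    std::vector<double> highpassed(original.size());
    for (std::size_t n = 0; n < original.size(); ++n)
        highpassed[n] = original[n] - lowpassed[n];
    return highpassed;
}
```

This sidesteps the matrix "-" operator entirely, at the cost of one resynthesis for the lowpass band.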
ok so something I kinda get but also totally don't get.
You have two classes, the spectrogram and the sinusoidal model stuff.
Let's say I want to create a simple denoiser. Zero out bins that are not above an amplitude threshold. Let's say I want to improve that denoiser by detecting harmonics. Let's say after x seconds the harmonics of the signal start dropping below the noise floor. This is where noise reduction starts to become useless. What I really want is to do a guesstimate of how long the harmonics last and just resynthesize from scratch. Get rid of everything after X seconds and just resynthesize from there. That's where the sinusoidal model comes in? Is it confusing to combine these two classes into a function? How's this going to work, etc. etc. This is just brainstorming.
Edit: This is basically the function of the sample tail extender but as you said, it doesn't use sinusoidal models? It only uses spectrogram stuff and bin amplitude? So it's not as good?
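The first step of that brainstorm (zeroing bins below an amplitude threshold) could be sketched as follows. This uses a plain vector-of-frames layout rather than the rsSpectrogram matrix class, and both the `Frame` alias and `gateBins` are assumptions made for illustration.

```cpp
#include <complex>
#include <vector>

// One spectrogram frame: one complex value per bin (assumed layout).
using Frame = std::vector<std::complex<double>>;

// Hypothetical spectral gate: zeroes every bin whose magnitude is below
// the threshold and returns how many bins were zeroed.
int gateBins(std::vector<Frame>& spectrogram, double threshold)
{
    int numZeroed = 0;
    for (Frame& frame : spectrogram)
    {
        for (std::complex<double>& bin : frame)
        {
            if (std::abs(bin) < threshold)
            {
                bin = 0.0;
                ++numZeroed;
            }
        }
    }
    return numZeroed;
}
```

A hard gate like this is only a starting point. It knows nothing about harmonics, so it cannot tell a decaying partial from noise once both sit near the threshold.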
quick question: is it ok to use float types with rsSpectrogram? Because the audio is in floats not doubles.
> is it ok to use float types with rsSpectrogram?
i didn't try that but don't see, why it should be a problem. you should just need an appropriate template instantiation
What's the demodulation thing about? What is modulation in this case?
I get some distortion at the end of the file using the lowpass/highpass stuff. I resized my audio samples so that it has blocksize number of extra 0s at the end. That seemed to fix it.
Edit: adding a single extra 0 sample seems to work as well. maybe the last number just needs to be 0.
> What's the demodulation thing about? What is modulation in this case?
before analyzing a frame, an analysis window is applied and after (re)synthesizing a frame, a synthesis window is applied. depending on the choice of these windows (and the ratio of blocksize and hopsize), there may be an amplitude modulation in the resynthesized signal - but this modulation is predictable and can be compensated for. this is the demodulation step.
see here: https://ccrma.stanford.edu/~jos/parshl/Overlap_Add_Synthesis.html
in most implementations of spectrogram processing systems, the window-functions and the blocksize-to-hopsize ratio are tuned such that this modulation does not occur (this happens when the (overlapped) products of analysis and synthesis windows sum up to unity). but i wanted to have more freedom in my choices of window-functions, so i incorporated this demodulation step. this will do nothing (i.e. divide by one), in case of a choice where this overlap-to-unity condition is satisfied. in other cases, it divides by the sum of the overlapped window-products
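A minimal sketch of that demodulation idea, assuming frames that start at samples 0, hopSize, 2*hopSize and so on. The function names and the frame alignment are assumptions; the actual rsSpectrogram implementation may lay out its frames differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum of the overlapped (analysis * synthesis) window products at each
// output sample. Assumes frames start at 0, hopSize, 2*hopSize, ...
std::vector<double> overlappedWindowProductSum(const std::vector<double>& analysisWin,
                                               const std::vector<double>& synthesisWin,
                                               int hopSize, int numSamples)
{
    assert(analysisWin.size() == synthesisWin.size() && hopSize > 0);
    const int blockSize = static_cast<int>(analysisWin.size());
    std::vector<double> sum(numSamples, 0.0);
    for (int start = 0; start < numSamples; start += hopSize)
        for (int n = 0; n < blockSize && start + n < numSamples; ++n)
            sum[start + n] += analysisWin[n] * synthesisWin[n];
    return sum;
}

// Divides the resynthesized signal by the modulation sum. Samples where
// the sum is (nearly) zero are left untouched to avoid blowing up.
void demodulate(std::vector<double>& y, const std::vector<double>& modulation,
                double floorValue = 1e-12)
{
    assert(y.size() == modulation.size());
    for (std::size_t n = 0; n < y.size(); ++n)
        if (modulation[n] > floorValue)
            y[n] /= modulation[n];
}
```

When the window products already overlap to a constant of one, the division is a no-op, which matches the "divide by one" case described above.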
> I get some distortion at the end of the file
:-O what is this?! i've never seen that. what are your settings? maybe i should try it with your sample? or (easier for me to check) can you produce this artifact also with a simple artificial input signal (noise, dc, sine, whatever?)
> maybe the last number just needs to be 0.
this would be really weird!
oh - i just noticed that the hopsize should be blockSize/4 and not blockSize/2 for the overlapped windows to sum up to a constant (with the Hann window). blockSize/2 would be appropriate only if the window would be applied once - but it's applied twice (in analysis and synthesis) - but with halving the hopsize, we again get a sum-up-to-constant property. the hann window is really nice in this respect.
so, with blockSize/2, the demodulation actually does something - which could produce artifacts. ...although probably not of the kind that you are seeing at the end. that is probably something else.
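The hop-size claim can be checked numerically. The sketch below overlap-adds squared Hann windows and measures the ripple of the sum in the fully covered middle region. It assumes a periodic, unnormalized Hann window, which may not match the library's `hanningZN` exactly, and the function name is made up.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Peak-to-peak ripple of overlap-added squared Hann windows, measured in
// the steady-state region (away from the fade-in/out at the ends).
// Assumes a periodic Hann window w[n] = 0.5 - 0.5*cos(2*pi*n/blockSize).
double squaredHannOverlapRipple(int blockSize, int hopSize)
{
    const double pi = 3.14159265358979323846;
    std::vector<double> w2(blockSize);
    for (int n = 0; n < blockSize; ++n)
    {
        const double w = 0.5 - 0.5 * std::cos(2.0 * pi * n / blockSize);
        w2[n] = w * w; // squared: applied in analysis and in synthesis
    }

    // Overlap-add enough frames that the range [blockSize, 2*blockSize)
    // receives contributions from every overlapping frame.
    const int numSamples = 4 * blockSize;
    std::vector<double> sum(numSamples, 0.0);
    for (int start = 0; start + blockSize <= numSamples; start += hopSize)
        for (int n = 0; n < blockSize; ++n)
            sum[start + n] += w2[n];

    const auto mm = std::minmax_element(sum.begin() + blockSize,
                                        sum.begin() + 2 * blockSize);
    return *mm.second - *mm.first;
}
```

With blockSize = 64, a hop of 16 gives a ripple that is zero up to rounding, while a hop of 32 gives a ripple of about 0.5. For this unnormalized window the constant sum at hop = blockSize/4 is 1.5, so a fixed scale factor is still needed to reach exactly unity.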
i have checked in an update - my experiment now also plots the sum of the overlapped window products - in case you want to take a look and play around with it
wtf, when I combine two files using JUCE classes, the result has errors around -135 dB. If I combine using REAPER, it is -inf, so, perfect. WHYYYYYY
it's simple math!
Edit: Hmm what if the audio file writer is adding dither? No that can't be, then it wouldn't combine well in REAPER.
I changed hopsize to blocksize / 4 and removed my "fix" for the distortion at the end. I'm not getting any distortion now, I'll wait for it to happen again. It's an intermittent problem. Sometimes happens, sometimes doesn't (if I don't have my fix implemented).
> It's an intermittent problem. Sometimes happens, sometimes doesn't
could it be related to the length of the buffer/file? if it's a "nice" number (maybe that an integer number of blocks fits in or something), it doesn't happen and otherwise it does? or the other way around?
is the juce sample buffer single precision and the reaper double and we see roundoff error here or something? anyway, this is not related to my code, right?
btw - i actually would - at the moment - recommend to prepend and append a blocksize worth of zeros before analysis and cutting it off after resynthesis (half or maybe even a quarter of that should actually suffice but better be safe). because my block overlapping may produce fade-in/out effects at start and end (which you won't ever see because of the demodulation - they'll be compensated too - but it's probably better not having to compensate for anything)
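A minimal sketch of that padding recommendation. The helper names are hypothetical, and `pad` would typically be set to the block size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Prepends and appends `pad` zeros to the signal before analysis.
std::vector<double> padWithZeros(const std::vector<double>& x, std::size_t pad)
{
    std::vector<double> padded(x.size() + 2 * pad, 0.0);
    std::copy(x.begin(), x.end(),
              padded.begin() + static_cast<std::ptrdiff_t>(pad));
    return padded;
}

// Cuts the same amount off both ends again after resynthesis.
std::vector<double> trimPadding(const std::vector<double>& y, std::size_t pad)
{
    assert(y.size() >= 2 * pad);
    return std::vector<double>(y.begin() + static_cast<std::ptrdiff_t>(pad),
                               y.end() - static_cast<std::ptrdiff_t>(pad));
}
```

The intended order is pad, analyze, process, resynthesize, trim, so that any fade-in/out from partially overlapped frames falls into the padding rather than into the audio.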
...the spectrogram stuff is still very much under development
ah - by the way - in the function plotOverlappingWindowSum, i have made a plot of the overlapping windows and their sum, using hopSize=blockSize/4 with the Hann window (squared, because it's applied twice - if it were not squared, blockSize/2 would work). as you see, they sum up to unity:
except for the fades at the ends (because there, less windows are contributing to the sum). when you use the same overlap factor with a blackman window, you can clearly see the amplitude modulation:
...but i think, for blackman, you can get this sum-to-unity property (at least approximately) with another overlap factor, too - i have to look that up or try it... edit: ah - here: https://ccrma.stanford.edu/~jos/sasp/COLA_Examples.html
Yeah I'll have to experiment with double precision. I wonder if maybe using 64-bit wav files will force JUCE to use higher precision... or at least I should try 32 bit just to see what happens.
> This is basically the function of the sample tail extender but as you said, it doesn't use sinusoidal models? It only uses spectrogram stuff and bin amplitude? So it's not as good?
in some sense, it creates a (restricted version of a) sinusoidal model from the spectrogram data. for tail extension, i actually think, the general approach is quite appropriate - if it would be improved to estimating the decay rates from the data. but with the current one-decay-rate-for-all-harmonics implementation, the tail sounds rather static and artificial.
...i don't really get, what exactly you get and also don't get
Yep, using a 32-bit wav file solved the slight error.
OK, so how do you create a smooth spectral filter rather than brick wall? Do you simply reduce amplitude a little more for each frequency? Wouldn't you be able to hear the steppyness of the frequency rows? Especially if the filter frequency was changing over time?
> how do you create a smooth spectral filter rather than brick wall? Do you simply reduce amplitude a little more for each frequency?
yes - i would probably linearly fade the amplitudes down over a certain number of bins. and/or maybe with a smoother (sin/cos) shaped function. we would just have to take care, that complementary low- and highpass add up to unity. ...unless you get your highpass by subtraction - then, it would be perfect reconstruction, regardless. however, the highpass may not be an exact mirror image of the lowpass, if you use just any fade-function
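One way such a fade could look, sketched with a raised-cosine transition. The function names and the parameterization (a transition of `fadeBins` bins centered on `cutoffBin`) are assumptions for illustration.

```cpp
#include <cmath>

// Hypothetical lowpass gain per bin: 1 below the transition band, 0 above
// it, and a raised-cosine fade of width fadeBins centered on cutoffBin.
double lowpassGain(double bin, double cutoffBin, double fadeBins)
{
    const double pi = 3.14159265358979323846;
    const double lo = cutoffBin - 0.5 * fadeBins;
    const double hi = cutoffBin + 0.5 * fadeBins;
    if (bin <= lo) return 1.0;
    if (bin >= hi) return 0.0;
    const double t = (bin - lo) / (hi - lo); // 0..1 across the transition
    return 0.5 + 0.5 * std::cos(pi * t);
}

// Complementary highpass gain, defined so both gains add up to one.
double highpassGain(double bin, double cutoffBin, double fadeBins)
{
    return 1.0 - lowpassGain(bin, cutoffBin, fadeBins);
}
```

Because the highpass gain is defined as one minus the lowpass gain, the pair adds up to unity for any fade shape. The raised cosine (like a linear fade) is symmetric about its midpoint, so here the highpass is also a mirror image of the lowpass; an arbitrary fade function would keep the unity sum but lose that symmetry.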
> Wouldn't you be able to hear the steppyness of the frequency rows? Especially if the filter frequency was changing over time?
i guess, the overlap would smooth these steps out. what exactly do you mean to happen without time-modulation?
ahh - i guess, i see what you mean: the rounding of the cutoff-bin to integer values? yeah...i could probably allow float numbers for that by scaling the last bin by a number between 0 and 1. i must think about that
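The fractional-cutoff idea could be sketched like this. It is one possible reading of "scaling the last bin by a number between 0 and 1", not an existing rsSpectrogram feature, and the function name is made up.

```cpp
#include <cmath>

// Hypothetical lowpass gain for a non-integer cutoff: bins fully below the
// cutoff pass, the bin that contains the cutoff is scaled by the
// fractional part, and all higher bins are zeroed.
double fractionalCutoffGain(int bin, double cutoff)
{
    const int lastBin = static_cast<int>(std::floor(cutoff));
    if (bin < lastBin)  return 1.0;
    if (bin == lastBin) return cutoff - lastBin; // fraction in [0, 1)
    return 0.0;
}
```

As the cutoff sweeps upward, the gain of the boundary bin ramps continuously from 0 to 1 before the next bin starts to open, which avoids the jump that integer rounding would cause.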
btw, I don't see me needing a sloped filter. But I guess interpolating values is something that will be needed a lot, especially for a denoiser.
One thing a lot of my clients ask for is a matching filter, to for example make a loud violin sample sound like a softer violin sample, usually to correct mistakes in the recording process, or even to generate new samples that are in between recorded dynamics. You do this by analyzing the sample you want to manipulate and get some kind of difference based on the sample you want it to sound like.
Do you know how matching stuff works?
Edit: This is a spectral process.
> Do you know how matching stuff works?
hmm - do you have some example product to show me exactly what you mean? if it's about applying the spectral envelope of one signal to another signal, then i have done such things in the context of my master thesis (it was partially about spectral envelopes - and i implemented a vocoder based on this algorithm https://hal.archives-ouvertes.fr/hal-01161334/document)
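If matching is read as "apply the spectral envelope of one signal to another", one common approach is a per-bin gain equal to the ratio of the two envelopes. The sketch below assumes both envelopes are already available as smoothed or averaged magnitude spectra of equal length. It is not the algorithm from the thesis, and the function name and the limiting parameters are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical matching-EQ gains: gain[k] = targetEnv[k] / sourceEnv[k].
// eps guards against division by zero, maxGain caps extreme boosts.
std::vector<double> matchingGains(const std::vector<double>& sourceEnv,
                                  const std::vector<double>& targetEnv,
                                  double eps = 1e-9, double maxGain = 100.0)
{
    assert(sourceEnv.size() == targetEnv.size());
    std::vector<double> gains(sourceEnv.size());
    for (std::size_t k = 0; k < gains.size(); ++k)
        gains[k] = std::min(maxGain, targetEnv[k] / std::max(sourceEnv[k], eps));
    return gains;
}
```

Multiplying each spectrogram bin of the source by its gain (the gains are real, so phases stay untouched) would then push the source's envelope toward the target's.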
Oh, well do YOU have some examples?
ok - yeah - this is fun. this is me reading and then vocoding the title of my thesis (it's in german - "representation and modification of spectral envelopes of natural sounds based on formant models"):
www.rs-met.com/sounds/effects/ThesisVocoderCarrier.wav www.rs-met.com/sounds/effects/ThesisVocoderModulator.wav www.rs-met.com/sounds/effects/ThesisVocoderOutput.wav
the speech of the output is really intelligible .....for germans
So you can use this to convolve a soft guitar pluck with a loud guitar pluck and create something in between? Obviously won't sound as good as the real thing, but just enough to be usable in music composition.
well, it's a vocoder - so the process is asymmetrical, so it's not like a morph (which i would expect to act somewhat symmetrical - and adjustable) - carrier and modulator play different roles - but yeah, you get something "in between".
that said - morphing stuff could certainly be done as well. in fact, we'll have a lot of interesting stuff to explore with this spectrogram processor. the basic system is in place and (more or less) working - now the fun can begin
This is just an image comparing your voice envelope to the output envelope, not sure what purpose this image has. They look like they match up pretty well.
http://www.elanhickler.com/transfer/ThesisVocoderOutput_VocodexPlugin.wav
This is a Vocodex example, but there are some problems with it. It's great for a musical sound but your version sounds like a better starting point to improve on legibility and musicality.
it has a totally different character (thinner and a bit gnarly - i like the gnarl! a bit goa'esque). do you know how vocodex works? is it also spectrogram/stft based - or does it use the classical filter bank approach? edit: from the product description (https://www.image-line.com/plugins/Effects/Vocodex/):
> Up to 100 bands individually locatable anywhere in the spectrum.
...so that probably means filterbank
Vocodex is the best vocoder for music. Here's more exaggerated example http://www.elanhickler.com/transfer/ThesisVocoderOutput_VocodexPlugin2.wav
So can you make a plugin with your vocoder?
> So can you make a plugin with your vocoder?
the algorithm is implemented as a non-realtime matlab file. in principle, it could be turned into a realtime algorithm (in fact, in rosic, i already have some sort of framework for realtime spectral processors - that factors out all of the messy re-buffering, windowing, overlapping, yadda-yadda business). but: this "true-spectral-envelope" algorithm is expensive. for a single frame, it iterates multiple fft/ifft roundtrips until convergence (typically 5-10 iterations). not really good for realtime performance. however - i have some other ideas for simpler spectral envelope estimation algorithms - based on connecting peaks by lines or splines - which should probably give similar quality
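The "connecting peaks by lines" idea might look roughly like the sketch below. This is only a guess at the simplest possible version (local maxima joined by straight lines, with both endpoints used as anchors). It is not the true-envelope algorithm from the thesis, and the function name is made up.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical envelope estimator: finds the local maxima of a magnitude
// spectrum and linearly interpolates between them. The first and last bin
// act as anchor points so that every bin gets a value.
std::vector<double> envelopeFromPeaks(const std::vector<double>& mag)
{
    const int N = static_cast<int>(mag.size());
    std::vector<double> env(mag);
    if (N < 3)
        return env;

    // Collect peak indices (strictly above the left neighbour, not below
    // the right neighbour), plus both endpoints.
    std::vector<int> peaks{0};
    for (int k = 1; k < N - 1; ++k)
        if (mag[k] > mag[k - 1] && mag[k] >= mag[k + 1])
            peaks.push_back(k);
    peaks.push_back(N - 1);

    // Fill the bins between consecutive peaks with a straight line.
    for (std::size_t p = 0; p + 1 < peaks.size(); ++p)
    {
        const int a = peaks[p], b = peaks[p + 1];
        for (int k = a + 1; k < b; ++k)
        {
            const double t = static_cast<double>(k - a) / (b - a);
            env[k] = (1.0 - t) * mag[a] + t * mag[b];
        }
    }
    return env;
}
```

A single pass like this costs far less than several FFT/IFFT round trips per frame, but it will also follow spurious noise peaks unless some peak picking or smoothing is added.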
Nevermind that! Explain how I can do some offline tests to see if it's at all viable for morphing two similar sound sources. I could make a function in SALT and use the scripting engine to play with it.
ok - i just added all the .m files that i wrote back then for my thesis to my research repo:
https://github.com/RobinSchmidt/RS-MET-Research/tree/master/Prototypes/Octave/Thesis
to run the vocoder, you need to install octave:
https://www.gnu.org/software/octave/
and run this script:
you should get exactly the output i posted above (the wavefile is dropped into the signals folder). put your input files there, too and modify the "audioread" calls appropriately. note that you must also give it the fundamental frequencies of carrier and modulator (well - rough ballpark value is enough - it just scales the amount of envelope smoothing, if i remember correctly - mind you, i've not touched this code for 13 years!)
not sure what to do with this
what? why is there html code in it?! this is a matlab file!. first thing i'd recommend to do is to undock command-window from the center dock (i moved it to the right of the screen), so you can see the command window and editor window at once. then - why is your working folder D:/Desktop? isn't it supposed to be something:/RS-MET-Research/Prototypes/Octave/Thesis
my screen looks like this (after running the script - btw: warning: when finished, the script plays the resulting audio):
I used the vocoder to create some in-between samples of guitar dynamics. It seems to work, maybe a few improvements could possibly be made, seems worth pursuing.
interesting, non-standard use of a vocoder! :-O can you post some results? i'm curious to hear them
http://www.elanhickler.com/transfer/guitar_dynamic_morph_3_to_5.mp3
3 dynamics to 5 dynamics. The two in-between samples are the vocoder output plus I mixed in some of the original audio by hand and adjusted overall amplitude until it sounded right.
for some reason I am getting an infinite hang after deletion of the 2nd matrix in the code, I think it's the 2nd one if things are deleted in the order they appeared. The debugger isn't being helpful! Nothing seems out of the ordinary.
I'm trying to zero out harmonics (small ranges of frequencies)
    for (int ch = 0; ch < channels; ++ch)
    {
        // compute the complex spectrogram:
        WindowType W = WindowType::hanningZN;
        Spectrogram sp;
        sp.setBlockAndTrafoSize(blockSize, trafoSize);
        sp.setHopSize(hopSize);
        sp.setAnalysisWindowType(W);
        sp.setSynthesisWindowType(W);
        sp.setOutputDemodulation(true);
        Matrix s = sp.complexSpectrogram(origAudio->audio->getReadPointer(ch), samples);

        // workaround to create the deep copies
        int numFrames = s.getNumRows();
        int numBins = sp.getNumNonRedundantBins(); // == s.getNumColumns()
        Matrix sl(numFrames, numBins);
        sl.copyDataFrom(s);

        // collect the bins around each harmonic of f, clamped to the valid bin range
        vector<int> binsToZero;
        for (double cf = f; cf < sampleRate * 0.5; cf += f)
        {
            int lo = std::max(0, sp.frequencyToBinIndex(cf - fRange * 0.5, sampleRate));
            int hi = std::min(numBins, sp.frequencyToBinIndex(cf + fRange * 0.5, sampleRate));
            for (int b = lo; b < hi; ++b)
                binsToZero.push_back(b);
        }

        // zero out harmonic bandwidths to get only noise; this must happen in sl,
        // the matrix that is resynthesized below (the original zeroed s instead)
        for (int i = 0; i < numFrames; i++)
            for (int b : binsToZero)
                sl(i, b) = 0;

        // subtract noise only from original to get harmonics only
        // (assumes harmAudio already holds a copy of the original audio)
        vec x = sp.synthesize(sl);
        auto ptr = harmAudio->audio->getWritePointer(ch);
        for (int n = 0; n < samples; ++n) // n instead of s: don't shadow the matrix s
            ptr[n] -= x[n];

        // transfer noise only
        ptr = noisAudio->audio->getWritePointer(ch);
        for (int n = 0; n < samples; ++n)
            ptr[n] = x[n];
    } // deletion occurs here!
I just commented out the deletion function haha. Look at my noise/harmonic spectrograms:
I'm using a frequency bandwidth of 15 Hz. That is INSANELY PRECISE! Your bidirectional filters could never do this.
Ok, I need to do another test though. It might be better to capture a large portion of frequencies per harmonic rather than the smallest possible portion. Again your bidirectional filters would fail with this task because it's not as flat (so there would be filter overlap, causing issues)
Also, this function is insanely fast. 1 or 2 seconds to process a 10 second stereo clip.
Here are the noise only waveforms:
tiny portion per harmonic:
large portion per harmonic:
So, the large portion is bad because you can see that the noise has a regularity to it, you can spot some oscillations. That's bad because it's going to interfere with the resynthesized harmonics when combining together. Looks like it's best to capture as tiny a range of frequencies per harmonic as possible.
Edit: Hmmm, but capturing not just a large portion but EVERYTHING (so there is no noise left), and then resynthesizing it might actually work better because you then don't need to combine noise/harmonics at the end.
Edit: I think this is what I originally wanted to do from the beginning but never could due to not having flat enough filters.
with "portion-per-harmonic" you mean the bandwidth of the bandpass filters to isolate the harmonics?
> the large portion is bad because you can see that the noise has a regularity to it, you can spot some oscillations
that makes sense. if the harmonic bandwidth gets wider, the "in-between" bandwidths of the noise bands get narrower. and narrow-bands have high sinusoidality/regularity
> I think this is what I originally wanted to do from the beginning but never could due to not having flat enough filters.
hmmmm...actually, butterworth filters are quite nicely flat. ...but maybe not steep enough? ...in this case, one could go for elliptics at the cost of ripples (in passband and stopband). the FFT based filters have ripples, too.
> That is INSANELY PRECISE! Your bidirectional filters could never do this.
so - this is good, right? i still cannot see, what fixed FFT filters can do that regular time-domain filters can't. maybe i should take the challenge of separating some signal with filters vs FFT. like a sort of battle experiment...haha
bidirectional filters can't change frequency right? But you could kinda move the filter around in a spectral process by changing bin volumes. Moving filters are needed.
yes - it's difficult to make time-varying bidirectional IIR filters while preserving their desirable zero-phase properties. with FIR filters it would be easier but computationally expensive, such that you'd probably end up with doing some FFT based process here also (FFT convolution). but at the moment, we are talking about fixed filters, right? i mean the stuff you did above where you said that bidirectional filters could not do it. for time varying stuff - especially tracking harmonic frequencies - i'd also opt for spectrogram based algorithms.
I'm having some issues with phaselocking, basically the same issues I had last time with bidirectional filters except I think things are sounding better and easier to use. I don't understand how SampleModelling gets perfectly phaselocked samples.
http://www.elanhickler.com/transfer/horn_samples_phaselocked.rar
Included is the original:
phaselocked example attempts to capture the phase per harmonic with

    auto p = RAPT::rsSinePhaseAt<float>(x.data(), x.size(), x.size() * 0.5);
    RAPT::rsRecreateSine<float>(x.data(), xNew.data(), x.size(), currentf, currentf, sampleRate, p, 0);

phaselocked_singlePhase example sounds better because I don't take phase measurements but set [p]hase to "0" for all harmonics, whatever that means, but then it loses a lot of stereo information.
I think `RAPT::rsSinePhaseAt` is not working well enough... or maybe I need to make a measurement exactly at a spectral peak, one of those spikes.
ALRIGHT YES! Taking the phase measurement precisely at the most energetic spot spectrally has improved the result:
Now there's a weird issue with some amplitude modulation circled in red. No idea what that could be from.
Can you tell me how to, instead of zeroing out bins, manipulate the amplitude? First, retrieve the amplitude, and then change it to something else.
each matrix entry is a complex number, so to get the amplitude, you can just call `std::abs` on it - the implementation for std::complex will extract the complex magnitude. you can also multiply the complex values by real numbers to change their magnitude. you may also want to look into `std::arg` and `std::polar`, if you want to deal with phase separately
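A small sketch of the "retrieve the amplitude, then change it" step using those standard functions. The helper name is made up; `std::abs`, `std::arg` and `std::polar` are the standard `<complex>` functions.

```cpp
#include <cassert>
#include <complex>

// Hypothetical helper: returns z with its magnitude replaced by
// newMagnitude while keeping its phase.
std::complex<double> withMagnitude(const std::complex<double>& z, double newMagnitude)
{
    assert(newMagnitude >= 0.0);         // std::polar expects a non-negative radius
    const double phase = std::arg(z);    // phase angle in radians
    return std::polar(newMagnitude, phase);
}
```

For a plain gain change, multiplying by a real factor is enough: `z * 0.5` halves the magnitude and leaves the phase alone. Splitting into magnitude and phase only pays off when the new magnitude is computed independently of the old one, for example when imposing an envelope.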
i open a new thread to continue the discussion which started here:
https://github.com/RobinSchmidt/RS-MET/issues/280#issuecomment-525162273
but doesn't really belong to the main topic of this thread. ...so there's a class rsSpectrogram for converting an array of audio-samples into a spectrogram, represented as matrix of complex numbers and/or (re)synthesize an audio signal from such a spectrogram. in between analysis and resynthesis, one can apply arbitrary transformations to the spectrogram data. one of the simplest things to do is filtering by zeroing out the bins above or below a certain cutoff point. the function `spectrogramFilter` in the TestsRosicAndRapt project demonstrates how this can be done. ...by writing this test function, i discovered a flaw in the underlying matrix class which unfortunately requires some inconvenient additional copying workaround (there are comments about this in the code). i think, i will soon replace this matrix class - which i wanted to do since some time anyway. i have now other ideas, how a proper implementation of a matrix class should look like (probably next week - i'm a bit sick at the moment)
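For readers without the repository at hand, the brickwall idea can be sketched independently of rsSpectrogram. The frame layout, both function names and the frequency-to-bin rounding below are assumptions; the library's own `frequencyToBinIndex` may round differently.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// One spectrogram frame: one complex value per bin (assumed layout).
using Frame = std::vector<std::complex<double>>;

// Nearest FFT bin for a frequency, using bin spacing sampleRate/trafoSize.
int frequencyToBin(double frequency, double sampleRate, int trafoSize)
{
    return static_cast<int>(std::lround(frequency * trafoSize / sampleRate));
}

// Brickwall lowpass: zeroes every bin above cutoffBin in every frame.
void zeroBinsAbove(std::vector<Frame>& spectrogram, int cutoffBin)
{
    assert(cutoffBin >= 0);
    for (Frame& frame : spectrogram)
        for (std::size_t k = static_cast<std::size_t>(cutoffBin) + 1; k < frame.size(); ++k)
            frame[k] = 0.0;
}
```

A highpass works the same way with the loop running over the bins below the cutoff instead.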