Surge.fft() and numpy.fft.fft() seem to produce different results

SreehariRamMohan commented 5 years ago

I am attempting to plot a periodogram in Swift of a signal using Surge.

Code:

var fft_mat = Surge.pow(Surge.fft(signal), 2)
var const_mult = 2.0/Double(signal.count)
for var i in 0..<fft_mat.count { //multiply fft_mat by const_mult
    fft_mat[i] = Float(const_mult) * fft_mat[i]
}
var pgram = fft_mat

Plotting pgram yields the following results

ios-periodogram

However, after loading and creating the exact same periodogram in Python I get a very different periodogram.

Code: pgram = (2.0/len(signal)) * numpy.power(numpy.fft.fft(signal), 2)

pure-numpy-periodogram

Since I am using the exact same method to plot the periodogram (and the same data as well), I was wondering if there are some differences in the implementation of the Surge fft and numpy fft which might cause this issue?

alejandro-isaza commented 5 years ago

Looks like they are using different scales. How are you computing the x-axis? Maybe try removing pow. Can you share the contents of signal?

SreehariRamMohan commented 5 years ago

Yeah. Here is the dummy data used to create the above signals/graphs above.

Column 1, "RightLegAngleSignal", is what I used to construct the periodogram in Swift as well as numpy/python.

The last column, "pgram", is the periodogram which Surge outputted (First graph above).

If you run the python code on Column 1, "RightLegAngleSignal", you will get the second graph.

Thanks so much

all-signals.csv.zip

alejandro-isaza commented 5 years ago

I see there are a ton of nans, did you try filtering those out?

SreehariRamMohan commented 5 years ago

Yes. In both Swift and Python, I filtered them out.

alejandro-isaza commented 5 years ago

I get a different result, are you on the latest version?

SreehariRamMohan commented 5 years ago

I'm on Surge 2.0.0. Are both your Python and Swift periodograms matching?

alejandro-isaza commented 5 years ago

Try 2.2.0

SreehariRamMohan commented 5 years ago

Thanks I'll try that right now

SreehariRamMohan commented 5 years ago

Ok I just ran the app again (using Surge 2.2.0) and generated a new CSV, Swift Periodogram, and Numpy Periodogram.

Data attached below:

Numpy Periodogram pure-numpy-periodogram

Swift Periodogram ios-periodogram

They are still different (but have some similarities in shape). Do you think this has something to do with the symmetry of the FFT around the center point? Also it also looks like the scales are different, do you know why that is?

Thanks so much all-signals.csv.zip

gburlet commented 5 years ago

DSP guy here. I've managed to match Surge's FFT and numpy's FFT. They are not equivalent as it stands. Here are the differences:

Surge's FFT actually calculates the power spectrum with vDSP_zvmags and then reverses that operation later on by taking the square root and scaling. If you remove the subsequent line of code with vDSP_vsmul(sqrt(magnitudes), 1, [2.0 / Float(input.count)] then you get the power spectrum.
Numpy has no normalization. In other libraries, sometimes I have seen a scaling coefficient on the magnitudes, sometimes not. Surge's line vDSP_vsmul(sqrt(magnitudes), 1, [2.0 / Float(input.count)] scales the magnitudes by 2/N where N is the window size.
Surge's FFT doesn't handle non powers of 2 for window size. It's probable that the person commenting above didn't choose a power of 2 and that's why the frequency response graph looks of similar shape but scaled and translated.
Surge's FFT doesn't have a windowing function at all. Common choices are hamming or hann windows. Very rarely are FFTs done with a boxcar (rectangular) window.
Surge's FFT returns the full magnitude array which is unnecessary for real valued signals since it's mirrored about the nyquist rate.

I rewrote the Surge FFT code to window with a hamming window and return the power spectrum (no scaling) to match the librosa one-liner: S = np.abs(librosa.core.stft(x, n_fft=w, hop_length=w, window=signal.hamming, center=False)) ** 2. Under the hood, librosa uses Numpy's FFT.

I tested on 1 frame of the same audio wav file within Python and on iOS using surge and got basically the same results (except for some floating point precision differences).

public func powerSpectrum(_ input: [Float]) -> [Float] {
        // apply hamming window
        // window has slightly different precision than scipy.window.hamming (float vs double)
        var win = [Float](repeating: 0, count: input.count)
        vDSP_hamm_window(&win, vDSP_Length(input.count), 0)
        let input_windowed = mul(input, win)

        var real = [Float](input_windowed)
        var imaginary = [Float](repeating: 0.0, count: input_windowed.count)
        var splitComplex = DSPSplitComplex(realp: &real, imagp: &imaginary)

        let length = vDSP_Length(floor(log2(Float(input_windowed.count))))
        let radix = FFTRadix(kFFTRadix2)
        let weights = vDSP_create_fftsetup(length, radix)
        withUnsafeMutablePointer(to: &splitComplex) { splitComplex in
            vDSP_fft_zip(weights!, splitComplex, 1, length, FFTDirection(FFT_FORWARD))
        }

        // zvmags yields power spectrum: |S|^2
        var magnitudes = [Float](repeating: 0.0, count: input_windowed.count)
        withUnsafePointer(to: &splitComplex) { splitComplex in
            magnitudes.withUnsafeMutableBufferPointer { magnitudes in
                vDSP_zvmags(splitComplex, 1, magnitudes.baseAddress!, 1, vDSP_Length(input_windowed.count))
            }
        }

        vDSP_destroy_fftsetup(weights)

        return magnitudes
    }

Not sure what the actionable items are for the Surge maintainer(s) to patch up the FFT function. It's not necessarily wrong, I think it just needs some more parameters so people can match output from other libraries. Possibly adding a window parameter to the Surge FFT function as well as an option to specify whether to scale the result. In my opinion letting people scale the result outside of this function is probably best.

tomekid commented 4 years ago

Hey thanks for writing that. What does the mul(input,win) do? does it multiply the two arrays? hows does that work?

gburlet commented 4 years ago

Element-wise multiplication

regexident commented 4 years ago

I'm personally not familiar enough with the FFT, so …

@gburlet would you consider your modified implementation a bug fix or an alternative to our current one? Or put differently: would you say our current implementation is wrong and should be fixed, or is it simply a different way of using FFT and we might want to provide a choice to the user?

Either way I'd be more than happy to merge a PR addressing the issue appropriately. 🙂

gburlet commented 4 years ago

would you say our current implementation is wrong and should be fixed, or is it simply a different way of using FFT and we might want to provide a choice to the user?

I would say the Surge FFT algorithm returns a scaled amplitude spectrum (something like np.abs(S) where you take the absolute value of the complex components in the FFT) so I wouldn't really call it an FFT, which typically returns the complex components e.g., scipy rfft. I would say your standard Surge user doesn't really care because most people in audio end up wanting the amplitude spectrum anyways so they don't have to deal with complex numbers.

My personal recommendation would be to try to match output of heavy-use libraries like numpy, scipy for compatibility between experimental systems (done off device, e.g., machine learning experimentation) and production systems (e.g., prediction on device).

Specific Recommendations:

I would remove the scaling of 2/N. People scale FFTs in different ways and nobody likes looking through source code to see how something is scaled to reverse the operation and add their own scaling. Removing scaling would also match NumPy. Keep the sqrt though, so you don't have the power spectrum.
Allowing the user to specify a window function (hamming, hann, etc.) is critical and this is something that Surge lacks. Nobody runs FFTs with a rectangular window due to bad spectral leakage. In fact, by default a lot of other FFT libraries use a hamming window by default and therefore default output in Surge can be quite different.
Enforce window size power of 2 or snap to nearest power of 2: all FFT libraries I've encountered require the input window size size or number of points in the FFT to be a power of 2. DFTs usually allow non powers of 2, but with FFTs this is usually a requirement due to the butterfly algorithm (divide and conquer strategy). This is the issue the OP ran into without knowing why it was happening.

regexident commented 4 years ago

This is great, thanks a lot @gburlet!

brett-eugenelabs commented 3 years ago

@gburlet Do you have the source for the mul() in the following line? let input_windowed = mul(input, win)

gburlet commented 3 years ago

Hi @brett-eugenelabs, it's element-wise multiplication: same function as vDSP_vmul in Accelerate framework or Surge.elmul in Surge. Hope that helps 👍

abokhalel2 commented 3 years ago

@gburlet is there a possibility in your example to make hop_length and window_size different numbers? I'm trying to find a way to make them match the results of torch.stft or librosa.core.stft when hop_length and window_size being different numbers.

This is my torch.stft example in case needed.

import torch
import torchaudio

win_length = 4096
hop_length = 1024
win = torch.hann_window(win_length)
wav, sr = torchaudio.load('audio_example.wav')
stft = torch.stft(wav, win_length, hop_length=hop_length, window=win)

gburlet commented 3 years ago

Hi @abokhalel2, my example is just for the FFT on a single window, not an STFT. To get the STFT you would sliding window the samples and send it to the function above. Typically both the window and hop size are powers of 2 with the hop being some division of the window size. Hope that helps.

vade commented 1 year ago

I have a perhaps naive question for @gburlet - (im not a DSP guy at all so forgive me if this is a newbie q) - why do you fill only the real components of of the DSP Split Complex with the source audio frame?

Looking at Apples Mel sample code on the vDSP page, they bind the incoming time domain audio signal via vDSP_ctoz into a split complex, filling both the real and imaginary components.

With their implementation, for a FFT size of 400, you'd get 200 complex samples back, or 200 |S|^2 magnitude values.

With your implementation, you'd get 400 magnitude values back? Am I understanding that correctly?

And if so, what are the differences in the FFT output when populating the Split Complex with only real components, vs running vDSP_ctoz?

Thank you in advance. This thread has been INCREDIBLY helpful in trying to match Torches STFT code to produce a Log Mel Spectrogram. Im not there yet, but im getting closest thanks to this thread.

gburlet commented 1 year ago

@vade I unfortunately don't have time / brain bandwidth to jump into this in more detail but see above, this point specifically, which may help you understand:

Surge's FFT returns the full magnitude array which is unnecessary for real valued signals since it's mirrored about the nyquist rate.

vade commented 1 year ago

Hey @gburlet - no worries, Your input here has been super helpful. I appreciate the prompt reply! thank you!

gburlet commented 1 year ago

@vade I see you're contributing to an OpenAI whisper CoreML project. I like that.

Find my email online and I'll share my Accelerate vDSP Mel Spectrogram code written in swift privately. It took a while to make it match librosa output. Not sure if it will match Torches STFT code but might give you another reference point. Also, from what I remember, it's really fast.

vade commented 1 year ago

Much obliged! For real!

annoybot commented 1 year ago

@gburlet I'm trying to use your powerSpectrum() code above and I'm getting the error:

No exact matches in call to global function 'mul'

At the line:

let input_windowed = mul(input, win)

Am I missing something? What kind of multiplication is this element wise, or ...? Sorry if this question seems obtuse.

gburlet commented 1 year ago

@annoybot see above, it has been answered

annoybot commented 1 year ago

Ah yes, indeed it has been answered, thanks. 😬

annoybot commented 1 year ago

@gburlet, I hope the following question is not too off topic for this thread.

Do you think there would be an advantage in using Welch's method over a plain FFT when processing sound data from an instrument?

I was wondering if it would help eliminate noise?

Jounce / Surge

Surge.fft() and numpy.fft.fft() seem to produce different results #94