Improve FFT view by preserving all frequencies by averaging based on number of bars displayed

AudioKit / AudioKitUI

Controls and Visualization for AudioKit apps

MIT License

187 stars 52 forks source link

Improve FFT view by preserving all frequencies by averaging based on number of bars displayed #25

Closed aburgel closed 2 years ago

aburgel commented 2 years ago

The current FFTView code throws away data that doesn't fit into the number of bars displayed in the UI. This means that higher frequencies won't be displayed.

This fixes FFTView by preserving data from those higher frequencies, and aggregates the frequencies into the number of bars to be displayed by averaging them. I have no idea if averaging is the best way to do this aggregation.

I also pulled out some of the math from the loop into functions provided by Accelerate. I didn't benchmark, so perhaps this is slower, but it looks fancier 😀 .

You can see the difference in how higher frequencies display in these screenshots. The first is the original code... you'll notice that the FFTView shows nothing. The second is running this branch. (Btw, I had to tweak the Oscillator recipe to allow a higher max frequency.)

Before

After

aburgel commented 2 years ago

maybe we could move some of the DSP code back into FFTTap if possible?

That's an interesting idea. Perhaps it is useful to provide the data as decibels, but I don't have much sense of how FFTTap is used outside of this view. What do you think?

We could also just merge this and refactor FFTTap later. Is there a roadmap or list of feature requests anywhere for FFTTap?

aburgel commented 2 years ago

Btw, I don't have permission to merge, so feel free to do that whenever you think this is ready.

Matt54 commented 2 years ago

Hey guys, I've got a few thoughts on this, so let's hold off on merging until I have a second today to get into the details.

Matt54 commented 2 years ago

Btw thanks for your efforts here @aburgel! I had been thinking along the same lines with averaging FFT bins, but was also unsure if that is appropriate. If I recall correctly, there are other ways to manipulate the FFT calculation to reduce the number of bins returned. I'll need a free window of time later on today to dig back into this and check your changes.

emurray2 commented 2 years ago

maybe we could move some of the DSP code back into FFTTap if possible?

That's an interesting idea. Perhaps it is useful to provide the data as decibels, but I don't have much sense of how FFTTap is used outside of this view. What do you think?

We could also just merge this and refactor FFTTap later. Is there a roadmap or list of feature requests anywhere for FFTTap?

Yeah that's what I was thinking. Once Matt takes a look at the changes, it will probably be fine. I just wanted to bring it up as a discussion point. Right now, I believe there aren't many feature requests specifically for FFTTap, but if you have any in mind, please feel free to continue improving it.

If the code with FFTView and FFTTap gets too long, then it might be better to make a spectrogram class specifically meant to handle dB data.

Matt54 commented 2 years ago

Sorry for the delay on this review, it's taken me a bit of time to unpack this. I'm going to address this PR as well a previous one that was merged ( https://github.com/AudioKit/AudioKitUI/pull/23 ) together and I propose that we fix them both up simultaneously before merging this PR.

(FYI - I use amplitude to describe the outputted visualization bar value, although I'm not sure if that is precise language on my part. Perhaps, I should be saying decibels...)

For your reference, this is the audio track I am using: https://www.dropbox.com/s/barh6mbtlprajyz/LittleThings.mp3?dl=0

I think an agreeable goal here is to reproduce the values being outputted by Ableton. When in doubt, what would Ableton do??

Ableton's Spectrum:

https://user-images.githubusercontent.com/15333030/130378961-28158034-dae9-4368-bf14-ecba94b44997.mp4

My WIP Spectrum View (using my calculation below for amplitude):

https://user-images.githubusercontent.com/15333030/130378978-813c42ae-4f29-4ff2-bdef-923f21270083.mp4

The amplitude values are approximately the same between these two visualizations. I was comparing these while trying to figure out if I had the correct formula for amplitude and that is where I landed on this:

Inside FFTTap:

// Scale appropriate to the algorithm - results in strictly negative amplitude values (tested against Ableton Live's Spectrum Analyzer) 
var scaleMultiplier = [Float(1.0 / Double(frameCount))]

Inside FFTView:

// loop by two through all the fft data
for i in stride(from: 0, to: FFT_SIZE - 1, by: 2) {
  if i / 2 < amplitudes.count {
    // get the real and imaginary parts of the complex number
    let real = fftData[i]
    let imaginary = fftData[i + 1]
    let normalizedBinMagnitude = 2.0 * sqrt(real * real + imaginary * imaginary) / Float(FFT_SIZE)
    let amplitude = Double(20.0 * log10(normalizedBinMagnitude))
  }
}

While, I can't give a complete explanation for why this works, it seems obvious to me from the outputted visualization that the amplitude value ends up being correct.

Using this same calculation for the FFTView (also, without bin averaging) results in this visualization:

https://user-images.githubusercontent.com/15333030/130379000-053e8c45-a94b-44fc-9243-5d9842e846d6.mp4

This is how the FFTView calculated it's bin amplitudes up until this PR: https://github.com/AudioKit/AudioKitUI/pull/23

After that change was made, the visualization looked like this:

https://user-images.githubusercontent.com/15333030/130379023-888ccefb-7f8e-4144-b605-5637c3c4732d.mp4

Given that the range is supposed to be -140dB to -10dB, we know that at this point our amplitude calculation must be wrong. This is the current state of FFTView in develop.

Now with the latest PR, we've changed a lot. We removed the stride, swapped out the amplitude calculation, and then added a bin-averaging-to-bar. It now looks like this:

https://user-images.githubusercontent.com/15333030/130379036-6c297031-f862-4205-9e70-23d971ee8e0b.mp4

Now, I'm not sure if we have fixed our amplitude calculation issue due to the averaging, so we can set the above aside for a second (though let's fix this in this PR!)...

My push back against this PR is that we should not change the default behavior to averaging the bins to determine the bar value. If we want to add that as a variation I'm totally on board. However, it seems valid and useful to me to visualize a portion of the returned fft bins unadulterated. It also looks a lot cooler when visualizing an audio track in my opinion.

So, I think what we should do to wrap this up is determine how to arrive at the correct amplitude values and alter the updateAmplitudes method to only average the bins if a boolean flag is set to true for the FFTView.

Matt54 commented 2 years ago

One more thing, after sleeping on this I think I have a new insight on the bins -> bar combination algorithm. Long story short, I don't think we should average the bins linearly with the same number of bins per bar as we are doing here.

Imagine we wanted to split the frequency spectrum into 4 sections (low, low-mid, mid, highs or whatever you want to call them). You would do this by partitioning the audible frequency spectrum at something like 100Hz, 1000Hz, and 10,000Hz. We need to respect this logarithmic trend here when we combine our bins.

I think I could come up with a relatively simple algorithm that determines where the frequency breakpoints would be to arrive us at the same number of partitions as the number of bars that we want. From that, you convert those frequencies to array indexes and split up your fft data array. Finally, either run an average in each split or just simply take the maximum bin value.

You see, it doesn't actually matter how many bins are in each bar. What matters is that you split up the frequency spectrum logarithmically. In my opinion, this is the appropriate way to do this. Or perhaps you could additionally have a linear algorithm, but I'm not sure if that is actually useful. I have never viewed the frequency spectrum linearly.

emurray2 commented 2 years ago

One more thing, after sleeping on this I think I have a new insight on the bins -> bar combination algorithm. Long story short, I don't think we should average the bins linearly with the same number of bins per bar as we are doing here.

Imagine we wanted to split the frequency spectrum into 4 sections (low, low-mid, mid, highs or whatever you want to call them). You would do this by partitioning the audible frequency spectrum at something like 100Hz, 1000Hz, and 10,000Hz. We need to respect this logarithmic trend here when we combine our bins.

I think I could come up with a relatively simple algorithm that determines where the frequency breakpoints would be to arrive us at the same number of partitions as the number of bars that we want. From that, you convert those frequencies to array indexes and split up your fft data array. Finally, either run an average in each split or just simply take the maximum bin value.

You see, it doesn't actually matter how many bins are in each bar. What matters is that you split up the frequency spectrum logarithmically. In my opinion, this is the appropriate way to do this. Or perhaps you could additionally have a linear algorithm, but I'm not sure if that is actually useful. I have never viewed the frequency spectrum linearly.

Yeah that would make sense. I think Ableton's goes by power's of 10 too if I recall correctly.

Also, I saw your previous comment, and I'm still working on wrapping my head around the Fourier theory surrounding this issue with the amplitude. The old version does seem to work, but I'm still attempting to understand why.

I found an interesting Stack Overflow post I've been reading which makes the operations inside FFTTap clearer to me.

https://stackoverflow.com/questions/3398753/using-the-apple-fft-and-accelerate-framework

emurray2 commented 2 years ago

@aburgel @Matt54 Question that sparked my curiosity into the amplitude values: Why do some examples of vDSP_fft_zrip scale the FFT data right after it's returned from the FFT rather than scaling the magnitude data instead (as FFTTap currently does)?

FFTTap lines 78-96:

var magnitudes = [Float](repeating: 0.0, count: inputCount)
vDSP_zvmags(&output, 1, &magnitudes, 1, vDSP_Length(inputCount))

var scaledMagnitudes = [Float](repeating: 0.0, count: inputCount)

// Scale appropriate to the algorithm - results in strictly negative amplitude values (tested against Ableton Live's Spectrum Analyzer)
var scaleMultiplier = [Float(1.0 / Double(frameCount))]

if isNormalized {
    // Normalising
    scaleMultiplier = [1.0 / (magnitudes.max() ?? 1.0)]
}

vDSP_vsmul(&magnitudes,
           1,
           &scaleMultiplier,
           &scaledMagnitudes,
           1,
           vDSP_Length(inputCount))

Example code from myuiviews.com:

static void performFFT(float* data, UInt32 numberOfFrames, SoundBufferPtr soundBuffer, UInt32 inBusNumber) {

    int bufferLog2 = round(log2(numberOfFrames));
    float fftNormFactor = 1.0/( 2 * numberOfFrames);

    FFTSetup fftSetup = vDSP_create_fftsetup(bufferLog2, kFFTRadix2);

    int numberOfFramesOver2 = numberOfFrames / 2;
    float outReal[numberOfFramesOver2];
    float outImaginary[numberOfFramesOver2];

    COMPLEX_SPLIT output = { .realp = outReal, .imagp = outImaginary };

    //Put all of the even numbered elements into outReal and odd numbered into outImaginary
    vDSP_ctoz((COMPLEX *)data, 2, &output, 1, numberOfFramesOver2);

    //Perform the FFT via Accelerate
    //Use FFT forward for standard PCM audio
    vDSP_fft_zrip(fftSetup, &output, 1, bufferLog2, FFT_FORWARD);

    //Scale the FFT data
    vDSP_vsmul(output.realp, 1, &fftNormFactor, output.realp, 1, numberOfFramesOver2);
    vDSP_vsmul(output.imagp, 1, &fftNormFactor, output.imagp, 1, numberOfFramesOver2);

    //Take the absolute value of the output to get in range of 0 to 1
    vDSP_zvabs(&output, 1, soundBuffer[inBusNumber].frequencyData, 1, numberOfFramesOver2);

    vDSP_destroy_fftsetup(fftSetup);
}

I tried scaling the data before the magnitudes and used @aburgel 's code for FFTView:

And here's how it looks in linear mode in Ableton's spectrum with more bins:

Not bad right? Although they aren't exact, we can still see the low end being high, mid flat, and high taper off in both pictures. The key is that FFTView is still being averaged linearly as @Matt54 said. Log mode does make quite the difference:

Let me know what you guys think.

aburgel commented 2 years ago

@Matt54 and @emurray2 thanks for all the feedback and for digging into the code with me. this is tricky stuff!

i agree that having something like ableton to compare to is useful, though i'd love if there was some more rigorous way to confirm that we're getting this right. (I haven't found it yet)

i've been over the math a few times, though i am by far not an expert in this stuff (i started using AudioKit to learn a bit more about FFT with audio data), I can't see where my approach is going wrong, and to be honest, i don't understand how the original approach worked. the data coming from FFTTap is definitely not complex numbers, so doing things like 2.0 * sqrt(real * real + imag * imag) just doesn't make sense.

anyways, i like the idea of log scale, and since that's how ableton work, so i took a pass at implementing that, and i also tweaked some of the configuration, namely, i turned on FFTTap.isNormalized and i changed how the conversion to decibels works. You can see the code in the last two commits. I also created a video that I think gets closest to ableton (the sound quality sucks cuz i could only get it to record through the microphone).

Let me know what you think.

https://user-images.githubusercontent.com/341478/130553823-aa25cd8a-6b88-4d84-8e3c-faab028b0a2f.mov

Matt54 commented 2 years ago

@aburgel That does look cool, I love the log spaced bars. I had been planning to add that functionality to my spectrum view when I get a chance. Unfortunately though, isNormalized=true essentially bypasses our decibel calculation issue.

I discussed this topic with Matt Howell (can't find his tag..) and we worked through how you could just manipulate the number of bins returned by changing the log2n argument to the fft. You can read more about it here: https://developer.apple.com/library/archive/documentation/Performance/Conceptual/vDSP_Programming_Guide/UsingFourierTransforms/UsingFourierTransforms.html

What I propose: We limit the FFTView to only 2^(n) number of bars - perhaps some enumeration with only valid inputs We change fftTap to take an input argument to return to us the number of bins we asked for (This will decouple the number of bins returned from the frame count)

With this change, we won't need to average bins at all. We simply ask for a valid number of bins and get our result back.

Matt54 commented 2 years ago

Some notes on performFFT:

Matt54 commented 2 years ago

static func performFFT(buffer: AVAudioPCMBuffer, isNormalized: Bool = true, zeroPaddingFactor: UInt32 = 0, fftSetupForBins: FFTSetupForBins) -> [Float] {
        let frameCount = buffer.frameLength + buffer.frameLength * zeroPaddingFactor
        let numberOfBinsTimes2 = UInt(fftSetupForBins.value+1) // number of bins * 2
        let log2n = numberOfBinsTimes2 //UInt(round(log2(Double(frameCount))))
        let bufferSizePOT = Int(1 << log2n) // bitwise left shift operator: 1 << n = 2^n
        let inputCount = bufferSizePOT / 2 // this number corresponds to the number of bins returned
        let fftSetup = vDSP_create_fftsetup(log2n, Int32(kFFTRadix2)) // maybe we could optimize by only setting up when log2n changes?

where fftSetupForBins.value is just the value used for log2n set by an enumerator that corresponds to only integer 2^n values

nothing is altered in performFFT expect manually setting log2n instead of log2(frameCount) - just comments

aburgel commented 2 years ago

I discussed this topic with Matt Howell (can't find his tag..) and we worked through how you could just manipulate the number of bins returned by changing the log2n argument to the fft.

That's an interesting idea. For some reason I thought that log2n number had to be greater than the number of input elements, but perhaps it doesn't, since it seems that we never actually pass in the number of input elements at all. I can give this a try.

Unfortunately though, isNormalized=true essentially bypasses our decibel calculation issue.

Gotta admit I'm a little lost on this decibel conversion. I went with a reference point of 1 because that avoided the question of how to pick a value. I also came across some code from apple that uses one. It also does some different kind of normalization of the FFT output.

https://developer.apple.com/library/archive/samplecode/aurioTouch/Listings/Classes_FFTHelper_cpp.html#//apple_ref/doc/uid/DTS40007770-Classes_FFTHelper_cpp-DontLinkElementID_11

This more closely matches what I see in reference documentation, namely normalizing by 1/2*number of frames and then getting the magnitude and converting that to zero, though apple's magnitude function is a sum of squares, not the square root of the sum of squares which is more commonly what i see in other documentation.

There's also some magic number thing related to decibels that i don't quite follow. Do you?

I'm gonna play around a little more. I think we're getting close to something that can withstand scrutiny.

emurray2 commented 2 years ago

I discussed this topic with Matt Howell (can't find his tag..) and we worked through how you could just manipulate the number of bins returned by changing the log2n argument to the fft.

That's an interesting idea. For some reason I thought that log2n number had to be greater than the number of input elements, but perhaps it doesn't, since it seems that we never actually pass in the number of input elements at all. I can give this a try.

Unfortunately though, isNormalized=true essentially bypasses our decibel calculation issue.

Gotta admit I'm a little lost on this decibel conversion. I went with a reference point of 1 because that avoided the question of how to pick a value. I also came across some code from apple that uses one. It also does some different kind of normalization of the FFT output.

https://developer.apple.com/library/archive/samplecode/aurioTouch/Listings/Classes_FFTHelper_cpp.html#//apple_ref/doc/uid/DTS40007770-Classes_FFTHelper_cpp-DontLinkElementID_11

This more closely matches what I see in reference documentation, namely normalizing by 1/2*number of frames and then getting the magnitude and converting that to zero, though apple's magnitude function is a sum of squares, not the square root of the sum of squares which is more commonly what i see in other documentation.

There's also some magic number thing related to decibels that i don't quite follow. Do you?

I'm gonna play around a little more. I think we're getting close to something that can withstand scrutiny.

If I'm not mistaken, it has to do with Parseval's Theorem. That's why the normalizing happens for log2n frames (the number of N's given by the FFT). Since the zvmags is the squared magnitudes, we could do 10log10 or take the sqrt of them and do 20log10. However, as we've learned, these are the sum of the real**2 and imag**2.

One thing I've also noticed with many examples is when they normalize the values according to Parseval's theorem. The values seem to be normalized right after the FFT before zvmags. In AudioKit, the values are normalized after zvmags.

I don't have any proof yet this would have an effect, but it might be worth looking at.

Matt54 commented 2 years ago

Check these out: https://github.com/AudioKit/AudioKit/pull/2581 https://github.com/AudioKit/AudioKitUI/pull/29

FFTTap changes: Added ability to set the number of bins returned by an FFTTap Moved the scaling to before vDSP_zvzsml as you suggested @emurray2 Changed scaling to be based on the bin count (My understanding of Parseval's theorem)

FFTView changes Added ability to set the number of bins used for an FFTView Changed the calculation for the decibels according to your changes @aburgel

Matt54 commented 2 years ago

My understanding is that the scaling of the fft output is set according to a reference value. I used twelve to push down the returned decibel values in FFTView. I think my previous understanding of decibels was flawed, so perhaps I have been muddying the water with my comparison to the specific values returned by Ableton's spectrum. Let me know what we think..

aburgel commented 2 years ago

closing this since @Matt54 fixed a lot of these issues elsewhere. i'll pull out the log scaled view into another PR.