dmrschmidt / DSWaveformImage

Generate waveform images from audio files on iOS, macOS & visionOS in Swift. Native SwiftUI & UIKit views.
MIT License
978 stars · 109 forks

WaveformLiveCanvas with live floats from AVAudioPCMBuffer.floatChannelData always indicating max volume, not showing silence #90

Closed · lucashenning closed this issue 5 months ago

lucashenning commented 5 months ago

Hi @dmrschmidt, first off, thanks for your hard work on this library, your work is very much appreciated!

I've been running into some issues when using your library with live microphone input data generated by AVAudioPCMBuffer.floatChannelData. I'm using this logic to install a tap and generate the float array. In addition to the plain tap, I'm running the values through a clean up function and low pass filter based on this SO answer.

  inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
      Task { @MainActor in
          // Raw samples from the first channel
          let firstChannel = buffer.floatChannelData![0]
          let micLevel = Array(UnsafeBufferPointer(start: firstChannel, count: Int(buffer.frameLength)))
          // Clean-up function with low-pass filter
          let volume = self.waveFormProcessor.getVolume(from: buffer, bufferSize: 1024)
          self.micLevelInputTap.append(volume)
      }

      request.append(buffer)
  }

The resulting float array looks good to me, with 0.0 indicating silence, and all values > 0 indicating mic input.

[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.015040246, 0.0, 0.0, 0.0, 0.008562827, 0.47987133, 0.22618341, 0.46834388, 0.26414242, 0.061394483, 0.020363495, 0.0, 0.014935541, 0.34133458, 0.08352205, 0.029034426, 0.24504444, 0.30886558, 0.17852035, 0.23118775, 0.057510324, 0.031340875, 0.007121415, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
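For context, the clean-up function itself isn't shown above. A minimal sketch of what such a function might look like (the `VolumeProcessor` type, the `getVolume(from:)` signature, and the `alpha` coefficient are illustrative assumptions, not the actual implementation):

```swift
import Foundation

// Hypothetical clean-up function: RMS over the buffer's float samples,
// followed by a simple one-pole low-pass filter to smooth between buffers.
final class VolumeProcessor {
    private var smoothed: Float = 0
    private let alpha: Float = 0.3 // filter coefficient; smaller = smoother

    func getVolume(from samples: [Float]) -> Float {
        guard !samples.isEmpty else { return 0 }
        // Root mean square of the raw amplitudes
        let sumOfSquares = samples.reduce(Float(0)) { $0 + $1 * $1 }
        let rms = sqrt(sumOfSquares / Float(samples.count))
        // One-pole low-pass: blend the new reading with the previous one
        smoothed = alpha * rms + (1 - alpha) * smoothed
        return smoothed
    }
}
```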

However, when passing this float array into WaveformLiveCanvas, I get the result shown in the attached screenshot: every bar is rendered at maximum volume.

Here's a fully working example, including preview and values:


import SwiftUI
import DSWaveformImage
import DSWaveformImageViews

struct WaveFormView: View {
    var samples: [Float]

    var body: some View {
        WaveformLiveCanvas(
            samples: samples,
            shouldDrawSilencePadding: true
        )
        .frame(height: 70)
    }
}

#Preview {
    var samples: [Float] = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.015040246, 0.0, 0.0, 0.0, 0.008562827, 0.47987133, 0.22618341, 0.46834388, 0.26414242, 0.061394483, 0.020363495, 0.0, 0.014935541, 0.34133458, 0.08352205, 0.029034426, 0.24504444, 0.30886558, 0.17852035, 0.23118775, 0.057510324, 0.031340875, 0.007121415, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.01308909, 0.22376725, 0.2560793, 0.21170278, 0.22710785, 0.14060958, 0.20179021, 0.23676065, 0.19946113, 0.26202688, 0.19228598, 0.2498725, 0.052141245, 0.01143893, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.015377111]
    return WaveFormView(samples: samples)
}

The preview renders the same way: all bars at maximum volume (see the attached screenshot).

Questions

  1. Why is it not showing periods of silence correctly and always showing max volume instead?
  2. Is there an issue with my [Float] values? They seem to indicate mic input correctly.
  3. How can I configure WaveformLiveCanvas to interpret the values of mic input correctly, i.e. showing silence and different levels of volume as indicated by the array floats?
dmrschmidt commented 5 months ago

Hey @lucashenning,

Admittedly it's counter-intuitive, but your expectation of what 0 and 1 mean is inverted: 0 is expected for maximum loudness, and 1 for silence.

I think I did this originally to align it with dB LUFS, where 0 is the maximum.
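Given that convention, if your mic levels are normalized the usual way (0 = silence, 1 = maximum loudness), flipping each clamped value before handing it to the view should be enough. A minimal sketch:

```swift
// Mic levels normalized the usual way: 0 = silence, 1 = max loudness
let micLevels: [Float] = [0.0, 0.5, 0.25, 0.0]

// WaveformLiveCanvas expects the inverse (0 = max loudness, 1 = silence),
// so clamp to 0...1 and flip each value before appending it.
let displaySamples = micLevels.map { 1 - min(max($0, 0), 1) }
// displaySamples == [1.0, 0.5, 0.75, 1.0]
```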

dmrschmidt commented 5 months ago

And by the way, thank you for your very well-crafted question. I appreciate that.

I'm also happy to discuss this further, of course.

lucashenning commented 5 months ago

Thanks for the prompt reply @dmrschmidt! Facepalm moment for me, but glad it's working now.

One minor follow-up question: is there a way to increase the wave speed? I think it would be good for the wave to move a little faster to provide real-time feedback to the user. I know the full width is 1200 values; is there any way to change this?

This is captured at real time:

https://github.com/dmrschmidt/DSWaveformImage/assets/12414494/c872b673-c75a-4253-a190-c93ef4c164bb

In comparison, this is Apple's native Voice Recorder app (as you can see, the wave is much faster and seems to be more sensitive, this is the result I was hoping for):

https://github.com/dmrschmidt/DSWaveformImage/assets/12414494/d6d2a628-9996-48ed-8d82-7e3232bf74c1

dmrschmidt commented 5 months ago

Yeah well, it really is counter-intuitive, I'm realizing. I still kind of feel it's reasonable to loosely align it with typical dB values in audio processing, but it should at the very least be clearly documented to avoid confusion. Or maybe even have different setters for dB values and amplitudes. I'll think about that.

Regarding the speed, that's really essentially just a function of the "sampling frequency". In your tap you're using a buffer size of 1024, which I'd say is relatively large for this use case. So if you halve or quarter that, you'll get a speedier result.
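To put rough numbers on that: one level is appended per tap callback, so the update rate is approximately sampleRate / bufferSize. A quick back-of-the-envelope check, assuming a 44.1 kHz recording format (actual hardware rates vary):

```swift
// Approximate level updates per second = sample rate / requested buffer size
let sampleRate = 44_100.0

func updatesPerSecond(bufferSize: Double) -> Double {
    sampleRate / bufferSize
}

let slow = updatesPerSecond(bufferSize: 1024) // ~43 updates/s
let fast = updatesPerSecond(bufferSize: 256)  // ~172 updates/s, 4x faster scrolling
```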

dmrschmidt commented 5 months ago

I just did a little experiment myself, btw, with your approach of reading the raw volume manually. If you check my implementation of RecordingIndicatorView, you'll note that I'm using averagePowerForChannel: instead, which I hacked to append the same volume sample 3 times to speed up the view.

I was hoping that with your approach, this hack wouldn't be necessary anymore. However, what I found is that the bufferSize is really just a hint. Setting it to 1024 or 128 on my test hardware didn't make a difference; I always get 4800 samples back. I read up on it, and it's due to hardware optimization, so it might differ from model to model. So I then split up the buffer manually, calculated the volume on the sub-buffers, and appended those values, but the result looks jittery because now we're adding up to 40 samples at once.
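The manual splitting described here could be sketched roughly as follows (the chunkCount and the RMS level calculation are assumptions; the actual experiment may have differed):

```swift
import Foundation

// Split one large hardware buffer (already copied out of floatChannelData)
// into chunks and compute one RMS level per chunk, so a single 4800-sample
// callback yields several waveform samples instead of one averaged value.
func levels(for samples: [Float], chunkCount: Int) -> [Float] {
    guard chunkCount > 0, !samples.isEmpty else { return [] }
    let chunkSize = max(1, samples.count / chunkCount)
    return stride(from: 0, to: samples.count, by: chunkSize).map { start in
        let chunk = samples[start..<min(start + chunkSize, samples.count)]
        let sumOfSquares = chunk.reduce(Float(0)) { $0 + $1 * $1 }
        return sqrt(sumOfSquares / Float(chunk.count))
    }
}
```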

So you'll likely need to do some extra smoothing via whatever mechanism you choose. You probably don't need scientific precision here, so a hacky approach like the one described above will probably be totally sufficient for you, too.

lucashenning commented 5 months ago

Thank you @dmrschmidt - I really appreciate you going the extra mile with your support.

I wasn't able to get averagePowerForChannel: working for my AVAudioPCMBuffer, but I added some interpolated values to the [Float] samples and it's working fine for me now.
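The interpolation mentioned here could be done along these lines (a sketch only; the number of inserted steps is an assumption): linearly interpolate a few intermediate values between each pair of measured levels, so the canvas receives more samples and scrolls faster.

```swift
// Expand measured levels by inserting `steps` linearly interpolated
// values between each consecutive pair of samples.
func interpolated(_ levels: [Float], steps: Int) -> [Float] {
    guard levels.count > 1, steps > 0 else { return levels }
    var result: [Float] = []
    for (a, b) in zip(levels, levels.dropFirst()) {
        result.append(a)
        for i in 1...steps {
            let t = Float(i) / Float(steps + 1)
            result.append(a + (b - a) * t) // intermediate point between a and b
        }
    }
    result.append(levels.last!)
    return result
}
```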

Thanks again for your help. Feel free to close this issue.

dmrschmidt commented 5 months ago

My pleasure. Happy to hear you’ve found a workable solution.