LCAV / pyroomacoustics

Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.
https://pyroomacoustics.readthedocs.io
MIT License

Zero-padding #41

Closed maelp closed 5 years ago

maelp commented 5 years ago

Are you actually using zero-padding when computing the beamformer result? I saw that the defaults are set to 0. Won't this lead to aliasing, since multiplying by the filter in the frequency domain corresponds to a circular convolution?

fakufaku commented 5 years ago

Hi @maelp, indeed, you need to set the proper amount of zero-padding when doing frequency-domain filtering. It would probably be a good idea to set reasonable default values there. Also, we have updated the STFT implementation (the new one is in the realtime sub-package) and should probably fix the beamforming class to use it.

Another thing to be careful about is that your filter also needs to be zero-padded when doing frequency-domain (FD) convolution. This might not be obvious when the beamforming weights are computed directly in the frequency domain.
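To illustrate the point about zero-padding both the frame and the filter, here is a minimal NumPy sketch (not code from pyroomacoustics). The frame length `N` and the random signals are arbitrary choices for the illustration. It compares plain N-point frequency-domain multiplication with the zero-padded version against a direct linear convolution.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64                      # frame length (arbitrary value for the illustration)
x = rng.standard_normal(N)  # one frame of signal
h = rng.standard_normal(N)  # time-domain filter of the same length

# Reference: linear convolution, length 2N - 1
y_linear = np.convolve(x, h)

# No zero-padding: multiplying N-point spectra gives a circular convolution,
# so the tail of the linear result wraps around onto the first samples.
y_circular = np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(h), n=N)

# Zero-padding both the frame and the filter to 2N avoids the wrap-around.
n_fft = 2 * N
y_padded = np.fft.irfft(np.fft.rfft(x, n=n_fft) * np.fft.rfft(h, n=n_fft), n=n_fft)

# The aliased result is the linear one folded modulo N.
y_folded = y_linear[:N].copy()
y_folded[: N - 1] += y_linear[N:]

print("padded matches linear:  ", np.allclose(y_padded[: 2 * N - 1], y_linear))
print("circular matches folded:", np.allclose(y_circular, y_folded))
```

Without padding, the output is the linear convolution with its tail folded back onto the start of the frame, which is the time-domain aliasing asked about above.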

maelp commented 5 years ago

Hi @fakufaku thanks for this answer! So if I understand correctly, the "correct way" to do FD filtering would be this:

This seems a bit computation-hungry though, and I have never seen it done in any codebase, which is why I was wondering whether this is the "correct way" to do it, or whether there are reasons why one can do the beamforming without zero-padding.

I was assuming the justification would be that if we use hops of size L << N, keep only the L central coefficients, and window the data, then the influence of the circular convolution on those L elements is small, so we wouldn't need the 2N zero-padding?

fakufaku commented 5 years ago

@maelp There is a way to save the first STFT computation.

The interpolation is done by

  1. setting all the odd-indexed weights (on the 2N-point frequency grid) to zero
  2. performing an iFFT of size 2N
  3. setting the second half of the resulting time-domain filter to zero
  4. doing an FFT to go back to the frequency domain, where the odd frequencies will have been interpolated

The only extra step needed compared to a "conventional" setup is the last point, which only costs 2 FFTs (very small compared to the cost of the STFT).
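The four steps above can be sketched in NumPy as follows. This is only an illustration under assumptions: the function name `interpolate_weights` is hypothetical (it is not part of pyroomacoustics), the weights are assumed to be given for all N FFT bins, and the factor of 2 is an addition needed with NumPy's FFT scaling to undo the 1/2 introduced by the 2N-point inverse transform.

```python
import numpy as np

def interpolate_weights(W):
    """Interpolate N frequency-domain weights onto a 2N-point grid, so that
    the equivalent time-domain filter is zero-padded to length 2N.
    (Hypothetical helper, for illustration only.)"""
    N = len(W)
    # 1. Place the weights on the even bins; the odd bins stay at zero.
    W2 = np.zeros(2 * N, dtype=complex)
    W2[::2] = W
    # 2. iFFT of size 2N: the length-N filter appears twice, scaled by 1/2.
    h2 = np.fft.ifft(W2)
    # 3. Zero the second half of the filter (and undo the 1/2 scaling).
    h2[N:] = 0.0
    h2 *= 2.0
    # 4. FFT back to the frequency domain: the odd bins are now interpolated.
    return np.fft.fft(h2)

# Usage: compare against directly zero-padding a known time-domain filter.
rng = np.random.default_rng(1)
h = rng.standard_normal(16)
W = np.fft.fft(h)                  # weights on the N-point grid
W_interp = interpolate_weights(W)  # weights on the 2N-point grid
print(np.allclose(W_interp, np.fft.fft(h, n=32)))
```

The even bins of the result are the original weights, and the whole result equals the 2N-point FFT of the zero-padded filter, so it can be multiplied with a 2N-point zero-padded frame spectrum.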

This is indeed what you need if you want perfect, efficient overlap-add filtering. Now, as you mention above, this is rarely done: most implementations use half overlap with a Hann or Hamming window. If the filters are not too wild, the artefacts from the circular convolution will not be perceptible.

maelp commented 5 years ago

Thanks!

fakufaku commented 5 years ago

@maelp Ah, I forgot one detail: when omitting zero-padding and relying on overlapping frames and windowing, it is important to use a synthesis window that matches the analysis one. We recently added a function to compute the optimal synthesis window given the analysis one.
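As a rough idea of what such a matched synthesis window looks like, here is a NumPy sketch of the usual least-squares choice, where the analysis window is divided by the sum of its squared hop-shifted copies. This is not the library's function: the name `synthesis_window`, the frame length, and the hop size are assumptions made for the illustration, and the frame length is assumed to be a multiple of the hop.

```python
import numpy as np

def synthesis_window(analysis, hop):
    """Least-squares synthesis window for a given analysis window.
    (Hypothetical helper; assumes len(analysis) is a multiple of hop.)"""
    n = len(analysis)
    if n % hop != 0:
        raise ValueError("window length must be a multiple of the hop size")
    # For each position within a hop, sum the squared analysis window
    # over all frames that overlap that position.
    denom = (analysis.reshape(-1, hop) ** 2).sum(axis=0)
    return analysis / np.tile(denom, n // hop)

# Usage: STFT / inverse STFT round trip with half overlap and a Hann window.
N, hop = 64, 32
wa = np.hanning(N + 1)[:-1]          # periodic Hann analysis window
ws = synthesis_window(wa, hop)

rng = np.random.default_rng(2)
x = rng.standard_normal(10 * hop)
y = np.zeros_like(x)
for start in range(0, len(x) - N + 1, hop):
    frame = np.fft.rfft(wa * x[start:start + N])          # analysis
    y[start:start + N] += ws * np.fft.irfft(frame, n=N)   # synthesis, overlap-add

# Samples covered by two frames are reconstructed exactly.
print(np.allclose(y[hop:-hop], x[hop:-hop]))
```

With an unmodified spectrum the round trip is exact away from the signal edges. When the spectrum has been modified, the same window gives the least-squares time-domain estimate.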

maelp commented 5 years ago

I guess this is the Griffin-Lim normalization?

maelp commented 5 years ago

Isn't this more linked to the fact that we are modifying the spectrum, so we want the best L2 approximation of a signal matching the modified spectrum (so that we don't get a complex signal when inverting)?

fakufaku commented 5 years ago

This is indeed the Griffin-Lim criterion. It is almost as you say: you are trying to find the time-domain signal whose STFT is closest to your (modified) STFT measurements. This is most useful when the modifications are non-linear (like artefacts from the circular convolution). If the transformation is linear (like filtering is), then it is still possible to get perfect reconstruction without the need for a synthesis window. (Note that the inverted signal will never be complex if you impose conjugate symmetry of the spectrum.)