Hi @maelp, indeed, you will need to set the proper amount of zero-padding when doing frequency-domain filtering. It might be a good idea to set reasonable default values there. Also, we have updated the STFT implementation (the new one is in the realtime sub-package) and should probably switch the beamforming class over to it.
Another thing to be careful about is that your filter also needs to be zero-padded when doing FD convolution. This might not be completely obvious when the beamforming weights are computed directly in the frequency domain.
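A minimal NumPy sketch of the point above, assuming a single frame `x` of length N and an FIR filter `h` of length M (random values, purely illustrative; the names and sizes are not from the thread). Multiplying N-point spectra gives a circular convolution, while zero-padding both the frame and the filter to at least N + M - 1 points reproduces the linear convolution.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 256  # frame length (hypothetical)
M = 64   # filter length (hypothetical)
x = rng.standard_normal(N)  # one frame of signal
h = rng.standard_normal(M)  # FIR filter

# Reference: linear convolution, length N + M - 1
y_lin = np.convolve(x, h)

# No zero-padding: N-point spectra multiplied -> circular convolution
y_circ = np.fft.irfft(np.fft.rfft(x, n=N) * np.fft.rfft(h, n=N), n=N)

# Zero-padding BOTH the frame and the filter to N + M - 1 points
nfft = N + M - 1
y_pad = np.fft.irfft(np.fft.rfft(x, n=nfft) * np.fft.rfft(h, n=nfft), n=nfft)

# The circular result is the linear one with its tail folded onto the start
wrapped = y_lin[:N].copy()
wrapped[: M - 1] += y_lin[N:]

print(np.allclose(y_pad, y_lin))     # True
print(np.allclose(y_circ, wrapped))  # True
```

Note that `rfft(h, n=...)` is what zero-pads the filter here; forgetting it (or building the weights directly on the N-point grid) is exactly the case where the wrap-around goes unnoticed.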
Hi @fakufaku, thanks for this answer! So, if I understand correctly, the "correct way" to do FD filtering would be this:
This seems a bit computationally expensive, though, and I have never seen it done in any codebase, which is why I was wondering whether this is the "correct way" to do it, or whether there are reasons why beamforming can be done without zero-padding.
I was assuming the justification would be that if we use hops of size L << N, keep only the L central samples of each frame, and the data has been windowed, then the circular convolution would have little influence on those L samples, and we wouldn't need the 2N zero-padding?
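One way to probe this intuition is to measure the wrap-around error for a single windowed frame. The sketch below assumes a short FIR filter of length M (hypothetical sizes, random data); a beamformer designed directly in the frequency domain generally has an impulse response spanning the whole frame, so this is the favourable case. It shows that the wrap-around only corrupts the first M - 1 output samples, and that a tapered (Hann) window makes that error much smaller than a rectangular one.

```python
import numpy as np

rng = np.random.default_rng(1)

N = 512  # FFT / frame size (hypothetical)
M = 32   # length of a short FIR filter (hypothetical)
frame = rng.standard_normal(N)
h = rng.standard_normal(M)

def wrap_error(window):
    """Relative energy of the circular-convolution error for one windowed frame."""
    xw = frame * window
    y_lin = np.convolve(xw, h)  # exact result, length N + M - 1
    y_circ = np.fft.irfft(np.fft.rfft(xw) * np.fft.rfft(h, n=N), n=N)
    err = y_circ - y_lin[:N]
    # The wrap-around only corrupts the first M - 1 output samples
    assert np.allclose(err[M - 1:], 0.0)
    return np.sum(err**2) / np.sum(y_lin**2)

rect = np.ones(N)
hann = np.hanning(N)  # symmetric Hann window from NumPy

print(f"rectangular: {wrap_error(rect):.2e}")
print(f"hann:        {wrap_error(hann):.2e}")
```

The Hann window helps because the samples that wrap around come from the end of the frame, where the window is close to zero.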
@maelp There is a way to save the first STFT computation.
The interpolation is done by
The only extra step needed compared to a "conventional" setup is the last point, which only costs two FFTs (very little compared to the cost of the STFT).
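The exact interpolation steps were not preserved in this thread. A common way to resample frequency-domain weights from an N-point grid to a 2N-point grid at the cost of two FFTs is to go through the time domain: inverse FFT, zero-pad, forward FFT. The sketch below assumes one-sided (`rfft`) weights `W` for a single channel; `interpolate_weights` is a hypothetical helper, not a library function.

```python
import numpy as np

def interpolate_weights(W, N):
    """Hypothetical helper: resample one-sided weights from an N-point
    grid to a 2N-point grid via the time domain (one IFFT + one FFT)."""
    h = np.fft.irfft(W, n=N)        # impulse response of length N
    return np.fft.rfft(h, n=2 * N)  # zero-pad to 2N and transform back

N = 256
rng = np.random.default_rng(2)
# Weights built from a real impulse response, so they form a valid rfft spectrum
W = np.fft.rfft(rng.standard_normal(N))
W2 = interpolate_weights(W, N)

print(W.shape, W2.shape)        # (129,) (257,)
print(np.allclose(W2[::2], W))  # True: the original bins are preserved
```

This treats the length-N impulse response as starting at n = 0; if the weights correspond to a non-causal response, a circular shift before zero-padding would likely be needed (not shown).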
This is indeed what you need if you want perfect, efficient overlap-add filtering. Now, as you mention above, this is rarely done: most implementations use half overlap with a Hann or Hamming window. If the filters are not too wild, the artefacts from the circular convolution will not be perceptible.
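For comparison, here is a sketch of textbook overlap-add FIR filtering, which is exact: non-overlapping rectangular blocks of length L, each zero-padded (along with the filter) to at least L + M - 1 points. This is not the windowed, half-overlapping STFT used by the beamformer; `ola_filter` and the sizes are illustrative assumptions.

```python
import numpy as np

def ola_filter(x, h, L):
    """Overlap-add FIR filtering: exact linear convolution via FFTs.

    x: input signal, h: FIR filter, L: block (hop) size.
    """
    M = len(h)
    nfft = 1 << int(np.ceil(np.log2(L + M - 1)))  # next power of two >= L + M - 1
    H = np.fft.rfft(h, n=nfft)  # filter zero-padded to nfft
    y = np.zeros(len(x) + M - 1)
    for start in range(0, len(x), L):
        block = x[start:start + L]  # the last block may be shorter
        yb = np.fft.irfft(np.fft.rfft(block, n=nfft) * H, n=nfft)
        n_out = len(block) + M - 1  # length of this block's linear convolution
        y[start:start + n_out] += yb[:n_out]
    return y

rng = np.random.default_rng(3)
x = rng.standard_normal(1000)
h = rng.standard_normal(50)
y = ola_filter(x, h, L=128)
print(np.allclose(y, np.convolve(x, h)))  # True
```

Because every block is padded enough to hold its full linear convolution, the overlapping tails add up to the exact result and no synthesis window is needed.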
Thanks!
@maelp Ah, I forgot to mention that when you omit zero-padding and rely on overlapping frames and windowing, it is important to use a synthesis window matched to the analysis one. We recently added a function that computes the optimal synthesis window for a given analysis window.
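The library function mentioned above is not named in the thread, so the following is a self-contained sketch of the standard least-squares synthesis window, g[n] = w[n] / Σ_m w²[n + mH], where the sum runs over all frames overlapping sample n. The helpers `optimal_synthesis_window`, `stft` and `istft` are hypothetical, and the check assumes the signal is padded so that every sample of interest is covered by all of its overlapping frames.

```python
import numpy as np

def optimal_synthesis_window(analysis, hop):
    """Hypothetical helper: least-squares (Griffin-Lim) synthesis window.

    g[n] = w[n] / sum_m w[n + m*hop]**2, summing over every shift that
    stays inside the frame (i.e. every frame overlapping sample n).
    """
    w2 = analysis ** 2
    # Indices congruent to n modulo hop are exactly the overlapping positions
    denom = np.array([w2[n % hop::hop].sum() for n in range(len(analysis))])
    return analysis / denom

def stft(x, w, hop):
    N = len(w)
    return np.array([np.fft.rfft(x[s:s + N] * w)
                     for s in range(0, len(x) - N + 1, hop)])

def istft(X, g, hop):
    N = len(g)
    y = np.zeros((len(X) - 1) * hop + N)
    for m, spectrum in enumerate(X):
        # irfft of a one-sided spectrum is real (conjugate symmetry is implied)
        y[m * hop:m * hop + N] += np.fft.irfft(spectrum, n=N) * g
    return y

N = 512
hop = N // 2  # half overlap (hypothetical sizes)
w = np.hanning(N)
g = optimal_synthesis_window(w, hop)

rng = np.random.default_rng(4)
sig = rng.standard_normal(1000)
x = np.concatenate([np.zeros(N), sig, np.zeros(N)])  # pad so the interior is fully covered
y = istft(stft(x, w, hop), g, hop)
print(np.allclose(y[N:N + len(sig)], sig))  # True
```

With unmodified STFT frames this gives perfect reconstruction; when the frames are modified, the same window gives the least-squares estimate.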
I guess this is the Griffin-Lim normalization?
Isn't this more related to the fact that we are modifying the spectrum, so we want the best L2 approximation of a signal matching the modified spectrum (so that we don't get a complex signal when inverting)?
This is indeed the Griffin-Lim criterion. It is almost as you say: you are trying to find the time-domain signal whose STFT is closest to your (modified) STFT measurements. This is most useful when the modifications are non-linear (like the artefacts from the circular convolution). If the transformation is linear (as filtering is), then perfect reconstruction is still possible without the need for a synthesis window. (Note that the inverted signal will never be complex if you impose conjugate symmetry on the spectrum.)
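For reference, the least-squares criterion can be written out as follows. This is a standard derivation with notation that is not from the thread: frame length N, hop H, analysis window w, modified spectra Y_m, and y_m the N-point inverse DFT of Y_m (real whenever Y_m(N - k) = Y_m(k)^*). By Parseval, matching spectra is equivalent (up to a factor N) to matching time-domain frames:

$$
\hat{x} = \arg\min_{x} \sum_{m} \sum_{n=0}^{N-1} \bigl( w[n]\, x[n + mH] - y_m[n] \bigr)^2
$$

Setting the derivative with respect to each x[t] to zero, where the sums run over the frames m with 0 ≤ t - mH < N:

$$
\hat{x}[t] = \frac{\sum_{m} w[t - mH]\, y_m[t - mH]}{\sum_{m} w^2[t - mH]}
$$

The denominator is periodic in t with period H, so this is ordinary overlap-add with the synthesis window g[n] = w[n] / Σ_m w²[n - mH]; when the Y_m are unmodified STFT frames of x, the estimate equals x exactly.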
Are you actually using zero-padding when computing the beamformer result? I saw that the defaults are set to 0; won't this lead to time-domain aliasing, since multiplying by the filter in the frequency domain corresponds to a circular convolution?