The size of X in locate_sources

Ryuk17 commented 3 years ago

Thanks for sharing your code. I am confused by the description of X in the locate_sources function. The docstring says:

        :param X: Set of signals in the frequency (RFFT) domain for current
        frame. Size should be M x F x S, where M should correspond to the
        number of microphones, F to nfft/2+1, and S to the number of snapshots
        (user-defined). It is recommended to have S >> M.

For example, I have 4 channels with 256 snapshots each, so the input from the microphone array has shape (4, 256). If I apply a 128-point FFT to this, the shape becomes (4, 128). So I don't understand what S means in M x F x S.
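To see why the shapes in this example don't fit the docstring, here is a minimal NumPy sketch with random data standing in for the recording. `check_X_shape` is a hypothetical helper, not part of FRIDA; it only checks an array against the M x F x S layout quoted above. Note that with the RFFT the docstring refers to, a 128-point transform yields nfft // 2 + 1 = 65 bins rather than 128, and a single transform per channel leaves no snapshot axis at all.

```python
import numpy as np


def check_X_shape(X, n_mics, nfft):
    """Hypothetical helper: check X against the M x F x S layout; return S."""
    if X.ndim != 3:
        raise ValueError(f"X must be 3-D (M x F x S), got shape {X.shape}")
    M, F, S = X.shape
    if M != n_mics or F != nfft // 2 + 1:
        raise ValueError(f"expected ({n_mics}, {nfft // 2 + 1}, S), got {X.shape}")
    return S


x = np.random.randn(4, 256)              # placeholder: 4 channels, 256 samples
X_fft = np.fft.rfft(x[:, :128], axis=1)  # one 128-point RFFT per channel
print(X_fft.shape)                       # (4, 65): no snapshot axis

try:
    check_X_shape(X_fft, n_mics=4, nfft=128)
except ValueError as err:
    print(err)  # X must be 3-D (M x F x S), got shape (4, 65)
```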

fakufaku commented 3 years ago

Hi @Ryuk17, thanks for your interest. Here, the input signal is the STFT of the different channels packed into a single array. If you have a single channel with n_samples samples, an STFT with FFT length n_fft and shift length n_shift transforms it into an array of size roughly (n_fft // 2 + 1, n_samples // n_shift). The first dimension is the number of frequency bands and the second is the number of frames (which we refer to as snapshots in the definition above). If you have multiple channels, the transformation is applied to each channel in parallel, and you end up with a three-dimensional array whose extra dimension is the number of channels, i.e., the M in M x F x S.
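A minimal NumPy sketch of the packing described above, assuming a Hann analysis window and no padding, so only full frames are kept and S = 1 + (n_samples - nfft) // hop. This is not FRIDA's own STFT code; its windowing or padding may differ, which would change the frame count slightly. The function name `multichannel_stft` is hypothetical.

```python
import numpy as np


def multichannel_stft(x, nfft, hop):
    """Sketch: x has shape (M, n_samples); returns X with shape (M, F, S).

    Assumptions: Hann window, no padding, F = nfft // 2 + 1 (RFFT bins),
    S = 1 + (n_samples - nfft) // hop (full frames only).
    """
    M, n_samples = x.shape
    window = np.hanning(nfft)
    n_frames = 1 + (n_samples - nfft) // hop
    # Frame s covers samples [s * hop, s * hop + nfft)
    starts = np.arange(n_frames) * hop
    idx = starts[:, None] + np.arange(nfft)[None, :]  # (S, nfft)
    frames = x[:, idx] * window                        # (M, S, nfft)
    spec = np.fft.rfft(frames, axis=-1)                # (M, S, F)
    return np.transpose(spec, (0, 2, 1))               # (M, F, S)


x = np.random.randn(4, 256)  # placeholder: 4 channels, 256 samples
print(multichannel_stft(x, nfft=128, hop=64).shape)        # (4, 65, 3)

x_long = np.random.randn(4, 16000)  # placeholder: 1 s at 16 kHz
print(multichannel_stft(x_long, nfft=256, hop=128).shape)  # (4, 129, 124)
```

With the 4-channel, 256-sample example from the question, this gives only 3 snapshots, far from the recommended S >> M; a longer recording (second call) yields 124 snapshots for 4 microphones.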

Ryuk17 commented 3 years ago

I got it! Thanks for your explanation.

LCAV / FRIDA, issue #6