Closed Ryuk17 closed 3 years ago
Hi @Ryuk17, thanks for your interest. Here, the input signal is the STFT of the different channels packed into a single array.
If you have a single channel with `n_samples` samples, then an STFT with an FFT length of `n_fft` and a shift length of `n_shift` will transform the single channel into a `(n_fft // 2 + 1, n_samples // n_shift)` array. In this shape, the first number is the number of frequency bands, and the second is the number of frames (which we refer to as snapshots in the definition above). If you also have multiple channels, the transformation is applied to each channel in parallel, and you end up with a three-dimensional array whose extra dimension is the number of channels.
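To make the shapes concrete, here is a minimal NumPy-only sketch. It is not the library's own STFT: the helper name `stft_multichannel`, the Hann window, and the lack of edge padding are all assumptions made for illustration. The channel-first layout follows the M x F x S convention mentioned in this thread. Without padding, the exact frame count is `1 + (n_samples - n_fft) // n_shift`, which is close to the `n_samples // n_shift` estimate above.

```python
import numpy as np

def stft_multichannel(signals, n_fft, n_shift):
    """Hypothetical helper: STFT of a (n_channels, n_samples) real array.

    Returns a complex array of shape (n_channels, n_fft // 2 + 1, n_frames),
    i.e. channels x frequency bands x snapshots.
    """
    n_channels, n_samples = signals.shape
    # No padding at the edges, so only full frames are kept.
    n_frames = 1 + (n_samples - n_fft) // n_shift
    window = np.hanning(n_fft)  # window choice is an assumption
    X = np.empty((n_channels, n_fft // 2 + 1, n_frames), dtype=complex)
    for s in range(n_frames):
        start = s * n_shift
        # Take one frame from every channel at once and window it.
        frame = signals[:, start:start + n_fft] * window
        # rfft keeps only the non-negative frequencies: n_fft // 2 + 1 bins.
        X[:, :, s] = np.fft.rfft(frame, axis=1)
    return X

# 4 channels of 16000 time-domain samples each (random data for illustration)
rng = np.random.default_rng(0)
signals = rng.standard_normal((4, 16000))
X = stft_multichannel(signals, n_fft=256, n_shift=128)
print(X.shape)
```

Note that the 16000 values per channel here are raw time samples, not snapshots. The snapshots only appear after the signal is cut into overlapping frames, and each frame contributes one column along the last axis.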
I got it! Thanks for your explanation.
Thanks for sharing your code. I am confused by the description of X in the locate_sources function. The comment says that
For example, I have 4 channels with 256 snapshots each, so the input size of the microphone array is (4, 256). If I apply a 128-point FFT to this, the shape will be (4, 128). So I don't understand the meaning of the S in M x F x S.