asteroid-team / asteroid

The PyTorch-based audio source separation toolkit for researchers
https://asteroid-team.github.io/
MIT License
2.27k stars, 423 forks

Unfolding sometimes results into concatenated channels #627

Open markusMM opened 2 years ago

markusMM commented 2 years ago

https://github.com/asteroid-team/asteroid/blob/c72227e5e31f6c13ba9c9da1d0d380cc75b91fbd/asteroid/dsp/overlap_add.py#L92

Image: with torch 1.10.2, the windows of all channels end up concatenated after unfold.

The code expects to handle (batch, chans, win_size) per chunk $\rightarrow$ (batch, chans, win_size, n_chunks) for the whole unfolded tensor.

So, from my perspective, it has to be reshaped before being handed to the NN: `unfolded = unfolded.reshape(batch, channels, self.window_size, -1)`
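To make the shape problem concrete, here is a minimal sketch with made-up sizes (the tensor sizes and variable names are assumptions for illustration, and padding is left out to keep the shapes easy to check). It shows `torch.nn.functional.unfold` folding channels and window samples into one axis, and the reshape splitting them apart again:

```python
import torch
import torch.nn.functional as F

# Illustrative sizes (assumptions, not taken from the issue).
batch, channels, n_samples = 2, 3, 16
window_size, hop_size = 4, 2

frame = torch.arange(batch * channels * n_samples, dtype=torch.float32)
frame = frame.reshape(batch, channels, n_samples)

# Treat the signal as a (time, 1) "image" so that each patch is one window.
unfolded = F.unfold(
    frame.unsqueeze(-1),
    kernel_size=(window_size, 1),
    stride=(hop_size, 1),
)
# Channels and window samples share one axis here:
# (batch, channels * window_size, n_chunks)
flat_shape = tuple(unfolded.shape)

# Split that axis back into (channels, window_size).
chunks = unfolded.reshape(batch, channels, window_size, -1)
print(flat_shape, tuple(chunks.shape))
```

The reshape is safe because `unfold` orders the merged axis channel-first, so `chunks[b, c, :, k]` is the k-th window of channel `c`.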

mpariente commented 2 years ago

Thanks for the issue, that's very informative! Did you search PyTorch's changelog to see whether they note this change? Do you think it's intended behavior, or a bug? Has it been fixed in newer versions?

markusMM commented 2 years ago

So, right now there are two simple ways of unfolding a tensor. `nn.Unfold` (and its functional wrapper) will always show the behaviour above on the latest versions (since v0.4.1). The built-in `torch.Tensor.unfold` always unfolds a specified dimension and returns a tensor of size (..., nWindows, winSize); its arguments are (dimension, size, step). This seems to be the better solution to avoid the reshape: `unfolded = frame.unfold(-1, window_size, stride)`
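As a rough check of that suggestion, the sketch below compares the two routes on a random tensor (sizes and variable names are assumptions for illustration). Note that `Tensor.unfold` takes `(dimension, size, step)`, puts the window axis last, and does no padding, so a transpose is still needed to reach the (batch, chans, win_size, n_chunks) layout, and any padding would have to be applied separately:

```python
import torch
import torch.nn.functional as F

# Illustrative sizes (assumptions, not taken from the issue).
batch, channels, n_samples = 2, 3, 16
window_size, hop_size = 4, 2
frame = torch.randn(batch, channels, n_samples)

# Tensor.unfold(dimension, size, step): slide a window along the last axis.
windows = frame.unfold(-1, window_size, hop_size)
# Shape: (batch, channels, n_windows, window_size)

# Swap the last two axes to get (batch, channels, window_size, n_chunks).
chunks = windows.transpose(-1, -2)

# Reference route: F.unfold (no padding) followed by the reshape.
reference = F.unfold(
    frame.unsqueeze(-1),
    kernel_size=(window_size, 1),
    stride=(hop_size, 1),
).reshape(batch, channels, window_size, -1)

same = torch.equal(chunks, reference)
print(tuple(windows.shape), same)
```

`Tensor.unfold` returns a view of the original tensor, so a `.contiguous()` call may be needed before operations that require contiguous memory.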

cheers

mpariente commented 2 years ago

Thanks for the explanation @markusMM

Could you submit a PR to fix the problem, please? :upside_down_face: Thanks in advance!