markusMM closed this issue 2 years ago.
It's because of the two input channels. With a single input channel, it does work as expected.
```python
import torch
from asteroid.models import ConvTasNet
from asteroid.dsp import LambdaOverlapAdd

nnet = ConvTasNet(n_src=2)
continuous_nnet = LambdaOverlapAdd(
    nnet=nnet,  # function to apply to each segment
    n_src=2,  # number of sources in the output of nnet
    window_size=64000,  # size of the segmenting window
    hop_size=None,  # segmentation hop size
    window="hanning",  # type of window (see scipy.signal.get_window)
    reorder_chunks=True,  # whether to reorder each consecutive segment
    enable_grad=False,  # turn gradient calculation on or off (see torch.set_grad_enabled)
)

# This does not work: two input channels, shape (batch, channels, time) = (1, 2, T)
# u = torch.randn(1, 2, 128000 * 8)
# continuous_nnet(u)

# This works: a single input channel, shape (1, 1, T)
u = torch.randn(1, 1, 128000 * 8)
continuous_nnet(u)
```
Does that solve the issue?
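To make the `window_size`, `hop_size` and `window` parameters above more concrete, here is a minimal, generic overlap-add sketch in plain PyTorch. It is not asteroid's `LambdaOverlapAdd` implementation: it does not reorder chunks, and the name `overlap_add`, the 50 % default hop and the padding scheme are assumptions of this sketch. It expects `fn` to map a `(batch, channels, window_size)` segment to `(batch, n_out, window_size)`.

```python
from typing import Callable, Optional

import torch
import torch.nn.functional as F


def overlap_add(
    fn: Callable[[torch.Tensor], torch.Tensor],
    x: torch.Tensor,
    window_size: int,
    hop_size: Optional[int] = None,
) -> torch.Tensor:
    """Apply fn to overlapping Hann-windowed segments of x and overlap-add them.

    x: (batch, channels, time). Returns (batch, n_out, time).
    """
    if hop_size is None:
        hop_size = window_size // 2  # assumption of this sketch: 50 % overlap
    batch, _, time = x.shape
    # Pad one hop on the left and at least one window on the right, rounded so the
    # padded length is window_size + k * hop_size and every segment is full length.
    pad_right = window_size + (-(hop_size + time)) % hop_size
    padded = F.pad(x, (hop_size, pad_right))
    total = padded.shape[-1]
    window = torch.hann_window(window_size, periodic=True, dtype=x.dtype)
    segments = padded.unfold(-1, window_size, hop_size)  # (batch, channels, n_seg, window_size)
    out: Optional[torch.Tensor] = None
    weight = torch.zeros(total, dtype=x.dtype)
    for i in range(segments.shape[2]):
        start = i * hop_size
        est = fn(segments[:, :, i, :])  # (batch, n_out, window_size)
        if out is None:
            out = torch.zeros(batch, est.shape[1], total, dtype=est.dtype)
        out[..., start:start + window_size] += est * window
        weight[start:start + window_size] += window
    # Normalise by the summed window weight, then strip the padding.
    out = out / weight.clamp(min=1e-8)
    return out[..., hop_size:hop_size + time]


torch.manual_seed(0)
x = torch.randn(1, 1, 1000)
# Hypothetical "separator" returning two sources per segment: the input and its negation.
y = overlap_add(lambda seg: torch.cat([seg, -seg], dim=1), x, window_size=256)
print(y.shape)  # torch.Size([1, 2, 1000])
```

Because the output is divided by the accumulated window weight, an identity `fn` reconstructs the input exactly, whatever the hop size, as long as every sample falls under a non-zero part of some window.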
Unexpectedly, no! :laughing:
More generally, I wonder whether this model (or any of the source separation networks) is designed for a single input channel only, or also for multiple input channels.
Maybe I should have asked or researched this first. ^^
Is there a way it could be adapted to also work for 2 -> K channels?
Cheers
Some models work on multichannel data, but they need to be designed for it.
You can always apply a single-channel model to multichannel data; the simplest way here is to apply the model to each channel independently. Just put the channels in the batch dimension and it'll work.
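A minimal sketch of that suggestion in plain PyTorch, assuming the separator maps `(batch, 1, time)` to `(batch, n_src, time)`. `ToyMonoSeparator` and `separate_per_channel` are hypothetical names; the toy model only stands in for ConvTasNet (or the `LambdaOverlapAdd` wrapper) so the example runs without asteroid installed.

```python
import torch
from torch import nn


class ToyMonoSeparator(nn.Module):
    """Hypothetical stand-in for a single-channel separator such as ConvTasNet.

    Accepts (batch, 1, time) and returns (batch, n_src, time); it just splits the
    input into n_src scaled copies, since only the shapes matter here.
    """

    def __init__(self, n_src: int = 2):
        super().__init__()
        self.n_src = n_src

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        assert wav.shape[1] == 1, "this model only accepts mono input"
        return wav.repeat(1, self.n_src, 1) / self.n_src


def separate_per_channel(model: nn.Module, wav: torch.Tensor) -> torch.Tensor:
    """Apply a mono model to every channel independently.

    wav: (batch, channels, time) -> returns (batch, channels, n_src, time).
    """
    batch, channels, time = wav.shape
    # Fold channels into the batch dimension: each channel becomes its own mono example.
    mono = wav.reshape(batch * channels, 1, time)
    est = model(mono)  # (batch * channels, n_src, time)
    # Split the batch dimension back into (batch, channels).
    return est.reshape(batch, channels, est.shape[1], time)


stereo = torch.randn(1, 2, 16000)
sources = separate_per_channel(ToyMonoSeparator(n_src=2), stereo)
print(sources.shape)  # torch.Size([1, 2, 2, 16000])
```

For a batch of one, the reshape is equivalent to `wav.permute(1, 0, 2)`. Each channel is separated on its own, so this approach does not use any inter-channel (spatial) information.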
Heya,
The model seems to work only with mono-channel input, so I put the channels in the batch dimension:
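```python
import torch
from asteroid.models import ConvTasNet
from asteroid.dsp import LambdaOverlapAdd

n_src = 2
nnet = ConvTasNet(n_src=n_src, in_channels=1)
continuous_nnet = LambdaOverlapAdd(
    nnet=nnet,  # function to apply to each segment
    n_src=n_src,  # number of sources in the output of nnet
    window_size=64000,  # size of the segmenting window
    hop_size=None,  # segmentation hop size
    window="hanning",  # type of window (see scipy.signal.get_window)
    reorder_chunks=True,  # whether to reorder each consecutive segment
    enable_grad=False,  # turn gradient calculation on or off (see torch.set_grad_enabled)
)

# Placeholder for the loaded stereo mp3 tensor, shape (1, 2, n_samples).
mp3 = torch.randn(1, 2, 128000 * 8)
# permute(1, 0, 2) gives (2, 1, n_samples): each channel becomes a mono batch item.
new_mp3 = continuous_nnet(mp3.permute(1, 0, 2))
```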
Cheers
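Because each channel is separated independently, and separation models are usually trained with a permutation-invariant loss, nothing guarantees that source 0 of the left channel matches source 0 of the right channel. A minimal sketch for regrouping the output of the call above into multichannel sources, assuming `new_mp3` has shape `(channels, n_src, time)`: for every channel, it picks the source permutation with the highest summed cosine similarity to channel 0. The function name `align_channel_sources` and the cosine-similarity criterion are assumptions of this sketch, not asteroid's API.

```python
import itertools

import torch


def align_channel_sources(est: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Match source order across channels and regroup as multichannel sources.

    est: (channels, n_src, time), e.g. a mono separator's output with the
    channels folded into the batch dimension. Returns (n_src, channels, time).
    """
    n_chan, n_src, _ = est.shape
    # Unit-normalise every source signal so dot products are cosine similarities.
    unit = est / est.norm(dim=-1, keepdim=True).clamp(min=eps)
    aligned = [est[0]]  # channel 0 defines the reference source order
    for c in range(1, n_chan):
        sim = unit[0] @ unit[c].T  # sim[k, j]: reference source k vs. source j of channel c
        best = max(
            itertools.permutations(range(n_src)),
            key=lambda perm: sum(sim[k, j].item() for k, j in enumerate(perm)),
        )
        aligned.append(est[c, list(best)])
    # (channels, n_src, time) -> (n_src, channels, time)
    return torch.stack(aligned, dim=0).permute(1, 0, 2)
```

With the code above, `align_channel_sources(new_mp3)` would return one stereo signal per source, shape `(n_src, 2, n_samples)`. It enumerates all `n_src!` permutations, which is fine for a handful of sources.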
https://github.com/asteroid-team/asteroid/blob/d07a9077bfc95c4e9bcd9a35cc16815a562fcf6f/asteroid/dsp/overlap_add.py#L116
I have loaded an mp3 into a 1x2x147250253 tensor (batch x channels x samples) and get an error about the segment size being only half as long as the window length (see below).
IDK whether I did something wrong, or how I should interpret this.