All input tensors are assumed to be three-dimensional at the moment, i.e. (batch_size, num_channels, num_samples). Did this help? Should it be better documented? I'm thinking of removing support for two-dimensional tensors btw. How do you feel about that? In that case, mono audio would be represented as (batch_size, 1, num_samples).
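For reference, a minimal sketch of getting a mono file into that three-dimensional layout; the file path is just a placeholder:

import torchaudio

# torchaudio.load returns (num_channels, num_samples), e.g. (1, num_samples) for mono
samples, sample_rate = torchaudio.load("some_mono_file.wav")  # placeholder path

# Add a leading batch dimension: (batch_size, num_channels, num_samples)
samples = samples.unsqueeze(0)
print(samples.shape)  # e.g. torch.Size([1, 1, num_samples])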
Interesting, this isn't working though? Am I supposed to wrap the augmentations somewhere else?
import torchaudio
from torch_audiomentations.augmentations.gain import Gain
wave, sr = torchaudio.load("./tests/data/dev01.wav")
# Create a batch of 32
wave = wave[None].repeat(32, 1, 1)
print(wave.shape)
# [32, 1, 410001]
res = Gain()(wave, sr)
Interesting. How does it fail? With an error message? Or is the output not as expected?
My mistake, this is only the case for BackgroundNoise and AddImpulseResponse.
Yeah, I haven't really finished and released those two yet. I'll get to it
Fair enough :)
So I have to set support_multichannel = True to support batches?
AddBackgroundNoise and AddImpulseResponse should support batches, but I don't think they support multichannel audio yet. I think they expect 2D tensors like (batch_size, num_samples) for now.
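Until that lands, one possible workaround (just a sketch, not something the library does for you) is to fold the channel dimension into the batch dimension before calling a 2D-only transform and restore it afterwards:

import torch

batch_size, num_channels, num_samples = 32, 2, 16000
samples = torch.randn(batch_size, num_channels, num_samples)

# Fold channels into the batch dimension -> (batch_size * num_channels, num_samples)
flat = samples.reshape(batch_size * num_channels, num_samples)

# ... apply a transform that only accepts (batch_size, num_samples) here ...

# Restore the (batch_size, num_channels, num_samples) layout
restored = flat.reshape(batch_size, num_channels, num_samples)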
If I were creating a new BaseWaveTransform, e.g. Reverb, would I have to set support_multichannel = True to make it work?
Yes. I might rework that later though
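Purely to illustrate the pattern being discussed (this is not the library's actual base class, and the real class and attribute names may differ), a self-contained toy example of a transform that opts in via a support_multichannel flag:

import torch

class ToyWaveTransform(torch.nn.Module):
    # Hypothetical flag mirroring the support_multichannel attribute mentioned above
    support_multichannel = False

    def forward(self, samples: torch.Tensor, sample_rate: int) -> torch.Tensor:
        # samples is expected to be (batch_size, num_channels, num_samples)
        if samples.shape[1] > 1 and not self.support_multichannel:
            raise ValueError("This transform only supports mono audio")
        return self.apply_transform(samples, sample_rate)

    def apply_transform(self, samples, sample_rate):
        raise NotImplementedError

class ToyReverb(ToyWaveTransform):
    # A hypothetical Reverb opting in to multichannel input
    support_multichannel = True

    def apply_transform(self, samples, sample_rate):
        return samples  # no-op placeholder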
The BaseWaveTransformations currently don't accept a batch of audio in forward, because they treat a batch of audio as multichannel audio. I can submit a PR, or is there some design decision here that I'm missing?