audeering / audtorch

Utils and data sets for audio and PyTorch
https://audeering.github.io/audtorch/

`MaskSpectrogramFrequency` and `MaskSpectrogramTime`'s `axis` values are wrong #58

Closed: phtephanx closed this issue 4 years ago

phtephanx commented 4 years ago

Bug

The `axis` values of `MaskSpectrogramTime` (`axis=1`) and `MaskSpectrogramFrequency` (`axis=0`) need to be shifted by +1 to account for the channel dimension of the signal.

Why?

The signal returned by `audtorch.datasets.utils.load` is documented as:

**numpy.ndarray**: two-dimensional array with shape `(channels, samples)`

A spectrogram computed with `spec = Spectrogram(320, 160)(signal)` therefore has shape `(C, F, S)`: channels, frequency bins, and time steps.
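To make the shape concrete, here is a minimal NumPy sketch. It assumes a Hann-windowed magnitude STFT without centering or padding; `spectrogram_sketch` is a hypothetical helper, not audtorch's `Spectrogram`, and its number of frames may differ from audtorch's output. Like the transform described above, it keeps the channel axis first, so frequency lands on axis 1 and time on axis 2.

```python
import numpy as np


def spectrogram_sketch(signal, window_size=320, hop_size=160):
    """Magnitude STFT of a (channels, samples) array -> (channels, freqs, frames).

    Hypothetical helper for illustration only (no centering/padding, Hann window).
    """
    signal = np.atleast_2d(signal)  # ensure a leading channel axis
    n_frames = 1 + (signal.shape[-1] - window_size) // hop_size
    window = np.hanning(window_size)
    # Frame start offsets combined with in-frame offsets: (n_frames, window_size)
    idx = np.arange(window_size)[None, :] + hop_size * np.arange(n_frames)[:, None]
    frames = signal[:, idx] * window  # (channels, n_frames, window_size)
    spec = np.abs(np.fft.rfft(frames, axis=-1))  # (channels, n_frames, freqs)
    return np.swapaxes(spec, 1, 2)  # (channels, freqs, frames)


mono = spectrogram_sketch(np.random.randn(1, 16000))
stereo = spectrogram_sketch(np.random.randn(2, 16000))
print(mono.shape, stereo.shape)  # (1, 161, 99) (2, 161, 99)
```

Even a mono signal keeps its channel axis, so the frequency axis is 1, not 0.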

Thus, both `axis` values need to be increased by 1: `MaskSpectrogramFrequency` should operate on `axis=1` and `MaskSpectrogramTime` on `axis=2`.
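A minimal NumPy sketch of why the shift matters, using a hypothetical `mask_band` helper (not audtorch's masking code) on a `(C, F, S)` array of ones. In this sketch, masking along `axis=0` selects channels, so a mono spectrogram is wiped out entirely, while `axis=1` and `axis=2` hit frequency bins and time steps as intended. Negative axes (`-2` for frequency, `-1` for time) address the same dimensions whether or not a channel axis is present.

```python
import numpy as np


def mask_band(spec, start, width, axis, value=0.0):
    """Return a copy of spec with indices [start, start + width) along axis set to value.

    Hypothetical helper illustrating the axis choice only.
    """
    masked = spec.copy()
    index = [slice(None)] * masked.ndim
    index[axis] = slice(start, start + width)
    masked[tuple(index)] = value
    return masked


spec = np.ones((1, 161, 99))  # (channels, frequency bins, time steps)

wrong = mask_band(spec, 0, 16, axis=0)  # old frequency axis: masks the channel
freq = mask_band(spec, 0, 16, axis=1)   # shifted by +1: masks 16 frequency bins
time = mask_band(spec, 0, 16, axis=2)   # shifted by +1: masks 16 time steps
freq_neg = mask_band(spec, 0, 16, axis=-2)  # same result as axis=1 here
```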

Steps to reproduce

```python
from audtorch.datasets import LibriSpeech
from audtorch.transforms import Compose, MaskSpectrogramFrequency, Spectrogram

root = ''  # TODO: path to the local LibriSpeech data
data = LibriSpeech(
    root=root,
    sets='dev-clean',
    transform=Compose([
        Spectrogram(320, 160),          # output shape: (C, F, S)
        MaskSpectrogramFrequency(0.1),  # masks along axis=0, the channel axis
    ]),
)
data[0][0]  # error
```