audeering / audtorch

Utils and data sets for audio and PyTorch
https://audeering.github.io/audtorch/

Fix #58 #59

Closed phtephanx closed 5 years ago

phtephanx commented 5 years ago

Summary

Fixes #58

Proposed Changes

Increase the axis values of both transforms.MaskSpectrogramFrequency and transforms.MaskSpectrogramTime by 1.

Code

from librosa.display import specshow
import matplotlib.pyplot as plt
import numpy as np
from audtorch.datasets import LibriSpeech
from audtorch.transforms import Compose, Spectrogram, MaskSpectrogramFrequency

root = ''  # TODO: path to the LibriSpeech data
transform = Compose([Spectrogram(320, 160), MaskSpectrogramFrequency(0.05)])
data = LibriSpeech(root=root, sets='dev-clean', transform=transform)
# Spectrogram of the first sample, with singleton dimensions removed
magnitude = data[0][0].squeeze().numpy()
# Log-scale for display; the small offset avoids log10(0)
specshow(np.log10(np.abs(magnitude) + 1e-4))
plt.show()

Also try it with MaskSpectrogramTime in place of MaskSpectrogramFrequency.

hagenw commented 5 years ago

Thanks for fixing this.

The safest way to handle these issues is indeed to use axis=-1 for time and axis=-2 for frequency. We always assume those are the very last dimensions, whereas the number of dimensions before them can vary, e.g. channels or batches.
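
To illustrate the point about negative axes, here is a minimal NumPy sketch. It is not audtorch's implementation: mask_band is a hypothetical helper written only for this example, and the array shapes are made up. It shows that axis=-2 and axis=-1 keep addressing frequency and time however many leading dimensions are present.

```python
import numpy as np


def mask_band(spectrogram, start, width, axis, value=0.0):
    """Return a copy with `width` consecutive bins along `axis` set to `value`.

    Hypothetical helper for illustration only, not part of audtorch.
    """
    masked = np.array(spectrogram, dtype=float, copy=True)
    # Select everything on every axis, then narrow the requested one.
    index = [slice(None)] * masked.ndim
    index[axis] = slice(start, start + width)
    masked[tuple(index)] = value
    return masked


# Assumed layout: (..., frequency, time) with 5 frequency bins and 8 frames.
# The leading dimensions (channels, batches) vary between the three cases.
for shape in [(5, 8), (2, 5, 8), (3, 2, 5, 8)]:
    spec = np.ones(shape)
    freq_masked = mask_band(spec, start=1, width=2, axis=-2)
    time_masked = mask_band(spec, start=4, width=3, axis=-1)
    # Frequency bins 1..2 and time frames 4..6 are zeroed in every case.
    assert not freq_masked[..., 1:3, :].any()
    assert not time_masked[..., 4:7].any()
```

With a positive index such as axis=0 for frequency, the same call would mask channels or batches as soon as a leading dimension is added, which is the kind of mismatch this PR addresses.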