Closed iver56 closed 3 years ago
Hi amiasato. Cool that you want to contribute! I'm looking forward to it.
Yes :) BaseSpectrogramTransform is a welcome addition. If we consider magnitude spectrograms and complex spectrograms two different beasts, maybe we can call them BaseMagnitudeSpectrogramTransform and BaseComplexSpectrogramTransform? Thinking of it, most spectrogram transforms should ideally support both magnitude and complex input. Some transforms (like frequency mask à la SpecAugment) would leave the phase of any complex spectrogram untouched though. So maybe we should have a single base class, like BaseSpectrogramTransform, and each transform can have a flag like supports_complex_spectrograms
and supports_magnitude_spectrograms
? And it could check the format of the input to determine whether it's complex or not?
The only problem I see with the auto-inference of the input type is that we can't actually know if a complex input is in the linear scale (real/imag) or in the log scale (mag/phase). Torchaudio's "Spectrogram" transform has an API which returns linear scale when "power" is None, so I guess we may assume the former most of the time. Using two different classes seem like overkill to me, the flags alternative indeed sounds nicer.
In any case, I am still wrapping my head around the multiple torch APIs for short-time transforms (stft and istft are simpler alternatives), so for starters I guess I will use those for this issue in particular.
Hi, I'm willing to contribute to the project.
Aside from white noise, all other types are easier to implement in spectral space. Since other augmentations (e.g. pitch shift) should benefit from spectral manipulation, i guess something like the BaseSpectrogramTransform could be implemented. However, the mentioned interface works only with magnitudes, whilst I believe the best course would be to use the complex spectrum which allows for perfect signal reconstruction. Do you have any thoughts on these issues?