Closed by mogwai 4 years ago
As far as I can tell, this upgrade will break the following:
DownmixMono is being removed. We currently use it, although we are transitioning to accept multichannel audio. In the meantime, DownmixMono needs to be replaced with the recommended fix of taking the channel mean of the multichannel audio.
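A minimal sketch of the channel-mean replacement, assuming the waveform is a tensor of shape (channel, time). The helper name downmix_mono is hypothetical and not part of torchaudio.

```python
import torch

def downmix_mono(waveform: torch.Tensor, channel_dim: int = 0) -> torch.Tensor:
    """Average over the channel dimension, keeping it as a size-1 axis."""
    return waveform.mean(dim=channel_dim, keepdim=True)

# Hypothetical stereo clip: 2 channels, 4 samples.
stereo = torch.tensor([[1.0, 2.0, 3.0, 4.0],
                       [3.0, 2.0, 1.0, 0.0]])
mono = downmix_mono(stereo)
print(mono.shape)  # torch.Size([1, 4])
```

Keeping the channel axis (keepdim=True) means downstream code that expects a (channel, time) tensor should not need to change.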
Spectrogram used to be of dimension (channel, time, freq) and is now (channel, freq, time). Similarly, for MelScale, MelSpectrogram, and MFCC, time is now the last dimension. This will rotate our spectrograms 90 degrees in a way we do not want. The likely fix is to alter (possibly remove) the call to permute in the open method of data.py. Testing will be necessary to make sure the upside-down spectrograms are not reintroduced.
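The relationship between the two layouts can be sketched as follows. The tensor here is a stand-in, not real torchaudio output. Whether the permute in data.py should be removed or kept depends on which layout the downstream code expects.

```python
import torch

# Stand-in for a spectrogram in the new layout: (channel, freq, time).
n_channels, n_freq, n_time = 1, 3, 5
spec_new = torch.arange(n_channels * n_freq * n_time, dtype=torch.float32)
spec_new = spec_new.reshape(n_channels, n_freq, n_time)

# Swapping the last two axes gives the old layout: (channel, time, freq).
spec_old = spec_new.transpose(-1, -2)

print(spec_new.shape)  # torch.Size([1, 3, 5])
print(spec_old.shape)  # torch.Size([1, 5, 3])
```

If the existing permute was only there to move time to the last axis, applying it to the new output would swap the axes back, which is presumably why removing it is the likely fix.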
SpectrogramToDB has been renamed to AmplitudeToDB, so we need to change the call in open() in data.py.
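If both torchaudio versions need to be supported for a while, one option is to look the class up under either name. This is a sketch only: first_available is a hypothetical helper, and the SimpleNamespace objects are stand-ins for the old and new torchaudio.transforms modules, not the real library.

```python
from types import SimpleNamespace

def first_available(namespace, *names):
    """Return the first attribute found on `namespace` among `names`."""
    for name in names:
        if hasattr(namespace, name):
            return getattr(namespace, name)
    raise AttributeError(f"none of {names} found on {namespace!r}")

# Stand-ins for the transforms module before and after the rename.
old_transforms = SimpleNamespace(SpectrogramToDB="old class")
new_transforms = SimpleNamespace(AmplitudeToDB="new class")

# Prefer the new name, fall back to the old one.
to_db_old = first_available(old_transforms, "AmplitudeToDB", "SpectrogramToDB")
to_db_new = first_available(new_transforms, "AmplitudeToDB", "SpectrogramToDB")
```

With the real library, the namespace argument would be torchaudio.transforms. This only resolves the name; any differences in constructor arguments between versions would still need checking.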
Much testing will be needed. Don't assume this list is comprehensive; it's just what I picked up reading down the list of breaking changes.
Will the Getting Started notebook be the first to be fixed?
I don't believe any of the notebooks will need to be updated. Our API is staying the same; only the underlying calls to torchaudio and our interpretation of the spectrogram it passes back need to be altered. You should rerun the features notebook and make sure nothing is broken as a result.
Other breaking changes:
pad will be changed to pad_length. f_min will now need to be a float. ws either needs to be removed or changed to match the new name in torchaudio 0.3.0 (not sure what this is).
After mel is created in the open() function, run mel = mel.detach() so that requires_grad is False and numpy can be used in later locations.
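A small sketch of why detach() matters here. The tensor is a stand-in for the mel spectrogram, constructed so that it is attached to the autograd graph; it does not come from torchaudio.

```python
import torch

# Stand-in for a mel spectrogram that still tracks gradients.
mel = torch.ones(1, 4, 6, requires_grad=True) * 2.0

# Converting a grad-tracking tensor to numpy raises a RuntimeError.
try:
    mel.numpy()
    numpy_failed = False
except RuntimeError:
    numpy_failed = True

# After detach(), requires_grad is False and the conversion works.
mel = mel.detach()
mel_np = mel.numpy()
print(mel.requires_grad, mel_np.shape)  # False (1, 4, 6)
```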
I believe it hasn't been mentioned here yet, but MelSpectrogram now accepts sample_rate as an argument and defaults to 16000. I haven't looked closely at how this is actually used, but it's different from the past, and we will probably need to reconfigure in order to pass sr into MelSpectrogram.
torchaudio.transforms.MFCC used to accept sr as an argument; this has been changed to sample_rate in 0.3.0.
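One way to handle the renamed arguments is to translate an existing keyword dictionary before passing it to the transform. This is a sketch under assumptions: update_kwargs and the example config are hypothetical, and only the sr to sample_rate rename and the float requirement for f_min are taken from this thread. Other renames would need to be added to the mapping once confirmed against the release notes.

```python
def update_kwargs(kwargs, renames):
    """Return a copy of `kwargs` with old keys renamed and f_min coerced to float."""
    updated = {renames.get(key, key): value for key, value in kwargs.items()}
    if "f_min" in updated:
        updated["f_min"] = float(updated["f_min"])
    return updated

# Only the rename mentioned in this thread; extend once others are confirmed.
RENAMES = {"sr": "sample_rate"}

old_kwargs = {"sr": 16000, "f_min": 0, "n_mels": 128}  # hypothetical config
new_kwargs = update_kwargs(old_kwargs, RENAMES)
print(new_kwargs)  # {'sample_rate': 16000, 'f_min': 0.0, 'n_mels': 128}
```

The original dictionary is left untouched, so the old and new argument sets can be compared while testing.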
It would be better to mention these things on the v2 thread now
Hey - I was looking for audio processing for fastai and I found your stuff to be the most complete. I needed it to work with fastai 1.0.59 and torchaudio 0.3.0, so I made the updates you mentioned above and a couple more. I tested them by getting the 3 tutorial notebooks and the 'getting started' notebook to work without issues. Removing permute did put time on the horizontal axis, left to right, but the vertical axis now has lower frequencies towards the top of the image. I am still doing some testing and will try to fix this before doing a pull request.
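One possible way to get low frequencies at the bottom is to reverse the frequency axis before display. This is a sketch, not the fix that was eventually made. It assumes a (channel, freq, time) layout in which frequency bin 0 is the lowest, and a viewer that draws row 0 at the top of the image.

```python
import torch

# Stand-in spectrogram, (channel, freq, time); row 0 is the lowest frequency bin.
spec = torch.tensor([[[0.0, 0.0],    # low
                      [1.0, 1.0],    # mid
                      [2.0, 2.0]]])  # high

# Reverse the frequency axis so the highest bin becomes the top row.
spec_for_display = torch.flip(spec, dims=[-2])
print(spec_for_display[0, :, 0])  # tensor([2., 1., 0.])
```

When plotting with matplotlib, passing origin="lower" to imshow has a similar visual effect without modifying the data.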
https://github.com/pytorch/audio/releases/tag/v0.3.0