fastaudio / fastai_audio

[DEPRECATED] 🔊️ Audio with fastaiv1
MIT License
160 stars, 49 forks

Upgrade to torchaudio v0.3.0 #37

Closed: mogwai closed this issue 4 years ago

mogwai commented 5 years ago

https://github.com/pytorch/audio/releases/tag/v0.3.0

rbracco commented 5 years ago

As far as I can tell, this upgrade will break the following:

  1. DownmixMono is being removed. We currently use it, although we are transitioning to accept multichannel audio. In the meantime, DownmixMono needs to be replaced with the recommended fix of taking the mean across the channels of the multichannel audio.

  2. Spectrogram used to have dimensions (channel, time, freq) and now has (channel, freq, time). Similarly, for MelScale, MelSpectrogram, and MFCC, time is now the last dimension. This will rotate our spectrograms by 90 degrees in a way we do not want. The likely fix is altering (possibly removing) the call to permute in the open method of data.py. Testing will be necessary to make sure the upside-down spectrograms are not reintroduced.

  3. SpectrogramToDB has been renamed to AmplitudeToDB, so we need to change the call in open() in data.py.
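To make items 1 and 2 concrete, here is a minimal sketch using plain torch tensors as stand-ins for real audio. The helper name `downmix_to_mono` and the toy shapes are assumptions for illustration, not code from this repo, and it assumes waveforms are laid out as (channel, time).

```python
import torch


def downmix_to_mono(waveform: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for DownmixMono: average over the channel axis.

    Assumes the waveform has shape (channel, time) and keeps a channel axis
    of size 1 so downstream code still sees a 2-D tensor.
    """
    return waveform.mean(dim=0, keepdim=True)


# Toy stereo signal: 2 channels, 3 samples.
stereo = torch.tensor([[1.0, 2.0, 3.0],
                       [3.0, 4.0, 5.0]])
mono = downmix_to_mono(stereo)  # shape (1, 3)

# Stand-in for a spectrogram in the old (channel, time, freq) layout.
old_layout = torch.zeros(1, 5, 4)
# Swapping the last two axes gives the new (channel, freq, time) layout.
new_layout = old_layout.permute(0, 2, 1)
```

If open() already permutes to compensate for the old layout, applying that permute to output that is already (channel, freq, time) would undo the new ordering, which is why the call likely has to change or go.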

Much testing will be needed. Don't assume this list is comprehensive; it's just what I picked up reading down the list of breaking changes.

mogwai commented 5 years ago

Will the Getting Started notebook be the first to be fixed?

rbracco commented 5 years ago

I don't believe any of the notebooks will need to be updated. Our API is staying the same; only the underlying calls to torchaudio, and our interpretation of the spectrogram it passes back, need to be altered. You should rerun the features notebook and make sure nothing is broken as a result.

kevinbird15 commented 5 years ago

Other breaking changes:

  1. pad will be changed to pad_length.

  2. f_min will now need to be a float.

  3. ws either needs to be removed or changed to match the new name in torchaudio 0.3.0 (not sure what this is).

After mel is created in the open() function, run mel=mel.detach() so that requires_grad = False and numpy can be used in later locations.
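A small sketch of why the detach step matters, using a toy tensor in place of the real mel spectrogram (the shape and values are made up for illustration):

```python
import torch

# Stand-in for the mel spectrogram; the multiplication makes it part of an
# autograd graph, so it carries requires_grad=True.
mel = torch.ones(1, 4, 5, requires_grad=True) * 2.0

try:
    mel.numpy()
    numpy_failed = False
except RuntimeError:
    # Tensors that require grad cannot be converted to numpy directly.
    numpy_failed = True

# detach() returns a tensor that shares data but is cut off from the graph.
mel = mel.detach()
spec_np = mel.numpy()
```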

rbracco commented 5 years ago

I believe it hasn't been mentioned here yet, but MelSpectrogram now accepts sample_rate as an argument and defaults to 16000. I haven't looked closely at how this is actually used, but it's different from the past, and we will probably need to reconfigure in order to pass sr into MelSpectrogram.

rbracco commented 5 years ago

torchaudio.transforms.MFCC used to accept sr as an argument; this has been changed to sample_rate in 0.3.0.
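One way to absorb keyword renames like this without touching every call site is a small translation step before the arguments reach torchaudio. The sketch below is pure Python and does not call torchaudio; `RENAMED_KWARGS` and `rename_kwargs` are hypothetical names, and the table only contains the sr to sample_rate rename confirmed above.

```python
# Old keyword -> new keyword. Only the rename confirmed in this thread is
# listed; other entries would need checking against the 0.3.0 release notes.
RENAMED_KWARGS = {"sr": "sample_rate"}


def rename_kwargs(kwargs, renamed=RENAMED_KWARGS):
    """Hypothetical helper: return a copy of kwargs with old names translated."""
    translated = {}
    for name, value in kwargs.items():
        new_name = renamed.get(name, name)
        if new_name in translated:
            # Both the old and the new spelling were passed; refuse to guess.
            raise TypeError(f"duplicate argument after renaming: {new_name!r}")
        translated[new_name] = value
    return translated


old_style = {"sr": 22050, "n_mfcc": 20}
new_style = rename_kwargs(old_style)
```

The translated dict could then be unpacked into the transform constructor, for example `MFCC(**new_style)`.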

mogwai commented 5 years ago

It would be better to mention these things on the v2 thread now

filipmu commented 4 years ago

Hey - I was looking for audio processing for fastai and I found your stuff to be the most complete. I needed it to work with fastai 1.0.59 and torchaudio 0.3.0, so I made the updates you mentioned above and a couple more. I tested them by getting the 3 tutorial notebooks and the 'getting started' notebook to work without issues. Removing permute did put time on the horizontal axis, running left to right, but the vertical axis now has lower frequencies towards the top of the image. I am still doing some testing and will try to fix this before doing a pull request.
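For the vertical orientation, one possible approach (an assumption, not necessarily the fix that ended up in the pull request) is to reverse the frequency axis before display. The sketch uses a toy tensor in (channel, freq, time) layout where row 0 is the lowest frequency bin:

```python
import torch

# Toy spectrogram: 1 channel, 3 frequency bins, 2 time frames.
# The values encode the bin index so the flip is easy to see.
spec = torch.tensor([[[0.0, 0.0],
                      [1.0, 1.0],
                      [2.0, 2.0]]])

# Reverse the frequency axis (second to last) so the highest bin becomes
# row 0, which image viewers normally draw at the top.
flipped = torch.flip(spec, dims=[-2])
```

When plotting with matplotlib, passing origin='lower' to imshow achieves the same visual result without modifying the data.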