KinWaiCheuk / nnAudio

Audio processing by using pytorch 1D convolution network
MIT License

inverse transform from logscale to linear scale stft #7

Open adrienchaton opened 4 years ago

adrienchaton commented 4 years ago

Hi!

Your repo is a pretty awesome find. I am especially interested in using the STFT in log frequency. I was already doing mel operations myself using torch.stft and librosa filterbanks, but the more options there are to experiment with, the better.

May I ask, is there any way to transform an STFT computed on a log frequency scale back to a linear frequency scale?

The use case I have in mind is turning some waveforms into log frequency spectrograms, filtering them, and then converting back to linear frequency so that the inverse STFT can bring them back to the time domain.

Thanks!

KinWaiCheuk commented 4 years ago

I need to double-check first. Unfortunately, it seems the log frequency scale is not mathematically invertible. Only the STFT with the original frequency bin (k) spacing is invertible; the moment you change the spacing of the frequency bins, the Fourier basis vectors are no longer orthogonal. That is why I think it would not be invertible. But I need to investigate further and discuss it with people who are more familiar with this before I can assert whether it is doable or not. It is a cool suggestion, and I hope I can implement it if it is possible.
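
The orthogonality point can be checked numerically. The following is a minimal sketch, assuming NumPy and a hypothetical frame length of 64; it only compares the Gram matrices of the two sets of basis vectors and does not settle the invertibility question itself.

```python
import numpy as np

n_fft = 64                      # hypothetical frame length
n = np.arange(n_fft)

def fourier_basis(bins):
    """Complex exponentials at the given (possibly fractional) bin indices."""
    return np.exp(-2j * np.pi * np.outer(bins, n) / n_fft)

def max_off_diagonal(basis):
    """Largest inner product magnitude between two different basis vectors."""
    gram = basis @ basis.conj().T
    return np.abs(gram - np.diag(np.diag(gram))).max()

# Integer bins 0..n_fft/2, as in a standard STFT.
linear = fourier_basis(np.arange(n_fft // 2 + 1))
# Same number of bins, log-spaced between bin 1 and bin n_fft/2.
log_spaced = fourier_basis(np.geomspace(1.0, n_fft / 2, n_fft // 2 + 1))

print("linear bins:    ", max_off_diagonal(linear))      # numerically zero
print("log-spaced bins:", max_off_diagonal(log_spaced))  # clearly non-zero
```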

adrienchaton commented 4 years ago

Thank you for the discussion!

When I said invertible, I meant from STFT to STFT, across frequency scales. That would mean being able to take a log frequency STFT, put it back on a linear scale, and then use the inverse transform of the linear-scale STFT to get back to the time domain.

I had in mind maybe some kind of transpose (like when using a mel filter bank), possibly with some approximation.

But I am not an expert in this either, so I am interested to hear about anything you find!

mpariente commented 4 years ago

It is not invertible but the pseudo-inverse of the forward transform matrix is the way to go. See the back-propagable pseudo-inverse here. Same goes for the mel-spectrogram by the way.
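
As a rough illustration of the pseudo-inverse idea for a filterbank applied to a linear-frequency spectrogram (the mel-style case), here is a minimal sketch. It assumes NumPy rather than torch, so it is not back-propagable; `log_triangular_filterbank` is a hypothetical helper written for this example (it is not the librosa mel filterbank), and the sizes are arbitrary. The torch.pinverse operator would play the role of `np.linalg.pinv` in a differentiable setting.

```python
import numpy as np

def log_triangular_filterbank(n_bins, n_bands):
    """Triangular filters with log-spaced centre bins (illustrative only)."""
    # Rounding to integers creates duplicates at low frequencies, so the
    # number of bands actually returned can be smaller than n_bands.
    edges = np.unique(np.round(np.geomspace(1, n_bins - 1, n_bands + 2)).astype(int))
    bins = np.arange(n_bins)
    fb = np.zeros((len(edges) - 2, n_bins))
    for i in range(1, len(edges) - 1):
        lo, centre, hi = edges[i - 1], edges[i], edges[i + 1]
        rising = (bins - lo) / (centre - lo)
        falling = (hi - bins) / (hi - centre)
        fb[i - 1] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

n_bins, n_frames = 257, 20                  # e.g. n_fft = 512 (assumed)
fb = log_triangular_filterbank(n_bins, n_bands=40)
fb_pinv = np.linalg.pinv(fb)                # shape (n_bins, bands)

rng = np.random.default_rng(0)
spec = rng.random((n_bins, n_frames))       # stand-in magnitude spectrogram
band_spec = fb @ spec                       # linear -> log-spaced bands
spec_hat = fb_pinv @ band_spec              # least-squares estimate of spec

# Re-applying the filterbank reproduces the band values, but the
# linear-frequency spectrogram itself is only approximated.
print(np.allclose(fb @ spec_hat, band_spec))
print(np.abs(spec_hat - spec).max())
```

The estimate is the minimum-norm spectrogram consistent with the band values, so any detail the filterbank averaged away is not recovered.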

adrienchaton commented 4 years ago

Thank you for pointing out the torch.pinverse operator, which I didn't know about!

It seems straightforward for inverting the mel-spectrogram. However, since nnAudio computes the STFT through 1D convolution kernels, I am not sure whether that also applies to inverting the log scale to the linear scale. Or would it mean computing the log scale through a matrix multiplication, similar to mels, and using the pseudo-inverse of this matrix?

mpariente commented 4 years ago

Ah actually it's a bit different than for the mel-spectrogram, you're right. I'd suggest taking the pseudo-inverse of the filterbank to invert the log-scaled transform directly, without going through linear scale STFT.

You can find an implementation of this in asteroid, where the pseudo-inverse can also be computed on the fly for each forward pass if you want learnable filters.
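
To make the suggestion concrete, here is a minimal sketch of inverting a log-frequency transform directly through the pseudo-inverse of its analysis kernels, without passing through a linear-scale STFT. It assumes NumPy rather than torch, writes the convolution as a matrix product over frames, and uses hypothetical settings (sample rate, frame size, hop, frequency range, number of bins). It is not the nnAudio or asteroid implementation.

```python
import numpy as np

sr, n_fft, hop, n_freq = 16000, 512, 128, 24     # hypothetical settings

# Analysis kernels: Hann-windowed cosines and sines at log-spaced
# frequencies, i.e. the filters a strided 1D convolution would apply.
freqs = np.geomspace(500.0, 7000.0, n_freq)
t = np.arange(n_fft) / sr
phase = 2 * np.pi * freqs[:, None] * t[None, :]
kernels = np.concatenate([np.cos(phase), -np.sin(phase)]) * np.hanning(n_fft)

# Slice a test signal into overlapping frames, one frame per column.
rng = np.random.default_rng(0)
x = rng.standard_normal(sr // 4)
n_frames = 1 + (len(x) - n_fft) // hop
idx = np.arange(n_fft)[:, None] + hop * np.arange(n_frames)[None, :]
frames = x[idx]                                  # (n_fft, n_frames)

# Forward transform: real parts on top, imaginary parts below.
coeffs = kernels @ frames                        # (2 * n_freq, n_frames)

# Direct inversion: pseudo-inverse of the kernel matrix maps the
# log-frequency coefficients straight back to time-domain frames.
kernels_pinv = np.linalg.pinv(kernels)           # (n_fft, 2 * n_freq)
frames_hat = kernels_pinv @ coeffs               # (n_fft, n_frames)

rel_err = np.linalg.norm(frames - frames_hat) / np.linalg.norm(frames)
print(f"relative frame error: {rel_err:.3f}")
```

With only a few dozen kernels for 512-sample frames, the recovered frames are a least-squares projection of the originals, so the error on broadband noise is large; it shrinks as the kernels cover more of the frame space. Turning `frames_hat` back into a waveform would still need an overlap-add step with window normalisation, which is left out here.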