Swapping Mel spectrogram with CQT spectrogram

descriptinc / melgan-neurips

GAN-based Mel-Spectrogram Inversion Network for Text-to-Speech Synthesis

MIT License


Swapping Mel spectrogram with CQT spectrogram #16

Closed: mcallistertyler closed this issue 4 years ago.

mcallistertyler commented 4 years ago

There has been some research indicating that CQT (constant-Q transform) spectrograms can be better suited to music data. Do you think it would be possible to switch the mel spectrogram generation to CQT without making many changes to the underlying model? This is something I am interested in attempting.

tdeboissiere commented 4 years ago

@mcallistertyler95 The only thing you would need to change for the model to actually run is the input size (i.e., our mel spectrogram has 80 channels; you may have a different number of channels for CQT).

However, we did not try training with CQT spectrogram so we cannot guarantee that it will actually converge to good results.
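For a sense of what "a different number of channels" means in practice: a CQT has one channel per frequency bin, and the bin count follows from how many octaves and bins per octave you choose. The sketch below is illustrative only. The helper names (`cqt_center_frequencies`, `cqt_channel_count`) and the parameters (22050 Hz sampling rate, lowest bin at C1 of about 32.70 Hz, 7 octaves of 12 bins) are assumptions, not anything taken from melgan-neurips. It computes the geometrically spaced bin centers, checks that the top bin stays below Nyquist, and returns the channel count the model's input layer would then need in place of 80.

```python
def cqt_center_frequencies(fmin_hz, n_bins, bins_per_octave):
    """Geometrically spaced CQT bin centers: f_k = fmin * 2**(k / bins_per_octave)."""
    return [fmin_hz * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def cqt_channel_count(sample_rate, fmin_hz, n_octaves, bins_per_octave):
    """Return the number of CQT bins, i.e. the input channel count the model
    would need instead of 80 mel channels. Raises ValueError if the top bin
    would not fit below the Nyquist frequency."""
    n_bins = n_octaves * bins_per_octave
    freqs = cqt_center_frequencies(fmin_hz, n_bins, bins_per_octave)
    nyquist = sample_rate / 2.0
    if freqs[-1] >= nyquist:
        raise ValueError(
            f"top bin at {freqs[-1]:.1f} Hz is not below Nyquist ({nyquist:.1f} Hz)"
        )
    return n_bins


N_MEL_CHANNELS = 80   # input size of the original mel-based model
SAMPLE_RATE = 22050   # assumed sampling rate
C1_HZ = 32.70         # assumed lowest bin (note C1)

n_cqt_channels = cqt_channel_count(SAMPLE_RATE, C1_HZ, n_octaves=7, bins_per_octave=12)
print(N_MEL_CHANNELS, n_cqt_channels)  # → 80 84
```

With these assumed settings the input size would go from 80 to 84. Asking for 9 octaves from C1 would push the top bin past Nyquist at 22050 Hz, so the helper raises instead of silently returning an unusable bin count.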

mcallistertyler commented 4 years ago

Thanks! I'm not sure if I should open a new issue for this, since this one is still open, but could I ask what lines 55-56 in modules.py are for?

    p = (self.n_fft - self.hop_length) // 2
    audio = F.pad(audio, (p, p), "reflect").squeeze(1)
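The thread ends before these lines are explained. One plausible reading, offered as a sketch and not as the authors' answer: if the padded audio is then passed to an STFT with `center=False` (assumed here), padding by `(n_fft - hop_length) // 2` on each side makes the number of frames equal `len(audio) // hop_length`, so each spectrogram frame lines up with exactly `hop_length` samples of a generator that upsamples by `hop_length`. The pure-Python helpers `reflect_pad` and `num_frames` are hypothetical stand-ins for `F.pad(..., "reflect")` and a non-centered STFT, and `n_fft = 1024`, `hop_length = 256` are assumed values.

```python
def reflect_pad(x, p):
    """Mirror-pad a 1-D sequence by p samples on each side without repeating
    the edge sample (the convention of 'reflect' padding)."""
    if p == 0:
        return list(x)
    if p >= len(x):
        raise ValueError("reflect padding requires p < len(x)")
    left = list(x[p:0:-1])          # x[p], ..., x[1]
    right = list(x[-2:-p - 2:-1])   # x[-2], ..., x[-p-1]
    return left + list(x) + right


def num_frames(signal_len, n_fft, hop_length):
    """Frame count of a non-centered STFT (no implicit padding)."""
    if signal_len < n_fft:
        return 0
    return 1 + (signal_len - n_fft) // hop_length


n_fft, hop_length = 1024, 256       # assumed STFT settings
audio = [0.0] * 8192                # hypothetical 8192-sample clip

p = (n_fft - hop_length) // 2       # 384 samples on each side
padded = reflect_pad(audio, p)
print(len(padded), num_frames(len(padded), n_fft, hop_length), len(audio) // hop_length)
# → 8960 32 32
```

Without the padding, the same 8192-sample clip would give only 1 + (8192 - 1024) // 256 = 29 frames, which no longer matches `len(audio) // hop_length = 32`. Reflect mode, rather than zero padding, presumably avoids introducing artificial silence at the clip edges.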