Closed by linmou 1 year ago
Hi there,
The goal of the reconstruction loss here is just to force the model to learn a good audio representation. We didn't intend to make the model a strong reconstructor. If you want to convert spectrograms back to waveforms, you will need a vocoder (not included in this repo).
-Yuan
Thanks for your warm reply. Are there any vocoders you would recommend? I want to invert fbank features back to audio.
Hi there,
I am not familiar with vocoders, but you can check the GitHub topic list: https://github.com/topics/vocoder. Note that most of these are for TTS (speech) rather than general audio.
-Yuan
Given that the fbank features reconstructed by SSAST are not straightforward to invert, how can they be transformed into raw audio data for further analysis?
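
For a rough listen without a trained vocoder, one option is a classical inversion: undo the feature normalization, map the mel energies back to a linear spectrogram with a pseudo-inverse of the mel filterbank, and estimate the phase with Griffin-Lim. The sketch below is NumPy only and is not part of this repo. All function names are hypothetical, and the parameters are assumptions: log-mel energies with natural log, a normalization of the form `(x - mean) / (2 * std)`, a 10 ms hop at 16 kHz, and a Hann window as long as the FFT. Please check these against your own feature extraction. Griffin-Lim output is usually noticeably lower quality than a neural vocoder, so treat this as a sanity check rather than a faithful reconstruction.

```python
import numpy as np

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    fb = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i:i + 3]
        up = (freqs - lo) / (mid - lo)      # rising slope of the triangle
        down = (hi - freqs) / (hi - mid)    # falling slope of the triangle
        fb[i] = np.maximum(0.0, np.minimum(up, down))
    return fb

def stft(x, n_fft, hop, win):
    """Windowed STFT without padding, shape (bins, frames)."""
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

def istft(spec, n_fft, hop, win):
    """Overlap-add inverse of stft(), normalized by the summed squared window."""
    frames = np.fft.irfft(spec.T, n=n_fft, axis=1) * win
    out = np.zeros(n_fft + hop * (frames.shape[0] - 1))
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

def griffin_lim(mag, n_fft, hop, n_iter=32, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `mag`."""
    win = np.hanning(n_fft)
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
    for _ in range(n_iter):
        x = istft(mag * phase, n_fft, hop, win)
        phase = np.exp(1j * np.angle(stft(x, n_fft, hop, win)))
    return istft(mag * phase, n_fft, hop, win)

def fbank_to_audio(fbank, sr=16000, n_fft=512, hop=160,
                   mean=0.0, std=0.5, n_iter=32):
    """fbank: (frames, n_mels) normalized log-mel energies (assumed layout)."""
    log_mel = fbank * (2.0 * std) + mean          # undo assumed normalization
    mel_power = np.exp(log_mel).T                 # (n_mels, frames)
    fb = mel_filterbank(fbank.shape[1], n_fft, sr)
    # The pseudo-inverse can go negative, so clamp before taking the root.
    power = np.maximum(np.linalg.pinv(fb) @ mel_power, 1e-10)
    return griffin_lim(np.sqrt(power), n_fft, hop, n_iter)
```

Calling `fbank_to_audio(features, mean=your_mean, std=your_std)` on a `(frames, n_mels)` array returns a 1-D float waveform that can be written to disk with any audio I/O library. Replace `your_mean` and `your_std` with the dataset statistics used when the features were computed; the defaults leave the input unchanged.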