YuanGongND / ssast

Code for the AAAI 2022 paper "SSAST: Self-Supervised Audio Spectrogram Transformer".
BSD 3-Clause "New" or "Revised" License

How to convert fbank features back to audio? #15

Closed linmou closed 1 year ago

linmou commented 2 years ago

Given that the fbank features reconstructed by SSAST are not straightforward to interpret, how can I transform them back into raw audio for further analysis?

YuanGongND commented 2 years ago

Hi there,

The goal of the reconstruction loss here is just to force the model to learn a good audio representation. We didn't intend to make the model a strong reconstructor. But if you want to convert spectrograms back to waveforms, you will need a vocoder (not included in this repo).

-Yuan

linmou commented 2 years ago

Thanks for your warm reply. Is there a vocoder you would recommend? I want to invert fbank features back to audio.

YuanGongND commented 2 years ago

Hi there,

I am not familiar with vocoders, but you can check the GitHub topic list: https://github.com/topics/vocoder. Note that most of these are built for TTS (speech) rather than general audio.

-Yuan
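
Editor's note: for readers who only need a rough, listenable inversion rather than a trained vocoder, one classical option is to map the mel bands back to a linear-frequency magnitude spectrogram with a pseudo-inverse of the mel filterbank, then estimate the missing phase with Griffin-Lim. The sketch below is not part of this repo and was not suggested in the thread. It assumes the input is an un-normalized natural-log mel power spectrogram of shape (frames, n_mels), computed with a Hann window and no frame centering. The function names and the defaults `sr=16000`, `n_fft=512`, `hop=160` are illustrative assumptions. Features produced by a different pipeline (for example Kaldi-style fbank, or features that were mean/std normalized) would first need their normalization undone, and their filter shapes will not match exactly, so expect audible artifacts.

```python
import numpy as np

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1). Illustrative, not Kaldi-exact."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, ctr, hi = hz_pts[i:i + 3]
        rising = (freqs - lo) / (ctr - lo)
        falling = (hi - freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def stft(x, n_fft, hop, window):
    """Un-centered STFT, returns complex array of shape (frames, n_fft // 2 + 1)."""
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] for i in range(n_frames)])
    return np.fft.rfft(frames * window, axis=1)

def istft(spec, n_fft, hop, window):
    """Weighted overlap-add inverse of stft()."""
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    length = n_fft + hop * (len(frames) - 1)
    out, norm = np.zeros(length), np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += window ** 2
    # The floor keeps the signal edges, where the window is near zero, from blowing up.
    return out / np.maximum(norm, 1e-3)

def griffin_lim(mag, n_fft, hop, n_iter=50, seed=0):
    """Iteratively estimate a phase that is consistent with the given magnitudes."""
    window = np.hanning(n_fft)
    rng = np.random.default_rng(seed)
    angles = np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
    for _ in range(n_iter):
        audio = istft(mag * angles, n_fft, hop, window)
        rebuilt = stft(audio, n_fft, hop, window)
        angles = rebuilt / np.maximum(np.abs(rebuilt), 1e-8)
    return istft(mag * angles, n_fft, hop, window)

def logmel_to_audio(logmel, sr=16000, n_fft=512, hop=160, n_iter=50):
    """Approximate waveform from a (frames, n_mels) natural-log mel power spectrogram."""
    fb = mel_filterbank(sr, n_fft, logmel.shape[1])
    # Least-squares estimate of linear-frequency power; clip the negatives pinv can produce.
    power = np.maximum(np.exp(logmel) @ np.linalg.pinv(fb).T, 0.0)
    return griffin_lim(np.sqrt(power), n_fft, hop, n_iter)
```

Calling `logmel_to_audio(features)` returns a 1-D float array with `n_fft + hop * (frames - 1)` samples. Because the mel projection discards fine spectral detail and Griffin-Lim only approximates the phase, the result is typically good enough to check what a reconstruction sounds like, but well below what a trained neural vocoder would give.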