Closed by rogercmq 3 years ago
Hi @rogercmq,
For the audio encoder, we use a ResNet-18 with mel spectrograms as the input. We do not plan on releasing any preprocessing scripts for the audio, but we recommend using the publicly available torchaudio package. In particular, you can construct the mel spectrograms used by XDC by following steps similar to those in this torchaudio tutorial. We detail the audio preprocessing parameters (e.g. the number of Mel filters) in the XDC paper.
Cheers, Humam
I am wondering how to pretrain our R(2+1)D networks on AudioSet. Will scripts for preprocessing the audio files be made available?