HumamAlwassel / XDC

Self-Supervised Learning by Cross-Modal Audio-Video Clustering (NeurIPS 2020)
http://humamalwassel.com/publication/xdc/
MIT License

XDC pretrained on AudioSet #4

Closed by rogercmq 3 years ago

rogercmq commented 3 years ago

I am wondering how to pretrain our R(2+1)D networks on AudioSet. Will scripts for preprocessing the audio files be made available?

HumamAlwassel commented 3 years ago

Hi @rogercmq,

For the audio encoder, we use a ResNet-18 that takes Mel spectrograms as input. We do not plan to release any audio preprocessing scripts, but we recommend the publicly available torchaudio package. In particular, you can construct the Mel spectrograms used by XDC by following steps similar to those in this torchaudio tutorial. The audio preprocessing parameters (e.g., the number of Mel filters) are detailed in the XDC paper.

Cheers, Humam
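
For readers following the torchaudio suggestion above, here is a minimal sketch of what such a preprocessing step might look like. It is not the XDC authors' code. The constants (`SAMPLE_RATE`, `N_FFT`, `HOP_LENGTH`, `N_MELS`) are placeholder assumptions and should be replaced with the values given in the XDC paper. The function names `expected_num_frames` and `log_mel_spectrogram` are hypothetical helpers introduced only for illustration.

```python
# Placeholder settings, NOT taken from the XDC paper: substitute the
# paper's values before using this for pretraining.
SAMPLE_RATE = 16_000   # assumed target sample rate in Hz
N_FFT = 400            # assumed FFT size (25 ms window at 16 kHz)
HOP_LENGTH = 160       # assumed hop between frames (10 ms at 16 kHz)
N_MELS = 40            # assumed number of Mel filters


def expected_num_frames(num_samples: int, hop_length: int = HOP_LENGTH) -> int:
    """Frame count produced by torchaudio's default (centered) STFT."""
    return num_samples // hop_length + 1


def log_mel_spectrogram(waveform, orig_sample_rate: int):
    """Turn a (channels, samples) waveform tensor into a log-Mel spectrogram."""
    # Imported lazily so the frame-count helper works without torch installed.
    import torchaudio.functional as F
    import torchaudio.transforms as T

    # Down-mix to mono, then resample to the assumed target rate.
    mono = waveform.mean(dim=0, keepdim=True)
    mono = F.resample(mono, orig_sample_rate, SAMPLE_RATE)

    # Power Mel spectrogram (MelSpectrogram defaults to power=2.0).
    mel = T.MelSpectrogram(
        sample_rate=SAMPLE_RATE,
        n_fft=N_FFT,
        hop_length=HOP_LENGTH,
        n_mels=N_MELS,
    )(mono)

    # Convert power to decibels; result has shape (1, N_MELS, frames).
    return T.AmplitudeToDB(stype="power")(mel)
```

With torchaudio installed, `waveform, sr = torchaudio.load(path)` followed by `log_mel_spectrogram(waveform, sr)` would give a tensor that can be cropped or padded to a fixed number of frames before being fed to the audio encoder. `expected_num_frames` helps choose the clip length for a desired frame count.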