YuanGongND / ast

Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
BSD 3-Clause "New" or "Revised" License

training with custom data #40

Closed ibrahimrazu closed 2 years ago

ibrahimrazu commented 2 years ago

Thanks a lot for your amazing work and for sharing the code. I have a quick question. I have a video dataset and want to use it by extracting the audio from the videos. Do you recommend any recipe for processing the audio extracted from the videos, or would any raw mp3 work?

Thanks again

YuanGongND commented 2 years ago

Hi there,

To extract audio from videos, I use the ffmpeg tool. You can also use other similar tools.
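
For illustration, here is a minimal sketch of how the ffmpeg extraction could be scripted from Python. It is not code from this repo: the helper names (build_ffmpeg_cmd, extract_audio), the choice of WAV output, and the mono downmix are assumptions made for the example. The flags are standard ffmpeg options: -vn drops the video stream, -ac sets the channel count, -ar sets the output sampling rate, and -y overwrites an existing output file.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(video_path, audio_path, sample_rate=16000, mono=True):
    """Build an ffmpeg command that extracts the audio track of a video.

    Hypothetical helper: the 16 kHz default and the mono downmix are
    assumptions for this sketch, not requirements stated by the repo.
    """
    cmd = ["ffmpeg", "-y", "-i", str(video_path), "-vn"]  # -vn: no video
    if mono:
        cmd += ["-ac", "1"]  # downmix to a single channel
    cmd += ["-ar", str(sample_rate), str(audio_path)]  # resample on export
    return cmd


def extract_audio(video_path, audio_dir, **kwargs):
    """Run ffmpeg and return the path of the extracted .wav file."""
    audio_path = Path(audio_dir) / (Path(video_path).stem + ".wav")
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_ffmpeg_cmd(video_path, audio_path, **kwargs), check=True)
    return audio_path
```

Calling extract_audio("videos/clip.mp4", "audio") would write audio/clip.wav, assuming ffmpeg is installed and on the PATH and the audio directory already exists.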

I think mp3 should also work without any code changes, since torchaudio.load accepts mp3.

But the code was tested on audio with a 16kHz sampling rate, so it is better to convert your audio to 16kHz. To check the sampling rate, you can use sox --i youraudio.mp3.
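
If the audio is stored as PCM WAV, the sampling rate can also be checked from Python with the standard-library wave module. This is only a sketch: the function names and the EXPECTED_RATE constant are hypothetical, and the wave module does not read mp3, so mp3 files would still need sox or a similar tool.

```python
import wave

# Assumption for this sketch: 16 kHz, the rate the code was tested on.
EXPECTED_RATE = 16000


def wav_sample_rate(path):
    """Return the sampling rate (Hz) recorded in a PCM WAV file header."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, expected_rate=EXPECTED_RATE):
    """True if the file's sampling rate differs from the expected rate."""
    return wav_sample_rate(path) != expected_rate
```

Files for which needs_resampling returns True can then be converted to 16kHz before training.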

-Yuan

ibrahimrazu commented 2 years ago

Thanks a lot for your reply. It helps a lot.