YuanGongND / ast

Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
BSD 3-Clause "New" or "Revised" License

ast input audio length #99

Closed syjunghwang closed 1 year ago

syjunghwang commented 1 year ago

When I feed the model 30 seconds of audio at a sampling rate of 16,000 Hz, the filter bank output's shape[0] is over 20,000. However, in this code the input length (input_tdim) is set to 1024 for 10-second audio. Which is correct?
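For reference, the expected number of filter bank frames can be estimated with a little arithmetic. This is a minimal sketch, not the repository's code: it assumes a 25 ms window and a 10 ms frame shift (the usual Kaldi-style fbank defaults, not stated in the post) with no edge padding, and the helper name `num_fbank_frames` is hypothetical.

```python
def num_fbank_frames(duration_s, sample_rate=16000,
                     frame_length_ms=25.0, frame_shift_ms=10.0):
    """Estimate the frame count of a Kaldi-style fbank without edge padding.

    Assumed defaults: 25 ms window, 10 ms shift (not taken from the post).
    """
    num_samples = int(duration_s * sample_rate)
    window = int(sample_rate * frame_length_ms / 1000)  # 400 samples at 16 kHz
    shift = int(sample_rate * frame_shift_ms / 1000)    # 160 samples at 16 kHz
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // shift


TARGET_LENGTH = 1024  # the input time dimension mentioned in the question

for seconds in (10, 30):
    frames = num_fbank_frames(seconds)
    diff = TARGET_LENGTH - frames
    action = f"pad {diff} frames" if diff > 0 else f"truncate {-diff} frames"
    print(f"{seconds} s -> {frames} frames, {action} to reach {TARGET_LENGTH}")
```

Under these assumptions, 10 seconds gives 998 frames (slightly under 1024) and 30 seconds gives 2998 frames. A shape[0] above 20,000 would correspond to roughly 200 seconds of audio at a 10 ms shift, so it may be worth checking the frame shift, the sampling rate actually passed to the fbank call, and which axis is the time axis.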