YuanGongND / ast

Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
BSD 3-Clause "New" or "Revised" License

AST Audioset Training Time and Hardware #116

Closed: justinluong closed this issue 8 months ago

justinluong commented 8 months ago

Hi,

I hope this message finds you well. I am currently working on my thesis and have found your work on AST invaluable.

I am using your pretrained model (Full AudioSet, 10 tstride, 10 fstride, with Weight Averaging) as a baseline for comparison.

Would you be able to provide some details on the hardware used to train this model, as well as the approximate training time? This information would be incredibly helpful for the context of my research.

Thank you very much for your time and assistance.

YuanGongND commented 8 months ago

hi there,

We also released the training log of that experiment at https://github.com/YuanGongND/ast/blob/master/egs/audioset/exp/test-full-f10-t10-pTrue-b12-lr1e-5/log_2090852.txt

All information can be found in the log.

Specifically, the hardware was 4 x NVIDIA TITAN X GPUs (12GB each), an older model.

Training time per epoch was 118,072 seconds; we trained for 5 epochs, which totals about 7 days. With modern GPUs, it will be faster.
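For context, the reported figures can be sanity-checked with a quick calculation (the constants below are taken directly from the numbers quoted above):

```python
# Sanity check of the reported AST AudioSet training time.
seconds_per_epoch = 118_072   # per-epoch time from the training log
epochs = 5                    # number of epochs trained

total_seconds = seconds_per_epoch * epochs
total_days = total_seconds / 86_400  # 60 s * 60 min * 24 h

print(f"{total_days:.2f} days")  # ~6.83 days, i.e. about 7 days
```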

I recommend checking the mAP immediately after the 1st epoch to see if it matches our log.

-Yuan

justinluong commented 8 months ago

Thank you!