YuanGongND / ssast

Code for the AAAI 2022 paper "SSAST: Self-Supervised Audio Spectrogram Transformer".
BSD 3-Clause "New" or "Revised" License

A question about ESC's AC #12

Closed by liyunlongaaa 1 year ago

liyunlongaaa commented 1 year ago

Hello, why is the ESC accuracy in the SSAST paper only around 80%, when the AST paper already reported about 95%?

YuanGongND commented 1 year ago

Hi there,

I think the main difference between the two settings is whether supervised AudioSet pretraining is applied. AudioSet and ESC-50 are closely related datasets and even share some classes, so supervised AudioSet pretraining usually makes a big difference.

More specifically,

In the AST paper:
ImageNet supervised pretraining = 88.7
ImageNet supervised pretraining + AudioSet supervised pretraining = 94.7

In the SSAST paper:
AudioSet + LibriSpeech self-supervised pretraining = 88.8

Hope this helps.

-Yuan

liyunlongaaa commented 1 year ago

Thank you a lot!