A question about ESC's AC

liyunlongaaa commented 1 year ago

Hi, why is the ESC accuracy in the SSAST paper only in the 80s, when the AST paper already reached ~95%?

YuanGongND commented 1 year ago

Hi there,

I think the main difference between the two settings is whether supervised AudioSet pretraining is applied. AudioSet and ESC-50 are very similar datasets and even share some classes, so supervised AudioSet pretraining usually makes a big difference.

More specifically (ESC-50 accuracy, %):

In the AST paper:
- ImageNet supervised pretraining = 88.7
- ImageNet supervised pretraining + AudioSet supervised pretraining = 94.7

In the SSAST paper:
- AudioSet + LibriSpeech self-supervised pretraining = 88.8
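As a rough arithmetic check of the figures quoted above, the short sketch below puts them side by side. The dictionary keys are illustrative labels only, not names from either paper or codebase. It shows that adding supervised AudioSet pretraining accounts for about 6 points on top of ImageNet supervised pretraining, while SSAST's self-supervised pretraining lands within about 0.1 points of the ImageNet-supervised AST baseline.

```python
# ESC-50 accuracy (%) as quoted in this thread; keys are illustrative labels.
results = {
    "AST: ImageNet supervised": 88.7,
    "AST: ImageNet supervised + AudioSet supervised": 94.7,
    "SSAST: AudioSet + LibriSpeech self-supervised": 88.8,
}


def gain(base: str, other: str) -> float:
    """Accuracy difference (percentage points) of `other` over `base`, rounded to 0.1."""
    return round(results[other] - results[base], 1)


# Effect of adding supervised AudioSet pretraining in the AST setting.
audioset_gain = gain(
    "AST: ImageNet supervised",
    "AST: ImageNet supervised + AudioSet supervised",
)
# SSAST self-supervised pretraining vs. the ImageNet-supervised AST baseline.
ssast_vs_imagenet = gain(
    "AST: ImageNet supervised",
    "SSAST: AudioSet + LibriSpeech self-supervised",
)

print(audioset_gain)      # 6.0
print(ssast_vs_imagenet)  # 0.1
```

Read this way, the ~95% figure mainly reflects the extra supervised AudioSet stage, not a weakness of self-supervised pretraining.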

Hope this helps.

-Yuan

liyunlongaaa commented 1 year ago

Thank you a lot!

YuanGongND / ssast

A question about ESC's AC #12