Closed by liyunlongaaa 1 year ago.
Hi there,
I think the main difference between the two settings is whether supervised AudioSet pretraining is applied. AudioSet and ESC-50 are very similar datasets and even share some classes, so supervised AudioSet pretraining usually makes a big difference.
More specifically (ESC-50 accuracy, %):
- AST paper, ImageNet supervised pretraining: 88.7
- AST paper, ImageNet supervised pretraining + AudioSet supervised pretraining: 94.7
- SSAST paper, AudioSet + LibriSpeech self-supervised pretraining: 88.8
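As a quick illustration (not code from either paper), the three ESC-50 numbers quoted here can be compared directly. This minimal Python sketch uses dictionary labels chosen only for readability. It shows that adding supervised AudioSet pretraining accounts for about 6 points, while SSAST and the ImageNet-only AST model are within 0.1 points of each other.

```python
# ESC-50 accuracies (%) as quoted in the reply; labels are illustrative only.
results = {
    "AST, ImageNet supervised": 88.7,
    "AST, ImageNet + AudioSet supervised": 94.7,
    "SSAST, AudioSet + LibriSpeech self-supervised": 88.8,
}

# Gain from adding supervised AudioSet pretraining on top of ImageNet pretraining.
audioset_gain = round(
    results["AST, ImageNet + AudioSet supervised"] - results["AST, ImageNet supervised"], 1
)

# Gap between SSAST and the AST model that never sees AudioSet labels.
ssast_vs_imagenet = round(
    results["SSAST, AudioSet + LibriSpeech self-supervised"] - results["AST, ImageNet supervised"], 1
)

print(f"Supervised AudioSet pretraining adds {audioset_gain} points")
print(f"SSAST vs. ImageNet-only AST: {ssast_vs_imagenet:+} points")
```

The rounding to one decimal place only hides floating-point noise from the subtraction. It does not change the comparison.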
Hope this helps.
-Yuan
Thank you very much!
Hi, why is the ESC-50 accuracy in the SSAST paper only in the 80s (~80%), when the AST paper already reached ~95%?