Questions about evaluation indicators

zsw111-zzz commented 1 year ago

Thank you very much for your outstanding contribution to the open-source community. I noticed that your evaluation results differ from those in the paper that introduced the Clotho-AQA dataset, "Clotho-AQA: A Crowdsourced Dataset for Audio Question Answering". Its authors report an accuracy above 0.6. I am not sure whether this is because that paper mixes the binary "yes" and "no" labels with the other multi-class labels. I hope the authors can explain further. Thanks again for your contribution!

ayameyao commented 1 year ago

Thank you very much for your interest in AQA-related work.

We have also noticed that the single-word prediction results in the Clotho-AQA paper [1] are difficult to reproduce. Given how limited the Clotho-AQA dataset is and that there are 828 candidate answers, it is indeed challenging to reach the top-1 accuracy of 54.2% reported in [1].

Since the audio data in Clotho-AQA is sourced from the real world, we aim to explore sound scene understanding based on natural sounds. We therefore examined the original Clotho-AQA annotation files and found that the official open-source annotations were not cleaned: different annotators sometimes gave different answers to the same question. We applied a simple filter. A question is kept, with the agreed answer as its label, only if at least two of its answers are identical; all other cases are discarded, and "yes" and "no" cases are excluded. This filtering gave us a new, more accurate annotation file, on which we replicated the AquaNet method from [1] and obtained a top-1 accuracy of 14.78%.
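The filtering rule described above could be sketched roughly as follows. This is an illustrative sketch only, not the script used to produce the uploaded annotation file. The function name `filter_by_agreement`, the `(file_name, question, answer)` row layout, the lowercase normalization of answers, and the sample rows are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Binary answers are excluded from the filtered set, as described above.
BINARY_ANSWERS = {"yes", "no"}


def filter_by_agreement(rows, min_agree=2):
    """Keep questions on which at least `min_agree` annotators agree.

    `rows` is an iterable of (file_name, question, answer) tuples, one per
    annotator response (hypothetical layout, assumed for this sketch).
    Returns a dict mapping (file_name, question) to the agreed answer.
    """
    answers = defaultdict(list)
    for file_name, question, answer in rows:
        # Assumption: answers are compared case-insensitively.
        answers[(file_name, question)].append(answer.strip().lower())

    kept = {}
    for key, given in answers.items():
        top_answer, count = Counter(given).most_common(1)[0]
        if count < min_agree:
            continue  # annotators disagree: drop the question
        if top_answer in BINARY_ANSWERS:
            continue  # "yes"/"no" cases are excluded
        kept[key] = top_answer
    return kept


# Made-up rows, three annotators per question.
rows = [
    ("rain.wav", "What is falling?", "rain"),
    ("rain.wav", "What is falling?", "Rain"),
    ("rain.wav", "What is falling?", "water"),
    ("dog.wav", "Is a dog barking?", "yes"),
    ("dog.wav", "Is a dog barking?", "yes"),
    ("dog.wav", "Is a dog barking?", "no"),
    ("car.wav", "What is passing by?", "car"),
    ("car.wav", "What is passing by?", "truck"),
    ("car.wav", "What is passing by?", "bus"),
]

filtered = filter_by_agreement(rows)
print(filtered)  # {('rain.wav', 'What is falling?'): 'rain'}
```

In this sketch the first question is kept because two of three answers match, the second is dropped as a binary case, and the third is dropped because no two answers agree.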

Additionally, we have uploaded the filtered annotation file to the GitHub repo. If you have any questions, please feel free to contact us via email.

[1] https://arxiv.org/pdf/2204.09634.pdf

zsw111-zzz commented 1 year ago

Your reply perfectly solved my problem. Thank you very much!

GeWu-Lab / MWAFM
