Closed jasonppy closed 2 months ago
Non-binary means the question isn't a yes-no question. You can find those questions by checking the ground-truth answers: a question is non-binary when its answers are not in ['yes', 'no'].
Numerical means the question is asking for a number. You can get those questions by looking at whether all the ground truth answers are numbers.
Thanks! In ClothoAQA, each question is answered by multiple workers, and they might give different answers. For example, the question "Is the area dry?" might have 3 answers: "yes", "yes", "no". How was this handled in the evaluation? Do we treat them as three different QAs?
The unanimous subset is where all three answers are the same ("yes", "yes", "yes" or "no", "no", "no"). For the numerical and non-binary subsets, we call an answer correct if it hits any of the three answers.
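The "correct if it hits any of the three answers" rule can be sketched as below. This is a minimal illustration, not the authors' evaluation script: the function name `is_correct` is hypothetical, and the lowercase/whitespace normalization is an assumption that the thread does not confirm.

```python
def is_correct(prediction: str, references: list[str]) -> bool:
    """Return True if the prediction matches at least one annotator answer.

    Assumption: answers are compared after stripping whitespace and
    lowercasing; the real evaluation may normalize differently.
    """
    norm = prediction.strip().lower()
    return any(norm == ref.strip().lower() for ref in references)


# Three workers answered the same question; any match counts as a hit.
refs = ["two", "three", "two"]
print(is_correct("Three", refs))  # matches the second answer
print(is_correct("four", refs))   # matches none of the answers
```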
Thanks! For the numerical subset, is it possible to share the metadata or the dataset split script? Some non-standard answers might introduce ambiguity in the split. For example, the answer "twentyfive" might not get classified as a number, and it is unclear whether "once" should count as a number.
Currently, if we only count the questions where all answers can be parsed by word2number, there are 138 examples in the test set.
And just to make sure our other metadata is aligned: for unanimous there are 1312 examples in the test set, and for non-binary there are 946 examples in the test set.
Thanks for your time!
The numerical subset consists of the questions that start with "how many". There are 195 such questions in the test split of Clotho-AQA. There are 684 unanimous-yes questions and 425 unanimous-no questions. There are 932 non-binary questions.
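A rough sketch of the split rules described in this thread follows. It is not the authors' script: the function name `categorize`, the label strings, and the lowercase/whitespace normalization are all assumptions, and the sketch treats the three rules as independent checks, so a "how many" question may also be labeled non-binary. Whether the reported counts overlap in that way is not stated here.

```python
BINARY = {"yes", "no"}


def categorize(question: str, answers: list[str]) -> list[str]:
    """Return the subset labels that apply to one question.

    Assumptions: answers are stripped and lowercased before comparison,
    and the three rules are applied independently of each other.
    """
    norm = [a.strip().lower() for a in answers]
    labels = []
    # Numerical: the question text starts with "how many".
    if question.strip().lower().startswith("how many"):
        labels.append("numerical")
    # Unanimous: every worker gave the same yes/no answer.
    if len(set(norm)) == 1 and norm[0] in BINARY:
        labels.append(f"unanimous-{norm[0]}")
    # Non-binary: no worker answered "yes" or "no".
    if norm and all(a not in BINARY for a in norm):
        labels.append("non-binary")
    return labels


print(categorize("Is the area dry?", ["yes", "yes", "no"]))
print(categorize("How many birds are singing?", ["two", "2", "three"]))
```

Under these rules a yes/no question with disagreeing answers, such as the first example, receives no label at all, which is one place where counts from different scripts could diverge.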
Thanks!
I was able to get 195 numerical QAs following your comments, and 1109 unanimous yes and no questions. However, I got 945 non-binary questions rather than 932. I have manually checked all the QAs in the non-binary test set and they are indeed non-binary. Are there any criteria that you used in addition to requiring that none of the answers from the different workers is 'yes' or 'no'?
Use ps.stem to deal with typos.
In Table 2, ClothoAQA is categorized into 3 subsets. However, it's not clear to me how the non-binary and numerical splits are constructed, since different annotators can give different answers to the same question. Say the same question got 3 different answers: do you merge them into one QA, or treat them as 3 different QAs?
Thanks