amazon-science / wqa_tanda

This repo provides code and data used in our TANDA paper.

Found no MAP & MRR metrics in the code. #8

Closed ryanpram closed 3 years ago

ryanpram commented 3 years ago

Hi,

How do you get the MAP and MRR scores reported in the paper? The only metric provided in the code is simple accuracy.

Thanks

liudonglei commented 3 years ago

I ran into the same problem. It seems this code is incomplete: it lacks some preprocessing code (extracting the dataset examples and setting the output metric) for the Wiki-QA and TREC-QA tasks. I found similar preprocessing code in the repo https://github.com/tahmedge/CETE-LREC/blob/master/CETE%20Fine-Tuning/HuggingFacePytorchTransformer/examples/utils_glue.py#L390 ; maybe it can be reused here.
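For reference, a rough sketch of such a processor for the <Question> <TAB> <Candidate> <TAB> <Label> .tsv format described in the README might look like the following. This is only my own starting point, modeled on the utils_glue.py processors linked above; the class name, file names, and label set are assumptions, not code from this repo:

import csv
import os

from transformers import DataProcessor, InputExample


class AS2Processor(DataProcessor):
    # Hypothetical processor for <Question> <TAB> <Candidate> <TAB> <Label> .tsv files
    # (file names and labels are assumptions, adjust to your dataset).

    def get_train_examples(self, data_dir):
        return self._create_examples(self._read_tsv_file(os.path.join(data_dir, "train.tsv")), "train")

    def get_dev_examples(self, data_dir):
        return self._create_examples(self._read_tsv_file(os.path.join(data_dir, "dev.tsv")), "dev")

    def get_labels(self):
        return ["0", "1"]

    def _read_tsv_file(self, path):
        # Read raw tab-separated rows without quote handling
        with open(path, encoding="utf-8") as f:
            return list(csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE))

    def _create_examples(self, rows, set_type):
        # One InputExample per question-candidate pair
        examples = []
        for i, row in enumerate(rows):
            question, candidate, label = row[0], row[1], row[2]
            guid = "%s-%d" % (set_type, i)
            examples.append(InputExample(guid=guid, text_a=question, text_b=candidate, label=label))
        return examples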

ryanpram commented 3 years ago

@liudonglei Hi, thanks a lot for your reply. I see, I'll try it later.

sid7954 commented 3 years ago

Hi @ryanpram and @liudonglei !

1) We have written in the README that our patch can be used with any target dataset (e.g. Wiki-QA or TREC-QA) as long as it is formatted like ASNQ, where a single .tsv file contains <Question> <TAB> <Candidate> <TAB> <Label> per line. Additional DataProcessors can be added for different input formats.

2) For computing MAP and MRR, you can use the following two functions, which take as input the list of questions, the list of labels, and the list of predictions (note that questions contains a repeated question entry for each of its answer candidates):

'''
questions : list of questions in the dataset (one entry per question-answer pair)
answers : list of answers in the dataset (not used by the functions below)
labels : list of 0/1 labels indicating whether the answer is correct for the question
predictions : list of probability scores from a QA model for the question-answer pairs
'''

def mean_average_precision(questions, labels, predictions):
    question_results = {}
    # Aggregate (prediction, label) tuples per question over all of its answer candidates
    for question, prediction, label in zip(questions, predictions, labels):
        if question not in question_results:
            question_results[question] = []
        question_results[question].append((prediction, label))

    sum_AP = 0.0
    for q in question_results:
        # Sort candidates by model score, highest first
        _scores, _labels = zip(*sorted(question_results[q], reverse=True))

        if sum(_labels) == 0: continue  # All incorrect answers for a question
        if len(_labels) == 0: continue  # No candidate answer for a question
        if len(_labels) == sum(_labels): continue  # All correct answers for a question

        sum_question_AP_at_k = num_correct_at_k = position = 0
        while position < len(_labels):
            correct_or_incorrect = (_labels[position] == 1)
            num_correct_at_k += correct_or_incorrect
            sum_question_AP_at_k += correct_or_incorrect * num_correct_at_k / (position + 1)
            position += 1

        sum_AP += sum_question_AP_at_k / num_correct_at_k

    # Note: questions skipped above still count in the denominator
    MAP = sum_AP / len(question_results)
    return MAP

def mean_reciprocal_rank(questions, labels, predictions):
    question_results = {}
    # Aggregate (prediction, label) tuples per question over all of its answer candidates
    for question, prediction, label in zip(questions, predictions, labels):
        if question not in question_results:
            question_results[question] = []
        question_results[question].append((prediction, label))

    sum_RR = 0.0
    for q in question_results:
        # Sort candidates by model score, highest first
        _scores, _labels = zip(*sorted(question_results[q], reverse=True))

        if sum(_labels) == 0: continue  # All incorrect answers for a question
        if len(_labels) == 0: continue  # No candidate answer for a question
        if len(_labels) == sum(_labels): continue  # All correct answers for a question

        # Reciprocal rank of the highest-ranked correct answer
        for idx, label in enumerate(_labels, 1):
            if label == 1:
                sum_RR += 1.0 / idx
                break

    # Note: questions skipped above still count in the denominator
    MRR = sum_RR / len(question_results)
    return MRR
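As a minimal usage sketch, assuming you have a tab-separated file dev_scored.tsv with <Question> <TAB> <Candidate> <TAB> <Label> <TAB> <Score> per line, where the score column has been appended by your model (the file name and the extra column are illustrative assumptions, not produced by this repo), the metrics can be computed like this:

import csv

def load_scored_tsv(path):
    # Read <Question> <TAB> <Candidate> <TAB> <Label> <TAB> <Score> lines (hypothetical layout)
    questions, labels, predictions = [], [], []
    with open(path, encoding="utf-8") as f:
        for question, _candidate, label, score in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
            questions.append(question)
            labels.append(int(label))
            predictions.append(float(score))
    return questions, labels, predictions

questions, labels, predictions = load_scored_tsv("dev_scored.tsv")
print("MAP:", mean_average_precision(questions, labels, predictions))
print("MRR:", mean_reciprocal_rank(questions, labels, predictions))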
cxx-cindy commented 3 years ago

> @liudonglei Hi, thanks a lot for your reply. I see, I'll try it later.

Were you able to get this code to run?