QED: A Framework and Dataset for Explanations in Question Answering

coreference annotation is not a word #3

Open · changzhisun opened this issue 4 years ago

changzhisun commented 4 years ago

I found that the coreference annotation ("question_reference") may not align with word boundaries. For example, the question_text is "who does chris griffin's voice on family guy", but the question_reference is "chris griffin". In my preprocessing, "griffin's" is a single word. If my model predicts "chris griffin's", will that be counted as correct in the evaluation?
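
To make the mismatch concrete (a minimal illustration; the tokenization shown is my preprocessing, not part of the dataset):

question_text = "who does chris griffin's voice on family guy"
question_reference = "chris griffin"  # annotated span stops before the possessive
# A tokenizer that keeps the possessive attached yields the single token
# "griffin's", so the smallest token-aligned prediction is "chris griffin's".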

calberti commented 4 years ago

Note that in the eval script by default we soft match spans (90% character F1). This should result in the prediction "chris griffin" being considered correct with the reference "chris griffin's".
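
For concreteness, the soft match computes character-level F1 between the predicted and the annotated span. A minimal sketch of the idea (not the exact qed_eval.py code; the (start, end) character offsets below are illustrative):

def char_f1(pred_span, gold_span):
    # Spans are (start, end) character offsets into the same text.
    overlap = max(0, min(pred_span[1], gold_span[1]) - max(pred_span[0], gold_span[0]))
    if overlap == 0:
        return 0.0
    precision = overlap / (pred_span[1] - pred_span[0])
    recall = overlap / (gold_span[1] - gold_span[0])
    return 2 * precision * recall / (precision + recall)

# "chris griffin" (13 chars) vs. "chris griffin's" (15 chars) starting at the
# same offset: precision 13/13, recall 13/15, F1 ≈ 0.93 >= 0.9, so it matches.
print(char_f1((0, 13), (0, 15)) >= 0.9)  # True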

changzhisun commented 4 years ago

> Note that in the eval script by default we soft match spans (90% character F1). This should result in the prediction "chris griffin" being considered correct with the reference "chris griffin's".

I checked qed_eval.py and found the following code:

# First check: a predicted span matches an annotated span only if their
# normalized texts are exactly equal and the spans overlap.
if pred_entity.normalized_text == annot_entity.normalized_text:
    if overlap(pred_entity, annot_entity):
        found = True
        break

# Second check: for a (question span, document span) pair, both normalized
# texts must be exactly equal and both spans must overlap.
if pred_q_ent.normalized_text == annot_q_ent.normalized_text:
    if annot_doc_ent.normalized_text == pred_doc_ent.normalized_text:
        if overlap(pred_q_ent, annot_q_ent):
            if overlap(pred_doc_ent, annot_doc_ent):
                found = True
                break

The normalized texts of "chris griffin" and "chris griffin's" are not equal ("chrisgriffin" vs. "chrisgriffins"). Could this be changed to the following?

if (annot_entity.normalized_text in pred_entity.normalized_text) or (pred_entity.normalized_text in annot_entity.normalized_text):

Or should the check if pred_entity.normalized_text == annot_entity.normalized_text: simply be deleted?
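
For example, a helper like this (a hypothetical sketch of the containment match, not code from qed_eval.py):

def normalized_text_match(pred_text, annot_text):
    # Accept the pair when either normalized string contains the other,
    # e.g. "chrisgriffin" vs "chrisgriffins".
    return pred_text in annot_text or annot_text in pred_text

The equality checks above would then become normalized_text_match(pred_entity.normalized_text, annot_entity.normalized_text), and likewise for the question/document entity pairs.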