Closed carriex closed 3 years ago
Hi @carriex,
thanks a lot for your message. The `meta` field in the ELI5 dev set contains exhaustive information about our annotation campaign, including the span of text that was highlighted by the annotators. Here are the full guidelines for the annotation campaign:
I hope this helps!
Hi @fabiopetroni,
thanks for providing the detailed explanation of how the annotation campaign was carried out! I have two follow-up clarification questions:
For the field containing exhaustive information about the annotation campaign, are you referring to the `meta` field in the example below, which includes a `partial_evidence` entry? It also looks like the partial evidence comes from a different Wikipedia page than the one in `output/provenance`. However, from `eval_retrieval.py` it looks like we are still using the Wikipedia page in `output/provenance` to evaluate retrieval performance. Is this understanding correct?
```
{'id': '1kiwfx', 'input': 'In Trading Places (1983, Akroyd/Murphy) how does the scheme at the end of the movie work? Why would buying a lot of OJ at a high price ruin the Duke Brothers?', 'meta': {'left_context': '', 'mention': '', 'obj_surface': {'text': array([], dtype=object)}, 'partial_evidence': {'end_paragraph_id': array([7], dtype=int32), 'meta': array([{'evidence_span': array(['On television, they learn that Clarence Beeks is transporting a secret USDA report on orange crop forecasts.', 'On television, they learn that Clarence Beeks is transporting a secret USDA report on orange crop forecasts. Winthorpe and Valentine recall large payments made to Beeks by the Dukes and realize that the Dukes plan to obtain the report to corner the market on frozen orange juice.', 'Winthorpe and Valentine recall large payments made to Beeks by the Dukes and realize that the Dukes plan to obtain the report to corner the market on frozen orange juice.'], dtype=object)} ], dtype=object), 'section': array(['Section::::Plot.\n'], dtype=object), 'start_paragraph_id': array([7], dtype=int32), 'title': array(['Trading Places'], dtype=object), 'wikipedia_id': array(['520990'], dtype=object)}, 'right_context': '', 'sub_surface': {'text': array([], dtype=object)}, 'subj_aliases': {'text': array([], dtype=object)}, 'template_questions': {'text': array([], dtype=object)}}, 'output': {'answer': array(['The final scene involves future contracts.
..."what happens at the end of Trading Places?"', ''], dtype=object), 'meta': array([], dtype=object), 'provenance': array([{'bleu_score': array([0.92328084], dtype=float32), 'end_character': array([612], dtype=int32), 'end_paragraph_id': array([1], dtype=int32), 'meta': array([], dtype=object), 'section': array(['Section::::Abstract.'], dtype=object), 'start_character': array([14], dtype=int32), 'start_paragraph_id': array([1], dtype=int32), 'title': array(['Futures contract'], dtype=object), 'wikipedia_id': array(['242855'], dtype=object)}], dtype=object)}
```
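The page mismatch I'm asking about can be checked programmatically. A minimal sketch in plain Python, using a stripped-down stand-in for the `1kiwfx` record above (the real fields hold NumPy object arrays; plain lists are used here for simplicity):

```python
# Stand-in for the 1kiwfx record shown above (simplified: lists instead of
# numpy arrays, only the fields relevant to the title comparison).
record = {
    "id": "1kiwfx",
    "meta": {
        "partial_evidence": {
            "title": ["Trading Places"],
            "wikipedia_id": ["520990"],
        }
    },
    "output": {
        "provenance": [
            {"title": "Futures contract", "wikipedia_id": "242855"}
        ]
    },
}

# Pages highlighted during the annotation campaign vs. gold retrieval pages.
partial_titles = set(record["meta"]["partial_evidence"]["title"])
provenance_titles = {p["title"] for p in record["output"]["provenance"]}

# Empty intersection -> the annotator-highlighted page and the page used
# by eval_retrieval.py are different.
print(partial_titles & provenance_titles)  # prints set()
```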
My original question is actually about the `meta` field inside `output/provenance`. For example, the instance below contains such a field (bolded). However, for the example above, the annotation only contains the start/end character and paragraph offsets into the Wikipedia page. I'm wondering what the difference between these two kinds of annotations is?
```
{'id': '3atjp2', 'input': 'what are benefits of TPP ?', 'meta': {'left_context': '', 'mention': '', 'obj_surface': {'text': array([], dtype=object)}, 'partial_evidence': {'end_paragraph_id': array([], dtype=int32), 'meta': array([], dtype=object), 'section': array([], dtype=object), 'start_paragraph_id': array([], dtype=int32), 'title': array([], dtype=object), 'wikipedia_id': array([], dtype=object)}, 'right_context': '', 'sub_surface': {'text': array([], dtype=object)}, 'subj_aliases': {'text': array([], dtype=object)}, 'template_questions': {'text': array([], dtype=object)}}, 'output': {'answer': array(['The TPP is a trade liberalization treaty...why would FR/UK/NZ etc. want to sign it France and the UK are not part of TPP. That's TTIP, a similar but separate deal.", ''], dtype=object), 'meta': array([], dtype=object), 'provenance': array([{'bleu_score': array([0.], dtype=float32), 'end_character': array([-1], dtype=int32), 'end_paragraph_id': array([1], dtype=int32), 'meta': array([{'annotation_id': '-1', 'evidence_span': {'text': array(['Theory of Motivated Information Management or TMIM, is a social-psychological framework that examines the relationship between information management and uncertainty. The theory posits that individuals are motivated to manage their uncertainty levels when they perceive a discrepancy between the level of uncertainty they have about an important issue and the level of uncertainty they want. In other words, someone may be uncertain about an important issue but decides not to engage or seek information because they are comfortable with that state.\rhighlight sentence(s) containing evidence, not only the answer', 'Theory of Motivated Information Management or TMIM, is a social-psychological framework that examines the relationship between information management and uncertainty. The theory posits that individuals are motivated to manage their uncertainty levels when they perceive a discrepancy between the level of uncertainty they have about an important issue and the level of uncertainty they want. In other words, someone may be uncertain about an important issue but decides not to engage or seek information because they are comfortable with that state.'], dtype=object)}, 'fever_page_id': '', 'fever_sentence_id': -1, 'yes_no_answer': ''} ], dtype=object), 'section': array(['Section::::Abstract.'], dtype=object), 'start_character': array([-1], dtype=int32), 'start_paragraph_id': array([1], dtype=int32), 'title': array(['Theory of Motivated Information Management'], dtype=object), 'wikipedia_id': array(['36119336'], dtype=object)} ], dtype=object)}}
```
Again, thanks very much for your help!
thanks for getting back so quickly! The answer to the first question makes sense to me.
For the second question, I'm referring to the fields inside `output/provenance`. For the two examples above, the question with id `1kiwfx` doesn't contain an evidence span in `output/provenance`, only `start_character`, `end_character`, `start_paragraph_id`, and `end_paragraph_id`. However, for the question with id `3atjp2`, there are two evidence spans inside `output/provenance`, whereas `start_character` and `end_character` contain the value `-1`. Does this mean the annotator highlighted an evidence span for `3atjp2`, but only selected "Yes, sufficient to answer" for the passage in `1kiwfx` (without highlighting any evidence span)?
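To make my question concrete, here is a minimal sketch of the distinction I think I'm seeing, using simplified stand-ins for the two provenance entries above (field names follow the records shown; the real `meta` is an array of dicts, flattened here for brevity):

```python
def annotation_kind(prov):
    """Classify one output/provenance entry: 'offset' when character
    offsets into the page are given, 'span' when only annotator-highlighted
    evidence strings are present (start/end_character are -1)."""
    if prov.get("start_character", -1) >= 0 and prov.get("end_character", -1) >= 0:
        return "offset"
    if prov.get("meta", {}).get("evidence_span"):
        return "span"
    return "unknown"

# Simplified stand-ins for the 1kiwfx and 3atjp2 entries shown above.
offset_entry = {"start_character": 14, "end_character": 612, "meta": {}}
span_entry = {
    "start_character": -1,
    "end_character": -1,
    "meta": {"evidence_span": ["Theory of Motivated Information Management ..."]},
}

print(annotation_kind(offset_entry))  # prints offset
print(annotation_kind(span_entry))    # prints span
```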
Sorry for any confusion caused, let me know if this is clear to you!
I see. So sometimes the evidence span is given as a character offset into a paragraph within the knowledge source, and sometimes directly as a string (probably in the latter case the automatic alignment script failed). I hope this helps :)
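The two cases can be sketched as follows; `page_text` and the entries below are illustrative stand-ins, not the real knowledge source:

```python
def evidence_text(prov, page_text=None):
    """Return the evidence for one provenance entry as a list of strings:
    sliced from the page via character offsets when they are valid,
    otherwise taken from the annotator-highlighted strings in meta."""
    start = prov.get("start_character", -1)
    end = prov.get("end_character", -1)
    if start >= 0 and end >= 0 and page_text is not None:
        return [page_text[start:end]]
    # Offsets of -1: fall back to the highlighted strings, if any.
    return list(prov.get("meta", {}).get("evidence_span", []))

# Illustrative page text and simplified entries for both annotation styles.
page_text = "A futures contract is a standardized legal agreement."
offset_entry = {"start_character": 2, "end_character": 18, "meta": {}}
span_entry = {"start_character": -1, "end_character": -1,
              "meta": {"evidence_span": ["The theory posits ..."]}}

print(evidence_text(offset_entry, page_text))  # prints ['futures contract']
print(evidence_text(span_entry))               # prints ['The theory posits ...']
```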
@fabiopetroni got it! thank you so much for your help! :)
Hi, thanks for creating the dataset! I have two questions regarding the annotated ELI5 data:

1. `output/provenance/meta` contains task-specific data. What does this `meta` field contain for the ELI5 dataset? Are these the manual annotations for instances with low overlap between passages and answers mentioned in Section 4? If so, could you share a bit more on how this manual annotation was performed?
2. Some `output/provenance/meta/evidence_span` values contain the string "highlight sentence(s) containing evidence, not only the answer" at the end (for instance, the instances in the ELI5 dev set with ids "3atjp2" and "49wqfo"). What is the meaning of this string?

Thank you!
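For what it's worth, that string looks like the annotation-interface instruction leaking into the span text. Assuming it always appears as a trailing suffix after a carriage return, as in the `3atjp2` record, a minimal cleanup sketch would be:

```python
# The instruction text observed at the end of some evidence_span strings
# in the ELI5 dev set (e.g. ids 3atjp2 and 49wqfo).
GUIDELINE = "highlight sentence(s) containing evidence, not only the answer"

def strip_guideline(span):
    """Remove the trailing annotation-guideline text from one span string."""
    if span.endswith(GUIDELINE):
        span = span[: -len(GUIDELINE)]
    return span.rstrip("\r\n ")

raw = "The theory posits that individuals manage uncertainty.\r" + GUIDELINE
print(strip_guideline(raw))  # prints the sentence without the guideline suffix
```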