Description of changes: This pull request adds the Precision and Recall metrics for the Question Answering task. The existing metrics do not handle cases where one of the target output and the model output is short and the other is long.
For instance, consider the question `Did RMS Titanic sink in 1912?`. If the target output is `Yes` and the model output is `Yes. The ship indeed sank in 1912. It was the largest ship at the time <some long text>`, the existing metrics give a low score even though the answer is correct. The recall metric added in this PR is `1.0`, indicating that all of the target output words are contained in the model output. The precision metric works in the opposite direction: it measures the fraction of model output words that are found in the target output.
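The word-overlap idea above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation in this PR: the normalization (lowercasing, stripping ASCII punctuation, splitting on whitespace) and the function names `normalize`, `precision`, and `recall` are hypothetical.

```python
import string
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Assumed normalization: lowercase, drop ASCII punctuation, split on whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return text.split()


def _overlap(target_tokens: List[str], model_tokens: List[str]) -> int:
    # Multiset intersection: a word counts at most as many times as it
    # appears in BOTH outputs.
    return sum((Counter(target_tokens) & Counter(model_tokens)).values())


def precision(target_output: str, model_output: str) -> float:
    """Fraction of model output words that also appear in the target output."""
    model_tokens = normalize(model_output)
    if not model_tokens:
        return 0.0
    return _overlap(normalize(target_output), model_tokens) / len(model_tokens)


def recall(target_output: str, model_output: str) -> float:
    """Fraction of target output words that also appear in the model output."""
    target_tokens = normalize(target_output)
    if not target_tokens:
        return 0.0
    return _overlap(target_tokens, normalize(model_output)) / len(target_tokens)


if __name__ == "__main__":
    target = "Yes"
    model = "Yes. The ship indeed sank in 1912. It was the largest ship at the time"
    print(recall(target, model))     # 1.0: the single target word is present
    print(precision(target, model))  # 1/15: only "yes" of 15 model words matches
```

With the Titanic example truncated to the sentence shown, recall is `1.0` while precision is low (1 of 15 normalized model words), which is exactly the asymmetry the two metrics are meant to expose.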
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.