aws / fmeval

Foundation Model Evaluations Library
http://aws.github.io/fmeval
Apache License 2.0

feat: added the precision and recall metrics for QA accuracy #157

Closed bilalaws closed 5 months ago

bilalaws commented 6 months ago

Description of changes: This pull request adds the Precision and Recall metrics for the Question Answering task. The existing metrics do not handle cases where one of the target output or the model output is short and the other is long.

For instance, consider the question "Did RMS Titanic sink in 1912?" If the target output is "Yes" and the model output is "Yes. The ship indeed sank in 1912. It was the largest ship at the time <some long text>", then the existing metrics will give a low score even though the answer is correct. The recall metric added in this PR will be 1.0, indicating that all of the target output words are contained in the model output. The precision metric operates in the opposite direction: it measures the fraction of model output words that are found in the target output.
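The behaviour above can be illustrated with a minimal word-overlap sketch. This assumes a simple bag-of-words overlap with lowercasing and punctuation stripping; the function names below are illustrative only and are not fmeval's actual API:

```python
# Minimal sketch of word-level precision and recall for QA accuracy.
# Function names are hypothetical, not fmeval's actual API.
import string
from collections import Counter


def _normalize(text: str) -> list:
    """Lowercase, strip punctuation, and split into words."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return text.split()


def precision_recall(model_output: str, target_output: str) -> tuple:
    """Return (precision, recall) over word overlap between the two answers.

    Precision: fraction of model output words found in the target output.
    Recall: fraction of target output words found in the model output.
    """
    model_words = Counter(_normalize(model_output))
    target_words = Counter(_normalize(target_output))
    overlap = sum((model_words & target_words).values())
    precision = overlap / max(sum(model_words.values()), 1)
    recall = overlap / max(sum(target_words.values()), 1)
    return precision, recall


# Example from the description: a short target and a long but correct model answer.
p, r = precision_recall(
    "Yes. The ship indeed sank in 1912. It was the largest ship at the time.",
    "Yes",
)
print(f"precision={p:.2f}, recall={r:.2f}")  # recall is 1.0; precision is low
```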

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.