aws / fmeval

Foundation Model Evaluations Library
http://aws.github.io/fmeval
Apache License 2.0

feat: add `quasi_exact_inclusion` metric to factual knowledge; change `factual_knowledge` score name to `exact_inclusion` #302

Closed: kirupang-code closed this 2 months ago

kirupang-code commented 3 months ago

Description of changes: I added a new metric, "quasi_exact_inclusion", to factual knowledge; it checks whether the target output is included in the model output after both outputs are normalized. Since qa_accuracy and factual_knowledge.py had similar logic in their normalization functions, I moved that logic to util and imported it from there. I updated the EvalScores and outputs to report both metrics, updated the existing unit tests to check for both, and added new tests covering the second metric's output. I also added unit tests for the behavior of private functions in both qa_accuracy and factual_knowledge. Lastly, I updated the integration tests for factual_knowledge so that they also check the second metric, and reviewed the documentation of the affected functions and files to make sure it reflects both metrics.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

oyangz commented 2 months ago

Thanks for adding the score descriptions. Could you also update this list for reporting to use the new factual knowledge score names?