aws / fmeval

Foundation Model Evaluations Library
http://aws.github.io/fmeval
Apache License 2.0

feat: add answer relevance algo #295

Closed. xiaoyi-cheng closed this pull request 5 months ago.

xiaoyi-cheng commented 5 months ago

Issue #, if available: Add answer relevance metric under the AnswerRelevance evaluation algorithm.

Description of changes:

  1. Default dataset unclear.
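A rough usage sketch of the proposed algorithm follows; the import path, class name, and keyword arguments are assumptions based on the PR title and the existing evaluate_sample pattern in fmeval, not a merged API.

```python
# Hypothetical usage of the proposed AnswerRelevance algorithm. The module
# path, class name, and argument names below are assumptions for illustration.
from fmeval.eval_algorithms.answer_relevance import AnswerRelevance

algo = AnswerRelevance()

# Answer relevance judges whether the answer addresses the question, so the
# sample is assumed to need the question (model_input) and the model's answer
# (model_output); no reference answer (target_output) is required.
scores = algo.evaluate_sample(
    model_input="What is the capital of France?",
    model_output="The capital of France is Paris.",
)
for score in scores:
    print(score.name, score.value)
```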

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

xiaoyi-cheng commented 5 months ago

@polaschwoebel: The idea behind these is to group new metrics that take the same evaluate_sample input signature into one new algorithm. The qa_accuracy metrics are computed from target_output and model_output, so model_input is not needed in that algorithm. Adding this metric to qa_accuracy would require changing its input signature.
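For reference, here is a minimal sketch of the existing qa_accuracy call pattern described above (keyword names follow the current QAAccuracy.evaluate_sample signature as I understand it; treat the exact details as approximate):

```python
# Existing qa_accuracy pattern: scores are computed from the reference answer
# and the model's answer only. The question (model_input) never appears in
# the signature, which is why answer relevance does not fit here without
# changing it.
from fmeval.eval_algorithms.qa_accuracy import QAAccuracy

qa_scores = QAAccuracy().evaluate_sample(
    target_output="Paris",
    model_output="The capital of France is Paris.",
)
```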

polaschwoebel commented 5 months ago

> @polaschwoebel: The idea behind these is to group new metrics that take the same evaluate_sample input signature into one new algorithm. The qa_accuracy metrics are computed from target_output and model_output, so model_input is not needed in that algorithm. Adding this metric to qa_accuracy would require changing its input signature.

Thanks for the context, @xiaoyi-cheng! We should do two things then:

  1. Refactor evaluate_sample to take additional inputs where needed.
  2. Metrics should live in a separate place (e.g. a metrics.py module, or multiple files) and be importable into the different evaluations as needed. This makes them more modular and reusable; in particular, we should be able to import the same metric into multiple evaluations (see the sketch after this list).
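A minimal sketch of what point 2 could look like, assuming a hypothetical shared module (say, fmeval/eval_algorithms/metrics.py); the scoring logic below is a toy stand-in, not the actual answer relevance metric:

```python
# Toy stand-in for a metric function living in a shared module. The module
# location and the scoring logic are illustrative assumptions only.

def answer_relevance_score(model_input: str, model_output: str) -> float:
    """Fraction of question tokens that also appear in the answer."""
    question_tokens = set(model_input.lower().split())
    answer_tokens = set(model_output.lower().split())
    if not question_tokens:
        return 0.0
    return len(question_tokens & answer_tokens) / len(question_tokens)

# Each evaluation algorithm could then import the same function, e.g.
#   from fmeval.eval_algorithms.metrics import answer_relevance_score
# rather than re-implementing its own copy of the metric.
```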

I have written down how RAG(AS) and existing evaluations relate in this doc, specifically here for QA accuracy. Let's discuss with the wider team as needed.