Closed: xiaoyi-cheng closed this 5 months ago
@polaschwoebel: The idea behind these is to group new metrics that take the same input signature for `evaluate_sample` into one new algo. `qa_accuracy` metrics are calculated based on `target_output` and `model_output`, so in that algo `model_input` is not needed. Adding this metric to `qa_accuracy` would require modifying its input signature.
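To make the difference concrete, here is a minimal Python sketch (class and metric names are made up for illustration, not the library's actual implementation) of why the two metric families end up with different `evaluate_sample` signatures: a `qa_accuracy`-style metric only consumes `target_output` and `model_output`, while answer relevance also needs `model_input`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EvalScore:
    name: str
    value: float


class QAAccuracyLike:
    """Sketch of a qa_accuracy-style algo: no model_input in the signature."""

    def evaluate_sample(self, target_output: str, model_output: str) -> List[EvalScore]:
        # Example metric: exact match between reference and model answer.
        score = float(target_output.strip().lower() == model_output.strip().lower())
        return [EvalScore(name="exact_match", value=score)]


class AnswerRelevanceLike:
    """Sketch of an answer-relevance-style algo: scores the answer against the question."""

    def evaluate_sample(self, model_input: str, model_output: str) -> List[EvalScore]:
        # Answer relevance needs model_input (the question), which the
        # qa_accuracy-style evaluate_sample above does not take. The token-overlap
        # score is a crude stand-in for a real relevance model, purely illustrative.
        question_tokens = set(model_input.lower().split())
        answer_tokens = set(model_output.lower().split())
        overlap = len(question_tokens & answer_tokens) / max(len(question_tokens), 1)
        return [EvalScore(name="answer_relevance", value=overlap)]
```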
Thanks for the context, @xiaoyi-cheng! We should do two things then:

1. Extend `evaluate_sample` to take additional inputs where needed.
2. Move the metric implementations into their own file (`metrics.py`, or multiple files) and make them importable into the different evaluations as needed. This way they become more modular and reusable. In particular, we should be able to import the same metric into multiple evals (see the sketch below).

I have written down how RAG(AS) and the existing evaluations relate in this doc, specifically here for QA accuracy. Let's discuss with the wider team as needed.
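A rough sketch of that layout, assuming hypothetical file and function names (`exact_match` standing in for a `qa_accuracy` metric, and a crude token-overlap `answer_relevance`):

```python
# --- metrics.py (sketch): pure, reusable metric functions ---
def exact_match(target_output: str, model_output: str) -> float:
    """Shared metric: 1.0 if reference and model answer match exactly, else 0.0."""
    return float(target_output.strip().lower() == model_output.strip().lower())


def answer_relevance(model_input: str, model_output: str) -> float:
    """Shared metric stand-in: crude question/answer token overlap in [0, 1]."""
    question = set(model_input.lower().split())
    answer = set(model_output.lower().split())
    return len(question & answer) / max(len(question), 1)


# --- qa_accuracy.py (sketch): imports only the metrics it needs ---
# from metrics import exact_match
class QAAccuracy:
    def evaluate_sample(self, target_output: str, model_output: str) -> dict:
        return {"exact_match": exact_match(target_output, model_output)}


# --- answer_relevance.py (sketch): reuses the same shared metrics module ---
# from metrics import answer_relevance
class AnswerRelevance:
    def evaluate_sample(self, model_input: str, model_output: str) -> dict:
        return {"answer_relevance": answer_relevance(model_input, model_output)}
```

The point being that both algos import from the same metric module, so adding answer relevance never forces a signature change onto `qa_accuracy`.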
Issue #, if available: Add `answer relevance` metric under the `AnswerRelevance` evaluation algorithm.

Description of changes:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
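Purely as a usage illustration (names are hypothetical and taken from the sketch in the comment above, not the actual merged API), the sample-level call for the new algorithm takes both the question and the answer:

```python
# `AnswerRelevance` here is the sketch class from the comment above, not the real API.
relevance_eval = AnswerRelevance()
sample_scores = relevance_eval.evaluate_sample(
    model_input="What is the capital of France?",
    model_output="The capital of France is Paris.",
)
print(sample_scores)  # e.g. {'answer_relevance': <value in [0, 1]>}
```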