feat: make general purpose metrics more general

explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀

Apache License 2.0

Metrics Converted

[x] Aspect Critic
[x] Simple Criteria
[x] Rubric Based - both Instance and Domain specific

A few different examples:

Aspect Critic

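from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import AspectCritic

# evaluator_llm is assumed to be an already-configured ragas evaluator LLM.

# A sample that carries only a response: no user_input and no reference.
only_response = SingleTurnSample(
    response="The Eiffel Tower is located in Paris."
)

grammar_critic = AspectCritic(
    name="grammar",
    definition="Is the response grammatically correct?",
    llm=evaluator_llm,
)

# Top-level await works in a notebook; in a script, call this from an async function.
await grammar_critic.single_turn_ascore(only_response)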
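
The bare `await` in the example above assumes a notebook or some other context with a running event loop. In a plain script, the call would need to be driven by `asyncio.run`. A minimal sketch of that pattern follows. `StubCritic` is a hypothetical stand-in and not part of ragas; it exists only so the snippet runs without an LLM, and its scoring rule is made up.

```python
import asyncio


class StubCritic:
    """Hypothetical stand-in for a metric object; NOT part of ragas."""

    async def single_turn_ascore(self, sample: dict) -> int:
        # A real metric would call an evaluator LLM here. This stub simply
        # returns 1 when the sample has a non-empty response, else 0.
        await asyncio.sleep(0)
        return 1 if sample.get("response") else 0


async def main() -> int:
    critic = StubCritic()
    sample = {"response": "The Eiffel Tower is located in Paris."}
    # Inside an async function, `await` is valid in any Python script.
    return await critic.single_turn_ascore(sample)


if __name__ == "__main__":
    # asyncio.run creates the event loop, runs main(), and closes the loop.
    print(asyncio.run(main()))
```

With a real metric, the body of `main` would hold the `AspectCritic` construction and the `single_turn_ascore` call from the example above.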

With reference

answer_correctness_critic = AspectCritic(
    name="answer_correctness",
    definition="Are the response and the reference answer the same?",
    llm=evaluator_llm,
)

# Data row: the reference ("London") differs from the response ("Paris").
sample = SingleTurnSample(
    user_input="Where is the Eiffel Tower located?",
    response="The Eiffel Tower is located in Paris.",
    reference="London",
)
await answer_correctness_critic.single_turn_ascore(sample)
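
The two examples above pass samples with different sets of populated fields to the same metric class. One way to picture that generality is a prompt builder that includes only the fields a sample actually carries. The sketch below illustrates the idea and is not how ragas implements it: `Sample`, `build_prompt`, and the prompt wording are all hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Sample:
    """Hypothetical simplified sample; NOT the ragas SingleTurnSample."""
    user_input: Optional[str] = None
    response: Optional[str] = None
    reference: Optional[str] = None


def build_prompt(definition: str, sample: Sample) -> str:
    """Assemble a critic prompt from whichever fields are populated."""
    lines = [f"Criterion: {definition}"]
    for f in fields(sample):
        value = getattr(sample, f.name)
        if value is not None:
            # Fields left as None are skipped, so the same builder serves
            # response-only samples and samples with a reference.
            lines.append(f"{f.name}: {value}")
    lines.append("Answer 1 if the criterion is met, otherwise 0.")
    return "\n".join(lines)


if __name__ == "__main__":
    only_response = Sample(response="The Eiffel Tower is located in Paris.")
    print(build_prompt("Is the response grammatically correct?", only_response))
```

Under this assumption, a criterion that mentions a reference only makes sense when the sample supplies one, which is why the second example fills in `reference`.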

Note: this only works for multi-turn metrics for now

feat: make general purpose metrics more general #1666