gomate-community / rageval

Evaluation tools for Retrieval-augmented Generation (RAG) methods.
Apache License 2.0

Define output format of evaluate function #52

Closed Wenshansilvia closed 7 months ago

Wenshansilvia commented 7 months ago

During evaluation, detailed log information for every metric should be written to a local file using a logger. The evaluate function should return two values:

  1. average score of each metric
  2. instance level score of each metric

    >>> import rageval as rl
    >>> ds = load_dataset("testset_name")
    >>> result, instance_level_result = evaluate(
    ...     ds.select(range(3)),
    ...     metrics=[ContextRecall(), AnswerGroundedness()],
    ...     models=[cr_model, ag_model],
    ... )
    >>> result
    Dataset({
        features: ['context_recall', 'answer_groundedness'],
        num_rows: 2
    })
    >>> instance_level_result
    Dataset({
        features: ['questions', 'gt_answers', 'answers', 'contexts', 'context_recall', 'answer_groundedness'],
        num_rows: 9
    })
faneshion commented 7 months ago

Maybe a dict is enough for result, haha...
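
To make the dict suggestion concrete, here is a minimal sketch of what such a return shape could look like. It is not rageval's actual API: `evaluate_sketch`, the `metrics` mapping of name to scoring callable, the `log_path` parameter, and the toy `exact_match` metric are all hypothetical and use only the standard library. The average scores come back as a plain dict, the instance-level scores as a list of row dicts, and per-instance details go to a local log file.

```python
import logging
from statistics import mean


def evaluate_sketch(rows, metrics, log_path="evaluate.log"):
    """Hypothetical stand-in for the proposed evaluate(); not rageval's API.

    rows: non-empty list of dicts (one per instance).
    metrics: dict mapping a metric name to a callable(row) -> float.
    """
    logger = logging.getLogger("evaluate_sketch")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path, mode="w", encoding="utf-8")
    logger.addHandler(handler)
    try:
        instance_level = []
        for i, row in enumerate(rows):
            scored = dict(row)  # copy so the input rows stay untouched
            for name, score_fn in metrics.items():
                scored[name] = score_fn(row)
                # Detailed per-instance info goes to the local log file.
                logger.info("row=%d metric=%s score=%.3f", i, name, scored[name])
            instance_level.append(scored)
        # Average score of each metric, returned as a plain dict.
        result = {name: mean(r[name] for r in instance_level) for name in metrics}
    finally:
        logger.removeHandler(handler)
        handler.close()
    return result, instance_level


# Toy data and a toy metric, purely for illustration.
rows = [
    {"questions": "q1", "answers": "a", "gt_answers": "a"},
    {"questions": "q2", "answers": "b", "gt_answers": "c"},
]
metrics = {"exact_match": lambda row: float(row["answers"] == row["gt_answers"])}

result, instance_level = evaluate_sketch(rows, metrics)
print(result)  # {'exact_match': 0.5}
```

With this shape, `result` is easy to print or serialize, while `instance_level` keeps the original columns next to each metric score, which mirrors the two return values proposed above.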