explodinggradients / ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
https://docs.ragas.io
Apache License 2.0

How to save the reasoning of each evaluation metric? #992

Open JinSeoung-Oh opened 1 month ago

JinSeoung-Oh commented 1 month ago

Hi, first of all I want to say thanks for this wonderful work.

Actually, I want to save the reasoning produced during evaluation. I mean something like lines 69 to 106 of https://github.com/explodinggradients/ragas/blob/main/src/ragas/metrics/_answer_correctness.py. That module already has a save function, and it seems to save roughly lines 69~106. Of course, those lines are only the built-in example, but I think the module generates output of the same form at evaluation time, and that is what I want to save.

But when I tried to use this save function, it returned 'None'.
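Roughly what I tried, as an illustrative sketch (the exact object name may differ; I mean the prompt defined in that file):

```python
from ragas.metrics._answer_correctness import CORRECTNESS_PROMPT

# I expected this to persist the generated reasoning somewhere I could read,
# but the call just returned None for me.
out = CORRECTNESS_PROMPT.save()
print(out)  # -> None
```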

So how can I save this? Do I have to build a new module for it?

Thanks!

jjmachan commented 1 month ago

hey @JinSeoung-Oh thank you for raising this issue - that is indeed a good suggestion!

can I ask you a couple of questions to better understand this?

  1. Where and how would you like to save this? Would plain text be enough? Are you expecting it to be part of the Result object?
  2. Are you using any tracing tools? If you are, the reasoning is actually saved for you automatically today (see the sketch below).
  3. How will you visualize the saved data? There will be a lot of data points to go through.
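For point 2, here is a rough sketch of the kind of thing I mean: any LangChain-compatible tracer (LangSmith, Langfuse, or a small custom handler like the one below) sees the intermediate reasoning, since `evaluate()` forwards its `callbacks` to the underlying LLM calls. This is illustrative only; the `ReasonCollector` name and the toy dataset are made up, and the exact wiring may differ in your setup:

```python
from datasets import Dataset
from langchain_core.callbacks import BaseCallbackHandler

from ragas import evaluate
from ragas.metrics import answer_correctness


class ReasonCollector(BaseCallbackHandler):
    """Collects the raw text of every LLM generation made during evaluation."""

    def __init__(self):
        self.outputs = []

    def on_llm_end(self, response, **kwargs):
        # response is a LangChain LLMResult; the generated text contains the
        # metric's intermediate output (statements, TP/FP/FN reasons, etc.)
        for generations in response.generations:
            for gen in generations:
                self.outputs.append(gen.text)


# toy dataset in the usual ragas column format
dataset = Dataset.from_dict({
    "question": ["When was the Eiffel Tower built?"],
    "answer": ["It was completed in 1889."],
    "contexts": [["The Eiffel Tower was completed in 1889."]],
    "ground_truth": ["The Eiffel Tower was completed in 1889."],
})

collector = ReasonCollector()
# uses the default OpenAI LLM, so OPENAI_API_KEY must be set
result = evaluate(dataset, metrics=[answer_correctness], callbacks=[collector])

# collector.outputs now holds the raw reasoning strings, ready to dump to disk
print(collector.outputs)
```
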
kengelbrecht commented 2 weeks ago

@jjmachan I have been looking for such a feature as well (for the faithfulness metric, though). I would like to see the statements and their respective verdicts, for two reasons:

  1. I would like to roughly check whether the numbers returned by the evaluate function are based on reasonably extracted statements and verdicts.
  2. I am considering adding those details to a labeling task so that annotators get hints about possible errors.

I reckoned the information would be available if I used callbacks, but that is a fairly complex setup, especially given the early stage of my investigation. Referring to the getting-started example: if the information could be returned as part of the result of the evaluate function, so that result.to_pandas() would then add it to the dataframe, that would be a handy solution for me.
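Concretely, something along these lines is what I have in mind (just a sketch; the extra columns do not exist today and their names are made up):

```python
from ragas import evaluate
from ragas.metrics import faithfulness

# `dataset` here is the toy dataset from the getting-started guide
result = evaluate(dataset, metrics=[faithfulness])
df = result.to_pandas()

# today: one score column per metric, e.g. df["faithfulness"]
# wished for: extra columns carrying the intermediate outputs, e.g.
#   df["faithfulness_statements"]  # extracted statements per sample  (hypothetical)
#   df["faithfulness_verdicts"]    # verdict + reason per statement   (hypothetical)
```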