Open PierreMesure opened 2 days ago
I'm experiencing the same issue, and I believe this is quite a critical bug. I couldn't run evaluations with additional metrics. Any update on this?
Hey @PierreMesure and @Snow31ind - This should be solved by https://github.com/Giskard-AI/giskard/pull/2052
Can you try again with the latest Giskard release?
@alexcombessie Thanks for replying. Let me give more context on this issue. The versions of giskard and ragas in my requirements.txt file are:
giskard==2.15.3
ragas==0.2.2
I believe 2.15.3 is the latest release, and I still see the same error as above.
After inspecting the stack trace, I'm wondering whether the ragas sample built in the giskard ragas metric wrapper matches the interface required by the base ragas metric score method, as the sample doesn't contain the user_input key. That's why I strongly believe it's the root cause.
Could you help double-check that? And are there any tests being run to make sure there's no data interface mismatch?
I just reverted ragas to 0.1.21 and it works. 😊
@alexcombessie, #2052 fixed another problem I reported; I don't think that PR will fix this one. I think the problem stems from RAGAS renaming its parameters. In this commit, you can see the change in the documentation. I think the change in variable names comes from this PR.
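To illustrate the kind of mismatch being described, here is a minimal sketch in plain Python. It assumes the rename was from the 0.1-style keys (question, answer, contexts, ground_truth) to the 0.2-style keys (user_input, response, retrieved_contexts, reference), and that context recall needs user_input, retrieved_contexts and reference. The helper names to_v02_sample and missing_keys are made up for this example and are not part of giskard or ragas.

```python
# Assumed mapping from ragas 0.1.x sample keys to ragas 0.2.x sample keys.
LEGACY_TO_V02_KEYS = {
    "question": "user_input",
    "answer": "response",
    "contexts": "retrieved_contexts",
    "ground_truth": "reference",
}


def to_v02_sample(legacy_sample: dict) -> dict:
    """Return a copy of the sample with 0.1-style keys renamed to 0.2-style keys."""
    return {LEGACY_TO_V02_KEYS.get(key, key): value for key, value in legacy_sample.items()}


def missing_keys(sample: dict, required=("user_input", "retrieved_contexts", "reference")) -> list:
    """List the required keys (assumed here for context recall) absent from the sample."""
    return [key for key in required if key not in sample]


legacy = {
    "question": "What is the capital of France?",
    "answer": "Paris",
    "contexts": ["Paris is the capital of France."],
    "ground_truth": "Paris",
}

# A sample built with the old key names lacks every key the new interface expects.
print(missing_keys(legacy))
# After renaming, nothing is missing.
print(missing_keys(to_v02_sample(legacy)))
```

A check like missing_keys could also serve as the basis for the interface test asked about above, run against whatever sample the wrapper actually builds.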
@PierreMesure Awesome! You made my day. Anyway, this issue is worth fixing soon. Thanks, team!
Issue Type
Bug
Source
source
Giskard Library Version
2.15.3
OS Platform and Distribution
No response
Python version
No response
Installed python packages
Current Behaviour?
When trying to evaluate a RAG assistant with some RAGAS metrics (context recall), the evaluation fails. See the stack trace below. This happens when trying to provide the answer as AgentAnswer. We're not sure what should go in the documents parameter, as the documentation doesn't give a clear example. We're using LlamaIndex, so agent_output.source_nodes doesn't return a list of strings. Here's what we've tried:
Standalone code OR list down the steps to reproduce the issue
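The following is not the reporter's code. It is only a rough sketch of the conversion the description refers to, turning source nodes into a list of strings. It assumes each LlamaIndex source node exposes a get_content() method that returns its text. FakeNode and nodes_to_documents are hypothetical names used so the example runs without LlamaIndex installed.

```python
from dataclasses import dataclass


@dataclass
class FakeNode:
    """Hypothetical stand-in for a LlamaIndex source node, for illustration only."""

    text: str

    def get_content(self) -> str:
        return self.text


def nodes_to_documents(source_nodes) -> list:
    """Extract the plain-text content of each source node into a list of strings."""
    return [node.get_content() for node in source_nodes]


source_nodes = [
    FakeNode("Paris is the capital of France."),
    FakeNode("France is in Europe."),
]
documents = nodes_to_documents(source_nodes)
print(documents)

# With Giskard installed, the list could then be passed along, assuming
# AgentAnswer accepts message and documents arguments:
# AgentAnswer(message=str(agent_output), documents=documents)
```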
Relevant log output