Giskard-AI / giskard

🐢 Open-Source Evaluation & Testing for ML & LLM systems
https://docs.giskard.ai
Apache License 2.0

KeyError: 'user_input' when calculating RAGAS metric #2056

Open PierreMesure opened 2 days ago

PierreMesure commented 2 days ago

Issue Type

Bug

Source

source

Giskard Library Version

2.15.3

OS Platform and Distribution

No response

Python version

No response

Installed python packages

ragas==0.2.2

Current Behaviour?

When evaluating a RAG assistant with some RAGAS metrics (context recall), the evaluation fails; see the stack trace below. This happens when the answer is returned as an AgentAnswer. We're not sure what the documents parameter should contain, since the documentation doesn't give a clear example. We're using LlamaIndex, so agent_output.source_nodes doesn't return a list of strings. Here's what we've tried:

Standalone code OR list down the steps to reproduce the issue

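from typing import List, Optional

from giskard.rag import AgentAnswer, evaluate
from giskard.rag.metrics.ragas_metrics import (
    ragas_answer_relevancy,
    ragas_context_precision,
    ragas_context_recall,
    ragas_faithfulness,
)
from llama_index.core.llms import ChatMessage, MessageRole  # path assumes llama-index >= 0.10


def answer_fn(question: str, history: Optional[List[dict]] = None) -> AgentAnswer:
    # chat_engine, testset and knowledge_base are defined elsewhere in our code.
    chat_history = [ChatMessage(role=MessageRole.USER, content=msg["content"]) for msg in history or []]
    agent_output = chat_engine.chat(question, chat_history=chat_history)

    answer = agent_output.response
    # documents = agent_output.source_nodes
    documents = [node.text for node in agent_output.source_nodes if hasattr(node.node, 'text')]

    return AgentAnswer(message=answer, documents=documents)


evaluate(
    answer_fn,
    testset=testset,
    knowledge_base=knowledge_base,
    metrics=[ragas_context_recall, ragas_context_precision, ragas_faithfulness, ragas_answer_relevancy],
)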
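
For the documents question, here is a minimal sketch of turning retrieved nodes into a plain list of strings. It assumes that AgentAnswer's documents parameter expects text strings and that each retrieved item exposes its text either directly or through a `.node` attribute. The helper name `nodes_to_documents` and the stand-in objects are hypothetical and only there for illustration; they are not part of Giskard or LlamaIndex.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def nodes_to_documents(source_nodes: Iterable[Any]) -> List[str]:
    """Collect non-empty text from retrieved nodes, skipping nodes without text."""
    documents: List[str] = []
    for item in source_nodes:
        # Retrieved items are assumed to wrap the real node in `.node`;
        # fall back to the item itself when that attribute is missing.
        inner = getattr(item, "node", item)
        text = getattr(inner, "text", None)
        if isinstance(text, str) and text.strip():
            documents.append(text)
    return documents


# Stand-in objects shaped like retrieved nodes (hypothetical test data).
fake_nodes = [
    SimpleNamespace(node=SimpleNamespace(text="Paris is the capital of France.")),
    SimpleNamespace(node=SimpleNamespace(image="no text on this node")),
]
documents = nodes_to_documents(fake_nodes)
```

In answer_fn, the result could then be passed as `AgentAnswer(message=answer, documents=documents)`.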

Relevant log output

  File "/evaluation_manager.py", line 310, in _run_giskard_evaluation_and_return_generated_report
    return evaluate(
           ^^^^^^^^^
  File "/giskard/rag/evaluate.py", line 105, in evaluate
    metrics_results[sample["id"]].update(metric(sample, answer))
                                         ^^^^^^^^^^^^^^^^^^^^^^
  File "/giskard/rag/metrics/ragas_metrics.py", line 119, in __call__
    return {self.name: self.metric.score(ragas_sample)}
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ragas/utils.py", line 159, in emit_warning
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/ragas/metrics/base.py", line 121, in score
    raise e
  File "/ragas/metrics/base.py", line 117, in score
    score = loop.run_until_complete(self._ascore(row=row, callbacks=group_cm))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nest_asyncio.py", line 98, in run_until_complete
    return f.result()
           ^^^^^^^^^^
  File "/asyncio/futures.py", line 203, in result
    raise self._exception.with_traceback(self._exception_tb)
  File "/asyncio/tasks.py", line 314, in __step_run_and_handle_result
    result = coro.send(None)
             ^^^^^^^^^^^^^^^
  File "/ragas/metrics/_context_recall.py", line 191, in _ascore
    return await super()._ascore(row, callbacks)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ragas/metrics/_context_recall.py", line 156, in _ascore
    question=row["user_input"],
             ~~~^^^^^^^^^^^^^^
KeyError: 'user_input'

Snow31ind commented 1 day ago

I'm experiencing the same issue, and I believe this is quite a critical bug. I can't run evaluations with additional metrics. Any update on this?

alexcombessie commented 1 day ago

Hey @PierreMesure and @Snow31ind - This should be solved by https://github.com/Giskard-AI/giskard/pull/2052

Can you try again with the latest Giskard release?

Snow31ind commented 1 day ago

@alexcombessie Thanks for replying. Let me add more context on this issue. The versions of giskard and ragas in my requirements.txt file are:

giskard==2.15.3
ragas==0.2.2

I believe 2.15.3 is the latest release, and I still see the same error thrown as above.

After inspecting the stack trace, I'm wondering whether the ragas sample built in the giskard ragas metric wrapper matches the interface required by the base ragas metric score method, since the sample doesn't contain the user_input key. I strongly believe that's the root cause.

Could you help double-check that? And are there any tests being run to make sure there's no data interface mismatch?
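
To make the suspected mismatch concrete, here is a rough sketch of the kind of check that would catch it. It assumes the wrapper still builds samples with the ragas 0.1-style keys (question, answer, contexts, ground_truth), while ragas 0.2 reads user_input, response, retrieved_contexts and reference. The required keys for context recall are inferred from the traceback and the 0.2 naming, and `to_v02_sample` and `missing_keys` are hypothetical helpers, not functions from giskard or ragas.

```python
from typing import Any, Dict, List, Sequence

# Sample keys used by ragas 0.1.x mapped to the names ragas 0.2.x reads.
LEGACY_TO_V02_KEYS = {
    "question": "user_input",
    "answer": "response",
    "contexts": "retrieved_contexts",
    "ground_truth": "reference",
}

# Assumed requirements for context recall, inferred from the traceback.
CONTEXT_RECALL_KEYS = ("user_input", "retrieved_contexts", "reference")


def to_v02_sample(legacy_sample: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the sample with legacy keys renamed; other keys are kept."""
    return {LEGACY_TO_V02_KEYS.get(key, key): value for key, value in legacy_sample.items()}


def missing_keys(sample: Dict[str, Any], required: Sequence[str] = CONTEXT_RECALL_KEYS) -> List[str]:
    """List the required keys that the sample does not provide."""
    return [key for key in required if key not in sample]


# Hypothetical sample in the 0.1-style layout.
legacy_sample = {
    "question": "What is the capital of France?",
    "answer": "Paris.",
    "contexts": ["Paris is the capital of France."],
    "ground_truth": "Paris",
}

missing_before = missing_keys(legacy_sample)
missing_after = missing_keys(to_v02_sample(legacy_sample))
```

With the legacy layout, `missing_before` lists user_input first, which is consistent with the KeyError above; after renaming, nothing is missing.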

PierreMesure commented 1 day ago

I just reverted ragas to 0.1.21 and it works. 😊

@alexcombessie, I reported another problem that was fixed by #2052, but I don't think that PR will fix this one. I think this problem stems from RAGAS renaming its parameters. In this commit, you can see the change in the documentation. I think the change in variable names comes from this PR.
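
Until the wrapper is updated, a small guard can flag the incompatible combination early. This is only a sketch: the 0.2 cutoff is an assumption based on what was observed in this thread (0.1.21 works, 0.2.2 fails), and `uses_renamed_columns` is a hypothetical helper. The installed version string could be read with `importlib.metadata.version("ragas")`.

```python
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '0.2.2'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def uses_renamed_columns(ragas_version: str) -> bool:
    """True for ragas 0.2 and later, where `question` became `user_input` (assumed cutoff)."""
    return parse_major_minor(ragas_version) >= (0, 2)


# The two versions mentioned in this thread.
fails_here = uses_renamed_columns("0.2.2")
works_here = uses_renamed_columns("0.1.21")
```

Pinning `ragas==0.1.21` in requirements.txt is the simpler workaround in the meantime.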

Snow31ind commented 1 day ago

@PierreMesure Awesome! You made my day. Anyway, this issue is worth fixing soon. Thanks, team!