explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀
https://docs.ragas.io
Apache License 2.0

Make the "Failed to parse output. Returning None." warning from RagasoutputParser more explicit, or add a way to catch it #1681

Open Gwenn-LR opened 1 week ago

Gwenn-LR commented 1 week ago

Describe the Feature
Hi @shahules786 @jjmachan, I'm back in Ragas business ^^

I've recently stumbled upon a "Failed to parse output. Returning None." warning while trying to evaluate the faithfulness of answers generated by a model on a generated dataset.

As explained in the warning, it seems to come from an ill-structured output. After looking through your repository, I've noticed many issues related to this warning and its consequences.

Some users have found different solutions, such as changing the evaluation prompt (#1150) or the context/output max tokens (#1120), but in my case it was thanks to @jjmachan's answer that I managed to correctly evaluate my RAG: I changed the metric LLM from llama3.1:8b to llama3.2:3b (I use the LangChain ChatOllama wrapper around a model served by Ollama, as sketched below).
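
For reference, a minimal sketch of that setup. The model name and the `eval_dataset` variable are illustrative, and it assumes the langchain-ollama package alongside Ragas' LangchainLLMWrapper:

```python
# Sketch: evaluate faithfulness with an Ollama model wrapped through LangChain.
# Model name and eval_dataset are placeholders.
from langchain_ollama import ChatOllama
from ragas import evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import faithfulness

# Wrap the Ollama chat model so Ragas can use it as the metric LLM.
evaluator_llm = LangchainLLMWrapper(ChatOllama(model="llama3.2:3b"))

# eval_dataset is assumed to be an evaluation dataset with question,
# answer and contexts columns prepared beforehand.
results = evaluate(eval_dataset, metrics=[faithfulness], llm=evaluator_llm)
print(results)
```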

However, there are still many open issues concerning this same warning (#1630, #1545, #1358, #1274, #1228, #1206, #1186, #1150, #1120, #966, #957, #955, #859).

That's why I was wondering if you could make this warning more explicit, to guide users toward ways of improving their evaluation setup, or turn it into an exception so we can catch it when needed and diagnose the issue ourselves.
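
Until something like that is exposed by the library, one possible workaround is to collect the warning yourself. This is only a sketch, and it assumes the message is emitted through Python's standard logging under the "ragas" logger namespace, which may differ between Ragas versions:

```python
# Sketch: collect "Failed to parse output" warnings emitted during evaluation
# so the caller can turn them into a hard failure afterwards.
import logging


class ParseFailureCollector(logging.Handler):
    """Collects log records that report a failed output parse."""

    def __init__(self) -> None:
        super().__init__(level=logging.WARNING)
        self.failures: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        if "Failed to parse output" in record.getMessage():
            self.failures.append(record.getMessage())


collector = ParseFailureCollector()
logging.getLogger("ragas").addHandler(collector)

# ... run evaluate(...) here ...

if collector.failures:
    raise RuntimeError(
        f"{len(collector.failures)} metric outputs could not be parsed; "
        "scores containing NaN are unreliable."
    )
```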

Why is the feature important for you?
So others won't spend so much time debugging their evaluation step, and to avoid redundant issues piling up in your repository ^^

In any case, thanks again for your amazing work, keep it up as long as possible! Have a nice day :)

amin-kh96 commented 23 hours ago


I still have this issue. In my case I already have the embeddings and the textual data, so I created a subclass to bypass the embedding API call, but I still see the message. I think the parse failure happens because we use the "prompt value" in the subclass: when we feed our own textual data it cannot be matched, and the warning appears.
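
For the pre-computed-embeddings case, here is a minimal sketch of such a subclass. The lookup table is illustrative, and note that LLM-based metrics like faithfulness still call the metric LLM, so a parse failure there is unrelated to the embeddings:

```python
# Sketch: an embeddings class backed by pre-computed vectors, so no
# embedding API is called. The {text: vector} table is built offline.
from langchain_core.embeddings import Embeddings


class PrecomputedEmbeddings(Embeddings):
    """Serves vectors from a {text: vector} dictionary built offline."""

    def __init__(self, table: dict[str, list[float]]) -> None:
        self.table = table

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [self.embed_query(t) for t in texts]

    def embed_query(self, text: str) -> list[float]:
        try:
            return self.table[text]
        except KeyError:
            raise KeyError(f"No pre-computed embedding for: {text[:50]!r}")


# Pass an instance to evaluate(..., embeddings=PrecomputedEmbeddings(table)).
# Metrics such as faithfulness still use the metric LLM, which is where the
# "Failed to parse output" warning originates.
```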