landhu closed this issue 1 week ago.
Hi @landhu, the high values for answer_correctness and answer_similarity make sense given how they are calculated: both depend, at least in part, on how close the response is to the ground truth in meaning.
So if the AI's response is semantically similar to the correct answer, it will score high even if details such as numbers are wrong.
In your case, it would make sense to design a custom evaluation approach that focuses on matching specific parts of the response (titles, text, and values) rather than relying solely on general metrics like answer correctness or answer similarity. Here's how you might approach it:
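A minimal sketch of that idea, assuming both the response and the ground truth are available as parsed JSON (Python dicts) rather than strings. `flatten` and `field_match_score` are hypothetical helpers written for this example, not part of ragas: every leaf value (title text, axis category, data point) is compared exactly, and the score is the fraction of ground-truth leaves the response reproduces.

```python
from typing import Any


def flatten(node: Any, path: str = "") -> dict[str, Any]:
    """Flatten nested dicts/lists into {"path.to[0].leaf": value} pairs."""
    if isinstance(node, dict):
        leaves: dict[str, Any] = {}
        for key, value in node.items():
            leaves.update(flatten(value, f"{path}.{key}" if path else key))
        return leaves
    if isinstance(node, list):
        leaves = {}
        for index, value in enumerate(node):
            leaves.update(flatten(value, f"{path}[{index}]"))
        return leaves
    return {path: node}


def field_match_score(response: dict, ground_truth: dict) -> tuple[float, list[str]]:
    """Return the fraction of ground-truth leaves matched exactly, plus mismatched paths."""
    expected = flatten(ground_truth)
    actual = flatten(response)
    if not expected:
        return 1.0, []
    missing = object()  # sentinel: never equal to a real value
    mismatches = [path for path, value in expected.items() if actual.get(path, missing) != value]
    return (len(expected) - len(mismatches)) / len(expected), mismatches


# Trimmed-down chart answers; only the last data point differs
ground_truth = {
    "title": {"text": "Offline Devices by Type"},
    "xAxis": {"categories": ["DELL", "HP", "Others"]},
    "series": [{"name": "Offline Devices", "data": [915, 938, 823]}],
}
response = {
    "title": {"text": "Offline Devices by Type"},
    "xAxis": {"categories": ["DELL", "HP", "Others"]},
    "series": [{"name": "Offline Devices", "data": [915, 938, 853]}],
}

score, mismatches = field_match_score(response, ground_truth)
print(score, mismatches)  # 0.875 ['series[0].data[2]']
```

Exact comparison is deliberate: a single wrong number lowers the score and is reported by its path, which a similarity-based metric cannot do. Note that empty lists and dicts produce no leaves, so they are not checked by this sketch.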
@sahusiddharth Thanks very much. I have two more questions:
For numeric data, do I need to modify the prompts/instructions of ragas's answer correctness metric?
Yes, you can change the correctness prompt, but I am not sure the LLM and embeddings would be able to capture the exact values accurately.
I was thinking more along the lines of element-wise dictionary matching.
Something like this:
```python
response = {"data": [915, 938, 853]}
ground_truth = {"data": [915, 938, 823]}

# Whole-object comparison: any single mismatch fails the check
binary_comparison = ground_truth == response
# False, i.e. a score of 0
```
```python
response = {"data": [915, 938, 853]}
ground_truth = {"data": [915, 938, 823]}

accurate_capture = 0
total_elements = len(ground_truth["data"])

# Element-wise comparison (values missing from the response count as wrong)
for expected, actual in zip(ground_truth["data"], response["data"]):
    if expected == actual:
        accurate_capture += 1

# Calculate the accuracy as the fraction of matching elements
accuracy = accurate_capture / total_elements
# 0.666... (2 of 3 values match)
```
It seems the issue has been answered, so I'm closing this now.
[ ] I checked the documentation and related resources and couldn't find an answer to my question.
How do I evaluate JSON data?
1. My RAG system supports charts.
2. The API returns a response like the one below, and the UI converts it into a chart:
```json
{
  "id": "4db70ba6-2085-45ac-aefb-8aa211961806",
  "answer": {
    "chart": {
      "chart": { "type": "column" },
      "series": [
        { "data": [915, 938, 853], "name": "Offline Devices" }
      ],
      "title": { "text": "Offline Devices by Type" },
      "xAxis": {
        "categories": ["DELL", "HP", "Others"],
        "title": { "text": "Type" }
      },
      "yAxis": {
        "min": 0,
        "title": { "text": "Number of Offline Devices" }
      }
    }
  },
  "content": [],
  "timestamp": "2024-11-09T00:35:54.389+00:00",
  "type": ["chart"]
}
```
Additional context: I take the answer JSON, dump it to a string, and then evaluate it. But even when the numbers are wrong, the scores (answer_correctness, answer_similarity) are still high.
Please help. Very urgent.
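To see why a similarity-style score barely moves when a number is wrong, one can compare the two serialized strings directly. This sketch uses the standard-library `difflib.SequenceMatcher` purely as a character-level illustration; it is not what ragas computes (answer_similarity compares embeddings), but it shows how little of the dumped text a single wrong digit changes, while a strict check on the values catches it.

```python
import json
from difflib import SequenceMatcher

# Trimmed-down chart answers; only the last data point differs
ground_truth = {"series": [{"name": "Offline Devices", "data": [915, 938, 823]}],
                "title": {"text": "Offline Devices by Type"}}
response = {"series": [{"name": "Offline Devices", "data": [915, 938, 853]}],
            "title": {"text": "Offline Devices by Type"}}

# Serialize both answers to strings, as was done before evaluation
gt_text = json.dumps(ground_truth)
resp_text = json.dumps(response)

# Character-level similarity (autojunk disabled so frequent characters are not ignored)
ratio = SequenceMatcher(None, gt_text, resp_text, autojunk=False).ratio()

# Strict check on the values that actually matter
exact_values = ground_truth["series"][0]["data"] == response["series"][0]["data"]

print(f"string similarity: {ratio:.3f}")
print(f"values identical: {exact_values}")  # values identical: False
```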