cpolcino opened this issue 3 months ago (status: Open)
penguine-ip commented: Hey @cpolcino, when does it show None? It shows the score is 1 based on what you pasted.
cpolcino replied: Thank you for the answer, @penguine-ip. I think the score of 1 only appears in the debug output; if you look at the last two lines of the log, you can see that AnswerRelevancyMetric returned None in the main run. Do you agree?
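One way to pin down where the None comes from: the "Event loop is already running" line in the log below suggests the metric ran in async mode, where deepeval can swallow internal errors. A quick check worth trying (a sketch, assuming a recent deepeval where AnswerRelevancyMetric accepts async_mode and metrics expose an error attribute; custom_llm and test_case are from the repro script below):

```python
# Rerun synchronously so any internal exception propagates instead of
# being swallowed (async_mode and .error are assumptions about the version).
metric = AnswerRelevancyMetric(threshold=0.5, model=custom_llm, async_mode=False)
metric.measure(test_case)
print(metric.score)  # None in this report
print(metric.error)  # may carry the swallowed error message
```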
Describe the bug
I'm working locally with Ollama and want to evaluate my model, but the returned score is always None. The strangest part is that the debugging output does show a relevancy score.

To Reproduce
Steps to reproduce the behavior:
```python
import requests
import logging
import traceback

from deepeval.models import DeepEvalBaseLLM
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)


class CustomLlama3_1(DeepEvalBaseLLM):
    def __init__(self):
        self.model_name = "llama3.1:latest"
        self.api_url = "http://localhost:11434/api/generate"

    # ... (the rest of the class was cut off in the paste; see the sketch below)


def test_answer_relevancy():
    custom_llm = CustomLlama3_1()
    answer_relevancy_metric = AnswerRelevancyMetric(threshold=0.5, model=custom_llm)
    # ... (test case construction and measure() call were cut off in the paste)


if __name__ == "__main__":
    test_answer_relevancy()
```
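For reference, the paste cuts off mid-class: DeepEvalBaseLLM is abstract, and a custom model must also implement load_model, generate, a_generate, and get_model_name. Judging from the debug output below, the missing pieces presumably looked roughly like this (a reconstruction sketched from the log, not the reporter's exact code; the Ollama request shape with stream=False is an assumption):

```python
class CustomLlama3_1(DeepEvalBaseLLM):
    def __init__(self):
        self.model_name = "llama3.1:latest"
        self.api_url = "http://localhost:11434/api/generate"

    def load_model(self):
        # Nothing to load in-process; Ollama serves the model over HTTP.
        return self.model_name

    def generate(self, prompt: str) -> str:
        logger.debug("Generating response for prompt: %s", prompt)
        # stream=False makes Ollama return one JSON object instead of chunks.
        resp = requests.post(
            self.api_url,
            json={"model": self.model_name, "prompt": prompt, "stream": False},
        )
        text = resp.json()["response"]
        logger.debug("Generated response: %s", text)
        return text

    async def a_generate(self, prompt: str) -> str:
        # deepeval uses the async variant by default; delegate to the sync call.
        return self.generate(prompt)

    def get_model_name(self):
        return self.model_name


def test_answer_relevancy():
    custom_llm = CustomLlama3_1()
    answer_relevancy_metric = AnswerRelevancyMetric(threshold=0.5, model=custom_llm)
    # Input/output values taken from the log below.
    test_case = LLMTestCase(
        input="What is the capital of France?",
        actual_output="The capital of France is Paris.",
    )
    logger.info("Measuring answer relevancy...")
    answer_relevancy_metric.measure(test_case)
    logger.info("Answer relevancy score: %s", answer_relevancy_metric.score)
```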
Expected behavior
I would like to get a numeric score as output, not None.
Screenshots
Output screen:
```
INFO:__main__:Measuring answer relevancy...
Event loop is already running. Applying nest_asyncio patch to allow async execution...
DEBUG:__main__:Generating response for prompt: Given the text, breakdown and generate a list of statements presented. Ambiguous statements and single words can also be considered as statements.

Example:
Example text: Shoes. The shoes can be refunded at no extra cost. Thanks for asking the question!

{
    "statements": ["Shoes.", "Shoes can be refunded at no extra cost", "Thanks for asking the question!"]
}
===== END OF EXAMPLE ======
IMPORTANT: Please make sure to only return in JSON format, with the "statements" key mapping to a list of strings. No words or explanation is needed.
Text: The capital of France is Paris.
JSON:
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost:11434
DEBUG:urllib3.connectionpool:http://localhost:11434 "POST /api/generate HTTP/1.1" 200 1098
DEBUG:__main__:Generated response: {
    "statements": ["The capital of France", "is Paris."]
}
DEBUG:__main__:Generating response for prompt: For the provided list of statements, determine whether each statement is relevant to address the input. Please generate a list of JSON with two keys: `verdict` and `reason`. The 'verdict' key should STRICTLY be either a 'yes', 'idk' or 'no'. Answer 'yes' if the statement is relevant to addressing the original input, 'no' if the statement is irrelevant, and 'idk' if it is ambiguous (eg., not directly relevant but could be used as a supporting point to address the input). The 'reason' is the reason for the verdict. Provide a 'reason' ONLY if the answer is 'no'. The provided statements are statements made in the actual output.

**
IMPORTANT: Please make sure to only return in JSON format, with the 'verdicts' key mapping to a list of JSON objects.
Example input: What should I do if there is an earthquake?
Example statements: ["Shoes.", "Thanks for asking the question!", "Is there anything else I can help you with?", "Duck and hide"]
Example JSON:
{
    "verdicts": [
        {
            "verdict": "no",
            "reason": "The 'Shoes.' statement made in the actual output is completely irrelevant to the input, which asks about what to do in the event of an earthquake."
        },
        {
            "verdict": "idk"
        },
        {
            "verdict": "idk"
        },
        {
            "verdict": "yes"
        }
    ]
}

Since you are going to generate a verdict for each statement, the number of 'verdicts' SHOULD BE STRICTLY EQUAL to the number of `statements`.
**

Input: What is the capital of France?
Statements: ['The capital of France', 'is Paris.']
JSON:
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost:11434
DEBUG:urllib3.connectionpool:http://localhost:11434/ "POST /api/generate HTTP/1.1" 200 None
DEBUG:__main__:Generated response: Here is the JSON output based on the input and statements provided:
{ "verdicts": [ { "verdict": "idk", "reason": "" }, { "verdict": "yes" } ] }
Explanation for each verdict:
IMPORTANT: Please make sure to only return in JSON format, with the 'reason' key providing the reason.
Example JSON:
{
    "reason": "The score is <answer_relevancy_score> because <your_reason>."
}
Answer Relevancy Score: 1.00
Reasons why the score can't be higher based on irrelevant statements in the actual output: []
Input: What is the capital of France?
JSON:
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost:11434
DEBUG:urllib3.connectionpool:http://localhost:11434/ "POST /api/generate HTTP/1.1" 200 1596
DEBUG:__main__:Generated response: {
    "reason": "The score is 1.00 because all relevant information is present and there are no irrelevant statements made in response to your question about the capital of France."
}
INFO:__main__:Answer relevancy score: None
ERROR:__main__:AnswerRelevancyMetric returned None. This might indicate an internal error.
```
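Note the likely root cause visible in the log: the second Ollama response wraps its JSON in prose ("Here is the JSON output based on ..." plus an "Explanation for each verdict:" section). deepeval parses each judge response as JSON internally, and when that parse fails the failure can surface as a None score rather than an exception. A workaround sketch (extract_json_block is my own helper, not a deepeval API) is to trim the model output to its outermost JSON object before returning it from generate():

```python
import json
import re

def extract_json_block(text: str) -> str:
    """Best-effort: return the first {...} span in the model output.

    Llama-style models often add prose around the JSON they were asked
    for; trimming to the outermost braces lets a strict JSON parser
    (like the one deepeval applies to judge responses) succeed.
    """
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        return text  # nothing brace-delimited; return unchanged
    candidate = match.group(0)
    try:
        json.loads(candidate)  # only trust the trim if it actually parses
        return candidate
    except json.JSONDecodeError:
        return text
```

Returning extract_json_block(text) from generate() and a_generate() should make the verdict/reason parsing deterministic; setting "format": "json" in the Ollama request body is another way to force pure-JSON output from /api/generate.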
Desktop (please complete the following information):
OS: Linux
Environment: Python notebook