makoto-velux opened 5 months ago
Additional insights from myself: the issue seems to lie in the `_generate_statements` method, which is called within the `measure` method. More specifically, it's due to the way the `res` (prediction result) generated by the evaluation model is passed to `utils.trimToJson`, which trims the text down to a JSON-compatible format. Since `res` is of type `langchain_core.messages.ai.AIMessage`, not a string, it throws the error. As a temporary workaround, I modified the `_generate_statements` method as shown below and it works; the same change applies to `_generate_verdicts` (a sketch follows the code block).
```python
def _generate_statements(
    self,
    actual_output: str,
) -> List[str]:
    prompt = AnswerRelevancyTemplate.generate_statements(
        actual_output=actual_output,
    )
    # Adding .content converts the response to a string; otherwise res is of
    # type langchain_core.messages.ai.AIMessage and trimToJson throws an error.
    res = self.model(prompt).content
    json_output = trimToJson(res)
    data = json.loads(json_output)
    return data["statements"]
```
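For completeness, the analogous patch to `_generate_verdicts` would look roughly like the sketch below. Only the `.content` change is from this thread; the template call, its parameter names, and the verdict construction are assumptions inferred from the traceback at the bottom of this issue, not copied from the deepeval source.

```python
def _generate_verdicts(
    self,
    input: str,
    retrieval_context: List[str],
) -> List[AnswerRelvancyVerdict]:
    # Hypothetical reconstruction: the template call and the verdict
    # construction below are assumptions, not the actual deepeval source.
    prompt = AnswerRelevancyTemplate.generate_verdicts(
        input=input,
        retrieval_context=retrieval_context,
    )
    res = self.model(prompt).content  # same .content fix as above
    json_output = trimToJson(res)
    data = json.loads(json_output)
    return [AnswerRelvancyVerdict(**verdict) for verdict in data["verdicts"]]
```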
But I would still be interested in a more stable solution. Also, `evaluate([test_case], [answer_relevancy_metric])` still doesn't work.
@makoto-velux Hey! Thanks for the comment. I think the documentation has a small error: the `_call` method isn't returning a string, which is why you get the error and why you have to add `.content` at the end of `self.model(prompt)`.
I'm reproducing it and will make a release in a few hours today.
Apart from that error, what do you mean that `evaluate([test_case], [answer_relevancy_metric])` doesn't work?
btw @makoto-velux fixed the docs; it was a one-line update, but now it works: https://docs.confident-ai.com/docs/metrics-introduction#azure-openai-example
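For readers hitting the same thing: the corrected custom-model example presumably ends up shaped like the sketch below. This is a sketch based on this thread rather than the docs verbatim; the class name, import path, and `AzureChatOpenAI` setup are assumptions, and method names may differ across deepeval versions. The essential fix is returning `.content` so the metric receives a plain string instead of an `AIMessage`.

```python
from langchain_openai import AzureChatOpenAI
from deepeval.models import DeepEvalBaseLLM  # import path may vary by version

class AzureOpenAI(DeepEvalBaseLLM):  # class name is illustrative
    def __init__(self, model):
        self.model = model

    def load_model(self):
        return self.model

    def _call(self, prompt: str) -> str:
        chat_model = self.load_model()
        # .content unwraps the AIMessage into the plain string
        # that utils.trimToJson expects
        return chat_model.invoke(prompt).content

    def get_model_name(self):
        return "Azure OpenAI"

# Deployment details below are placeholders, not real values.
azure_openai = AzureOpenAI(model=AzureChatOpenAI(azure_deployment="..."))
```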
@penguine-ip thanks for picking this up. Ah, so adding `.content` to the `_call` method was enough. Good to know, now I can continue my work! For the `evaluate` method, as you can see in the 2nd screenshot in my 1st post, the Jupyter notebook does not show any result or error. It seems like it fails (because I can see the red mark next to the code box), but I don't see any error message. It seems like the notebook is trying to render something, but it ends up rendering just an empty white horizontal bar.
@makoto-velux I followed our quick start in colab https://colab.research.google.com/drive/1PPxYEBa6eu__LquGoFFJZkhYgWVYE6kh?usp=sharing up until the "🚀 Run Your First Evaluation" section and couldn't reproduce the error. Which metric are you using?
Also if you're on discord we have an issues channel: https://discord.com/invite/a3K9c8GRGt
I was using the answer relevancy, context relevancy, and faithfulness metrics, and `evaluate` didn't work for any of them.
Can you try modifying the colab instead and let me know how it works out? It shouldn't be a problem with your test case, since it's from the docs.
@makoto-velux is it still a problem?
Describe the bug
The `metric.measure` method with a custom evaluation LLM returns an attribute error: "'AIMessage' object has no attribute 'find'"
To Reproduce
Steps to reproduce the behavior:

1. Create a custom evaluation model by subclassing the `DeepEvalBaseLLM` class, following this example.
2. Create the metric and test case, then call `measure`:

```python
answer_relevancy_metric = AnswerRelevancyMetric(
    threshold=0.7,
    model=azure_openai,
    include_reason=True,
)

actual_output = "We offer a 30-day full refund at no extra cost."
retrieval_context = ["All customers are eligible for a 30 day full refund at no extra cost."]

test_case = LLMTestCase(
    input="What if these shoes don't fit?",
    actual_output=actual_output,
    retrieval_context=retrieval_context,
)
answer_relevancy_metric.measure(test_case)
```
```
AttributeError                            Traceback (most recent call last)
Cell In[43], line 1
----> 1 answer_relevancy_metric.measure(test_case)

File ~/repos//aia-chatgpt/.venv/lib/python3.11/site-packages/deepeval/metrics/answer_relevancy.py:46, in AnswerRelevancyMetric.measure(self, test_case)
     41     raise ValueError(
     42         "Input, actual output, or retrieval context cannot be None"
     43     )
     44 with metrics_progress_context(self.name, self.evaluation_model):
     45     # generate statements
---> 46     self.statements: List[str] = self._generate_statements(
     47         test_case.actual_output
     48     )
     50     # generate verdicts based on statements, and retrieval context
     51     self.verdicts: List[AnswerRelvancyVerdict] = (
     52         self._generate_verdicts(
     53             test_case.input, test_case.retrieval_context
     54         )
     55     )

File ~/repos//aia-chatgpt/.venv/lib/python3.11/site-packages/deepeval/metrics/answer_relevancy.py:120, in AnswerRelevancyMetric._generate_statements(self, actual_output)
    115 prompt = AnswerRelevancyTemplate.generate_statements(
    116     actual_output=actual_output,
    117 )
    119 res = self.model(prompt)
--> 120 json_output = trimToJson(res)
    121 data = json.loads(json_output)
    122 return data["statements"]

File ~/repos//aia-chatgpt/.venv/lib/python3.11/site-packages/deepeval/utils.py:92, in trimToJson(input_string)
     91 def trimToJson(input_string: str) -> str:
---> 92     start = input_string.find("{")
     93     end = input_string.rfind("}") + 1
     94     return input_string[start:end] if start != -1 and end != 0 else ""

AttributeError: 'AIMessage' object has no attribute 'find'
```
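The root cause is visible in the last frame: `trimToJson` expects a `str`, but the custom model returned a LangChain `AIMessage`, which has no string methods. A minimal illustration (assuming `langchain-core` is installed):

```python
from langchain_core.messages import AIMessage

res = AIMessage(content='{"statements": []}')

# This is what trimToJson effectively does, and it raises:
# AttributeError: 'AIMessage' object has no attribute 'find'
# res.find("{")

# Unwrapping to the underlying string works:
assert res.content.find("{") == 0
```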