confident-ai / deepeval

The LLM Evaluation Framework
https://docs.confident-ai.com/
Apache License 2.0

AttributeError: 'AIMessage' object has no attribute 'find' #482

Open makoto-velux opened 5 months ago

makoto-velux commented 5 months ago

**Describe the bug**
Calling `metric.measure` with a custom evaluation LLM raises `AttributeError: 'AIMessage' object has no attribute 'find'`.

**To Reproduce**
Steps to reproduce the behavior:

1. I defined a custom evaluation model by inheriting the `DeepEvalBaseLLM` class, following this example:
```python
from deepeval.models.base import DeepEvalBaseLLM

class DeepEvalAzureOpenAI(DeepEvalBaseLLM):
    def __init__(self, model):
        self.model = model

    def load_model(self):
        return self.model

    def _call(self, prompt: str) -> str:
        chat_model = self.load_model()
        return chat_model.invoke(prompt)

    def get_model_name(self):
        return "Custom Azure OpenAI Model"
```
2. In my main script, I imported the custom eval model, constructed the metrics, and used them:
    
```python
import os

from langchain.chat_models import AzureChatOpenAI
from deepeval.metrics import AnswerRelevancyMetric, ContextualRelevancyMetric
from deepeval.test_case import LLMTestCase

from DeepEvalAzure import DeepEvalAzureOpenAI

custom_model = AzureChatOpenAI(
    azure_deployment=os.getenv("OPENAI_API_MODEL_GPT4"),
    openai_api_version=os.getenv("OPENAI_API_VERSION"),
    base_url=os.getenv("OPENAI_API_BASE"),
)
azure_openai = DeepEvalAzureOpenAI(model=custom_model)

answer_relevancy_metric = AnswerRelevancyMetric(
    threshold=0.7, model=azure_openai, include_reason=True
)

actual_output = "We offer a 30-day full refund at no extra cost."
retrieval_context = ["All customers are eligible for a 30 day full refund at no extra cost."]

test_case = LLMTestCase(
    input="What if these shoes don't fit?",
    actual_output=actual_output,
    retrieval_context=retrieval_context,
)
answer_relevancy_metric.measure(test_case)
```

3. Observed this error:

```
AttributeError                            Traceback (most recent call last)
Cell In[43], line 1
----> 1 answer_relevancy_metric.measure(test_case)

File ~/repos//aia-chatgpt/.venv/lib/python3.11/site-packages/deepeval/metrics/answer_relevancy.py:46, in AnswerRelevancyMetric.measure(self, test_case)
     41 raise ValueError(
     42     "Input, actual output, or retrieval context cannot be None"
     43 )
     44 with metrics_progress_context(self.name, self.evaluation_model):
     45     # generate statements
---> 46     self.statements: List[str] = self._generate_statements(
     47         test_case.actual_output
     48     )
     50     # generate verdicts based on statements, and retrieval context
     51     self.verdicts: List[AnswerRelvancyVerdict] = (
     52         self._generate_verdicts(
     53             test_case.input, test_case.retrieval_context
     54         )
     55     )

File ~/repos//aia-chatgpt/.venv/lib/python3.11/site-packages/deepeval/metrics/answer_relevancy.py:120, in AnswerRelevancyMetric._generate_statements(self, actual_output)
    115 prompt = AnswerRelevancyTemplate.generate_statements(
    116     actual_output=actual_output,
    117 )
    119 res = self.model(prompt)
--> 120 json_output = trimToJson(res)
    121 data = json.loads(json_output)
    122 return data["statements"]

File ~/repos//aia-chatgpt/.venv/lib/python3.11/site-packages/deepeval/utils.py:92, in trimToJson(input_string)
     91 def trimToJson(input_string: str) -> str:
---> 92     start = input_string.find("{")
     93     end = input_string.rfind("}") + 1
     94     return input_string[start:end] if start != -1 and end != 0 else ""

AttributeError: 'AIMessage' object has no attribute 'find'
```
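For context, LangChain chat models return an `AIMessage` object whose text lives under `.content`, so plain-string methods like `find` fail when called on the message itself. A minimal illustration (the import path assumes `langchain_core`, which ships with the langchain 0.1.x series used here):

```python
from langchain_core.messages import AIMessage

# chat_model.invoke(prompt) hands back a message object, not a str
res = AIMessage(content='{"statements": ["We offer a 30-day full refund."]}')

# res.find("{")               # AttributeError: 'AIMessage' object has no attribute 'find'
start = res.content.find("{")  # works: .content holds the plain string
print(start)                   # 0
```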



**Screenshots**
Also, `evaluate([test_case], [answer_relevancy_metric])` fails without any error message. In my Jupyter notebook, it flicks between "You're using DeepEval's latest Answer Relevancy Metric..." and "Evaluating testcases...", and after a few seconds it fails without any output.
<img width="985" alt="Screenshot 2024-02-16 at 00 38 45" src="https://github.com/confident-ai/deepeval/assets/158270464/074a8da9-d51a-42c1-9a3e-11021da5362d">
<img width="629" alt="Screenshot 2024-02-16 at 00 47 58" src="https://github.com/confident-ai/deepeval/assets/158270464/2cdb960a-c741-427a-b286-e27278be2a48">

**Desktop (please complete the following information):**
- OS: macOS 14.1.2 (23B92)
- IDE: VSCode 1.86.2
- Python: 3.11.5
- langchain: 0.1.7
- langchain-openai: 0.0.6
- deepeval: 0.20.65
makoto-velux commented 5 months ago

Additional insights from my side: the issue seems to lie in the `_generate_statements` method, which is called within `measure`. More specifically, it's due to the way `res` (the prediction result generated by the evaluation model) is passed to `utils.trimToJson`, which trims the text down to a JSON-compatible format. Since `res` is of type `langchain_core.messages.ai.AIMessage` and not a string, it throws the error. As a temporary solution, I modified the `_generate_statements` method as below and it works. Same for `_generate_verdicts`.

```python
def _generate_statements(
    self,
    actual_output: str,
) -> List[str]:
    prompt = AnswerRelevancyTemplate.generate_statements(
        actual_output=actual_output,
    )

    # .content converts the response to a string; otherwise res is a
    # langchain_core.messages.ai.AIMessage and trimToJson throws an error
    res = self.model(prompt).content
    json_output = trimToJson(res)
    data = json.loads(json_output)
    return data["statements"]
```

But I would still be interested in a more stable solution. Also, `evaluate([test_case], [answer_relevancy_metric])` still doesn't work.

penguine-ip commented 4 months ago

@makoto-velux Hey! Thanks for the comment. I think the documentation has a small error: the `_call` method isn't returning a string, which is why you get the error and why you have to add `.content` at the end of `self.model(prompt)`.
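In the meantime, here's a minimal sketch of the corrected wrapper (same `_call`-based interface as your snippet above), where the `AIMessage` is unwrapped before being returned:

```python
from deepeval.models.base import DeepEvalBaseLLM

class DeepEvalAzureOpenAI(DeepEvalBaseLLM):
    def __init__(self, model):
        self.model = model

    def load_model(self):
        return self.model

    def _call(self, prompt: str) -> str:
        chat_model = self.load_model()
        # .content unwraps the AIMessage so the metric receives a plain str
        return chat_model.invoke(prompt).content

    def get_model_name(self):
        return "Custom Azure OpenAI Model"
```

Returning `.content` rather than the raw message keeps the wrapper compatible with `trimToJson`, which expects a plain string.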

I'm reproducing it and will make a release in a few hours today.

Apart from that error, what do you mean when you say `evaluate([test_case], [answer_relevancy_metric])` doesn't work?

penguine-ip commented 4 months ago

Btw @makoto-velux, I fixed the docs. It was a one-line update, but now it works: https://docs.confident-ai.com/docs/metrics-introduction#azure-openai-example

makoto-velux commented 4 months ago

@penguine-ip thanks for picking this up. Ah, so adding `.content` to the `_call` method was enough. Good to know, now I can continue my work! As for the `evaluate` method, as you can see in the 2nd screenshot in my 1st post, the Jupyter notebook does not show any result or error. It seems like it fails (I can see the red mark next to the code cell), but there is no error message. The notebook appears to be trying to render something, but it ends up rendering just an empty white horizontal bar.

penguine-ip commented 4 months ago

@makoto-velux I followed our quick start in colab https://colab.research.google.com/drive/1PPxYEBa6eu__LquGoFFJZkhYgWVYE6kh?usp=sharing up until the "🚀 Run Your First Evaluation" section and couldn't reproduce the error. Which metric are you using?

penguine-ip commented 4 months ago

Also if you're on discord we have an issues channel: https://discord.com/invite/a3K9c8GRGt

makoto-velux commented 4 months ago

I was using the answer relevancy, context relevancy, and faithfulness metrics, and `evaluate` didn't work for any of them.

penguine-ip commented 4 months ago

Can you try modifying the colab instead and let me know how it works out? It shouldn't be a problem with your test case since it's from the docs.

penguine-ip commented 4 months ago

@makoto-velux is it still a problem?