confident-ai / deepeval

The LLM Evaluation Framework
https://docs.confident-ai.com/
Apache License 2.0

AttributeError: 'AIMessage' object has no attribute 'find' #482

Open makoto-velux opened 5 months ago

makoto-velux commented 5 months ago

**Describe the bug**
Calling `metric.measure` with a custom evaluation LLM raises `AttributeError: 'AIMessage' object has no attribute 'find'`.

**To Reproduce**
Steps to reproduce the behavior:

1. I defined a custom evaluation model by inheriting the `DeepEvalBaseLLM` class, following this example:
```python
from deepeval.models.base import DeepEvalBaseLLM

class DeepEvalAzureOpenAI(DeepEvalBaseLLM):
    def __init__(self, model):
        self.model = model

    def load_model(self):
        return self.model

    def _call(self, prompt: str) -> str:
        chat_model = self.load_model()
        return chat_model.invoke(prompt)

    def get_model_name(self):
        return "Custom Azure OpenAI Model"
```
2. In my main script, I imported the custom eval model, constructed the metrics, and used them:
    
```python
import os

from langchain.chat_models import AzureChatOpenAI
from deepeval.metrics import AnswerRelevancyMetric, ContextualRelevancyMetric
from deepeval.test_case import LLMTestCase

from DeepEvalAzure import DeepEvalAzureOpenAI

custom_model = AzureChatOpenAI(
    azure_deployment=os.getenv("OPENAI_API_MODEL_GPT4"),
    openai_api_version=os.getenv("OPENAI_API_VERSION"),
    base_url=os.getenv("OPENAI_API_BASE"),
)
azure_openai = DeepEvalAzureOpenAI(model=custom_model)

answer_relevancy_metric = AnswerRelevancyMetric(
    threshold=0.7, model=azure_openai, include_reason=True
)

actual_output = "We offer a 30-day full refund at no extra cost."
retrieval_context = ["All customers are eligible for a 30 day full refund at no extra cost."]

test_case = LLMTestCase(
    input="What if these shoes don't fit?",
    actual_output=actual_output,
    retrieval_context=retrieval_context,
)
answer_relevancy_metric.measure(test_case)
```

3. Observed this error:

```
AttributeError                            Traceback (most recent call last)
Cell In[43], line 1
----> 1 answer_relevancy_metric.measure(test_case)

File ~/repos//aia-chatgpt/.venv/lib/python3.11/site-packages/deepeval/metrics/answer_relevancy.py:46, in AnswerRelevancyMetric.measure(self, test_case)
     41 raise ValueError(
     42     "Input, actual output, or retrieval context cannot be None"
     43 )
     44 with metrics_progress_context(self.name, self.evaluation_model):
     45     # generate statements
---> 46     self.statements: List[str] = self._generate_statements(
     47         test_case.actual_output
     48     )
     50     # generate verdicts based on statements, and retrieval context
     51     self.verdicts: List[AnswerRelvancyVerdict] = (
     52         self._generate_verdicts(
     53             test_case.input, test_case.retrieval_context
     54         )
     55     )

File ~/repos//aia-chatgpt/.venv/lib/python3.11/site-packages/deepeval/metrics/answer_relevancy.py:120, in AnswerRelevancyMetric._generate_statements(self, actual_output)
    115 prompt = AnswerRelevancyTemplate.generate_statements(
    116     actual_output=actual_output,
    117 )
    119 res = self.model(prompt)
--> 120 json_output = trimToJson(res)
    121 data = json.loads(json_output)
    122 return data["statements"]

File ~/repos//aia-chatgpt/.venv/lib/python3.11/site-packages/deepeval/utils.py:92, in trimToJson(input_string)
     91 def trimToJson(input_string: str) -> str:
---> 92     start = input_string.find("{")
     93     end = input_string.rfind("}") + 1
     94     return input_string[start:end] if start != -1 and end != 0 else ""

AttributeError: 'AIMessage' object has no attribute 'find'
```
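For context, LangChain chat models return an `AIMessage` object whose text lives under `.content`, so plain-string methods like `find` fail when called on the message itself. A minimal illustration (the import path assumes `langchain_core`, which ships with the langchain 0.1.x series used here):

```python
from langchain_core.messages import AIMessage

# chat_model.invoke(prompt) hands back a message object, not a str
res = AIMessage(content='{"statements": ["We offer a 30-day full refund."]}')

# res.find("{")               # AttributeError: 'AIMessage' object has no attribute 'find'
start = res.content.find("{")  # works: .content holds the plain string
print(start)                   # 0
```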



**Screenshots**
Also, `evaluate([test_case], [answer_relevancy_metric])` fails without any error message. In my Jupyter notebook, it flicks between "You're using DeepEval's latest Answer Relevancy Metric..." and "Evaluating testcases...", and after a few seconds it fails without any output.
<img width="985" alt="Screenshot 2024-02-16 at 00 38 45" src="https://github.com/confident-ai/deepeval/assets/158270464/074a8da9-d51a-42c1-9a3e-11021da5362d">
<img width="629" alt="Screenshot 2024-02-16 at 00 47 58" src="https://github.com/confident-ai/deepeval/assets/158270464/2cdb960a-c741-427a-b286-e27278be2a48">

**Desktop (please complete the following information):**
- OS: macOS 14.1.2 (23B92)
- IDE: VSCode 1.86.2
- Python: 3.11.5
- langchain: 0.1.7
- langchain-openai: 0.0.6
- deepeval: 0.20.65
makoto-velux commented 5 months ago

Additional insights from my side: the issue seems to lie in the `_generate_statements` method, which is called within `measure`. More specifically, it's due to the way `res` (the prediction result generated by the evaluation model) is passed to `utils.trimToJson`, which trims the text down to a JSON-compatible format. Since `res` is of type `langchain_core.messages.ai.AIMessage` and not a string, it throws the error. As a temporary solution, I modified the `_generate_statements` method as below and it works. Same for `_generate_verdicts`.

```python
def _generate_statements(
    self,
    actual_output: str,
) -> List[str]:
    prompt = AnswerRelevancyTemplate.generate_statements(
        actual_output=actual_output,
    )

    # .content converts the response to a string; otherwise res is a
    # langchain_core.messages.ai.AIMessage and trimToJson throws an error
    res = self.model(prompt).content
    json_output = trimToJson(res)
    data = json.loads(json_output)
    return data["statements"]
```

But I would still be interested in a more stable solution. Also, `evaluate([test_case], [answer_relevancy_metric])` still doesn't work.

penguine-ip commented 4 months ago

@makoto-velux Hey! Thanks for the comment. I think the documentation has a small error: the `_call` method isn't returning a string, which is why you get the error and why you have to add `.content` at the end of `self.model(prompt)`.
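In the meantime, here's a minimal sketch of the corrected wrapper (same `_call`-based interface as your snippet above), where the `AIMessage` is unwrapped before being returned:

```python
from deepeval.models.base import DeepEvalBaseLLM

class DeepEvalAzureOpenAI(DeepEvalBaseLLM):
    def __init__(self, model):
        self.model = model

    def load_model(self):
        return self.model

    def _call(self, prompt: str) -> str:
        chat_model = self.load_model()
        # .content unwraps the AIMessage so the metric receives a plain str
        return chat_model.invoke(prompt).content

    def get_model_name(self):
        return "Custom Azure OpenAI Model"
```

Returning `.content` rather than the raw message keeps the wrapper compatible with `trimToJson`, which expects a plain string.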

I'm reproducing it and will make a release in a few hours today.

Apart from that error, what do you mean when you say `evaluate([test_case], [answer_relevancy_metric])` doesn't work?

penguine-ip commented 4 months ago

Btw @makoto-velux, I fixed the docs. It was a one-line update, but now it works: https://docs.confident-ai.com/docs/metrics-introduction#azure-openai-example

makoto-velux commented 4 months ago

@penguine-ip thanks for picking this up. Ah, so adding `.content` to the `_call` method was enough. Good to know, now I can continue my work! As for the `evaluate` method, as you can see in the 2nd screenshot in my 1st post, the Jupyter notebook does not show any result or error. It seems like it fails (I can see the red mark next to the code cell), but there is no error message. The notebook appears to be trying to render something, but it ends up rendering just an empty white horizontal bar.

penguine-ip commented 4 months ago

@makoto-velux I followed our quick start in colab https://colab.research.google.com/drive/1PPxYEBa6eu__LquGoFFJZkhYgWVYE6kh?usp=sharing up until the "🚀 Run Your First Evaluation" section and couldn't reproduce the error. Which metric are you using?

penguine-ip commented 4 months ago

Also if you're on discord we have an issues channel: https://discord.com/invite/a3K9c8GRGt

makoto-velux commented 4 months ago

I was using the answer relevancy, context relevancy, and faithfulness metrics, and `evaluate` didn't work for any of them.

penguine-ip commented 4 months ago

Can you try modifying the colab instead and let me know how it works out? It shouldn't be a problem with your test case since it's from the docs.

penguine-ip commented 4 months ago

@makoto-velux is it still a problem?