confident-ai / deepeval

The LLM Evaluation Framework
https://docs.confident-ai.com/

Knowledge retention metric does not work #979

Open domciakocan opened 3 months ago

domciakocan commented 3 months ago

Describe the bug
Running tests for Knowledge Retention (following the documentation: https://docs.confident-ai.com/docs/metrics-knowledge-retention) raises the following error: TypeError: Claude.generate() missing 1 required positional argument: 'schema'.

To Reproduce
Steps to reproduce the behavior:

from deepeval.metrics import KnowledgeRetentionMetric
from deepeval.test_case.conversational_test_case import ConversationalTestCase, Message
from deepeval.test_case.llm_test_case import LLMTestCase
from deepeval.models.base_model import DeepEvalBaseLLM
from pydantic import BaseModel
import json
import os
import sys
import tempfile
from typing import Any, Dict, List, Optional

from anthropic import AnthropicBedrock
import instructor  # required for instructor.from_anthropic() below

class Claude(DeepEvalBaseLLM):
    """Claude model."""

    # pylint: disable=arguments-differ

    model_id = MODEL_ID  # model ID constant defined elsewhere (e.g. a Bedrock Claude model ID)

    def load_model(self):
        """Load the model."""
        return AnthropicBedrock(aws_region="us-east-1")

    def generate(self, prompt: str, schema: BaseModel) -> BaseModel:
        """Generate a response."""
        client = self.load_model()
        instructor_client = instructor.from_anthropic(client)
        return instructor_client.messages.create(
            model=self.model_id,
            max_tokens=4096,
            temperature=0,  # temperature 0 for tests
            messages=[{"role": "user", "content": prompt}],
            response_model=schema,
        )

    async def a_generate(self, prompt: str, schema: BaseModel) -> BaseModel:
        """Generate a response asynchronously."""
        return self.generate(prompt, schema)

    def get_model_name(self) -> str:
        """Get the model name."""
        return self.model_id

test_case = ConversationalTestCase(
    messages=[
        Message(
            llm_test_case=LLMTestCase(
                input="What is this document about?",
                actual_output="This document is about koalas.",
                expected_output=None,
                context=None,
                retrieval_context=None,
                additional_metadata=None,
                comments=None,
                tools_used=None,
                expected_tools=None,
                reasoning=None,
            ),
            should_evaluate=False,
        ),
        Message(
            llm_test_case=LLMTestCase(
                input="What are the koalas doing?",
                actual_output="The koalas are climbing trees.",
                expected_output=None,
                context=None,
                retrieval_context=None,
                additional_metadata=None,
                comments=None,
                tools_used=None,
                expected_tools=None,
                reasoning=None,
            ),
            should_evaluate=True,
        ),
    ],
    additional_metadata=None,
    comments=None,
    evaluate_all_messages=False,
)

metric = KnowledgeRetentionMetric(
    model=Claude(), 
    threshold=0.5,
)
metric.measure(test_case)

print(metric.score)

Expected behavior
The code should produce a score for the Knowledge Retention metric.

Screenshots
[screenshot attached]

Additional context
Other metrics, such as hallucination and bias, work properly.
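
One possible workaround, sketched below rather than taken from the report: judging from the TypeError, KnowledgeRetentionMetric appears to call the custom model's generate(prompt) without passing a schema, so giving schema a default of None and falling back to a plain text completion avoids the crash. The AnthropicBedrock and instructor calls mirror the repro above, and MODEL_ID is the same placeholder; whether this signature matches deepeval's intended custom-model contract is an assumption.

from typing import Optional

import instructor
from anthropic import AnthropicBedrock
from pydantic import BaseModel

from deepeval.models.base_model import DeepEvalBaseLLM


class Claude(DeepEvalBaseLLM):
    """Same model as above, but schema is optional."""

    model_id = MODEL_ID  # same placeholder as in the repro above

    def load_model(self):
        return AnthropicBedrock(aws_region="us-east-1")

    def generate(self, prompt: str, schema: Optional[BaseModel] = None):
        client = self.load_model()
        if schema is not None:
            # Structured output via instructor when a schema is provided.
            instructor_client = instructor.from_anthropic(client)
            return instructor_client.messages.create(
                model=self.model_id,
                max_tokens=4096,
                temperature=0,
                messages=[{"role": "user", "content": prompt}],
                response_model=schema,
            )
        # Plain text completion when the metric calls generate(prompt) only.
        response = client.messages.create(
            model=self.model_id,
            max_tokens=4096,
            temperature=0,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.content[0].text

    async def a_generate(self, prompt: str, schema: Optional[BaseModel] = None):
        return self.generate(prompt, schema)

    def get_model_name(self) -> str:
        return self.model_id

With this fallback in place, metrics that request a schema still get a parsed Pydantic object, while ones that only pass a prompt receive plain text instead of raising the TypeError.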

StrikeNP commented 2 months ago

I received a similar error when running ContextualPrecisionMetric, and it appears there was another error that led to this one:

File ~/anaconda3/envs/llamaindex/lib/python3.12/site-packages/deepeval/metrics/contextual_precision/contextual_precision.py:189, in ContextualPrecisionMetric._a_generate_verdicts(self, input, expected_output, retrieval_context)
    188 try:
--> 189     res: Verdicts = await self.model.a_generate(
    190         prompt, schema=Verdicts
    191     )
    192     verdicts = [item for item in res.verdicts]

TypeError: object Verdicts can't be used in 'await' expression

Do you see a similar error at the top of your stacktrace?

Update: Never mind, this was my own doing. I never added async to the a_generate signature.
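
For anyone hitting this second error, here is a minimal, self-contained sketch of the difference (SyncModel and AsyncModel are hypothetical names, not deepeval classes): awaiting a method that is not declared async def fails with exactly this kind of "object ... can't be used in 'await' expression" message, while an async def method can be awaited even if its body is synchronous, as in the repro above.

import asyncio


class SyncModel:
    # Missing `async`: code that does `await model.a_generate(...)` will fail,
    # because the method returns a plain value, not a coroutine.
    def a_generate(self, prompt):
        return f"echo: {prompt}"


class AsyncModel:
    # Declared `async def`, so the call returns a coroutine and can be awaited,
    # even though the body itself does no asynchronous work.
    async def a_generate(self, prompt):
        return f"echo: {prompt}"


async def main():
    try:
        await SyncModel().a_generate("hello")
    except TypeError as exc:
        print(exc)  # object str can't be used in 'await' expression
    print(await AsyncModel().a_generate("hello"))


asyncio.run(main())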