confident-ai / deepeval

The LLM Evaluation Framework
https://docs.confident-ai.com/
Apache License 2.0
2.31k stars, 162 forks

Add Support for `Gemini` Models #829

Open CD-rajveer opened 2 weeks ago

CD-rajveer commented 2 weeks ago

❗BEFORE YOU BEGIN❗ Are you on discord? 🤗 We'd love to have you asking questions on discord instead: https://discord.com/invite/a3K9c8GRGt

Is your feature request related to a problem? Please describe. Currently DeepEval only supports testing responses generated by OpenAI models. I am using Gemini models and want to test their responses.

Describe the solution you'd like Add support for Gemini models importable as `from deepeval.models import gemini_models`, by adding a new file at `deepeval/models/gemini_models.py`,

or by adding support for Gemini models in `deepeval/models/gpt_model.py`.

Describe alternatives you've considered Currently there are no alternatives I can think of.

Additional context No additional context.

penguine-ip commented 2 weeks ago

@CD-rajveer There is already support for Gemini models, or in fact any LLM you wish to use: https://docs.confident-ai.com/docs/metrics-introduction#google-vertexai-example

CD-rajveer commented 2 weeks ago

Yes, I read that documentation and defined my custom LLM as follows:


import pytest
from langchain_google_genai import ChatGoogleGenerativeAI, HarmBlockThreshold, HarmCategory

import deepeval
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.models.base_model import DeepEvalBaseLLM
from deepeval.test_case import LLMTestCase

class TestGoogleGenerativeAI(DeepEvalBaseLLM):
    """Class to implement Google Generative AI for DeepEval"""
    def __init__(self, model):
        self.model = model

    def load_model(self):
        return self.model

    def generate(self, prompt: str) -> str:
        chat_model = self.load_model()
        return chat_model.invoke(prompt).content

    async def a_generate(self, prompt: str) -> str:
        chat_model = self.load_model()
        res = await chat_model.ainvoke(prompt)
        return res.content

    def get_model_name(self):
        return "Google Generative AI Model"

# Disable all safety filters (note: defined here but not passed to the
# ChatGoogleGenerativeAI constructor below)
safety_settings = {
    HarmCategory.HARM_CATEGORY_UNSPECIFIED: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
}

custom_model_gemini = ChatGoogleGenerativeAI(
    model="models/gemini-1.5-flash-latest",
    convert_system_message_to_human=True,
    google_api_key=google_api_key,
    temperature=0.3,
    top_k=1,
    top_p=0.9,
    max_output_tokens=8192,
    verbose=True,
)

vertexai_gemini = TestGoogleGenerativeAI(model=custom_model_gemini)

# "dataset" is an EvaluationDataset defined elsewhere in the test module.
@pytest.mark.parametrize(
    "test_case",
    dataset,
)
# No @pytest.mark.asyncio here: assert_test is synchronous and manages its
# own event loop internally.
def test_chat_app(test_case: LLMTestCase):
    answer_relevancy_metric = AnswerRelevancyMetric(
        threshold=0.7,
        # model="gpt-4o",
        model=vertexai_gemini,
        include_reason=True,
    )
    assert_test(test_case, [answer_relevancy_metric])

@deepeval.on_test_run_end
def function_to_be_called_after_test_run():
    print("Test finished!")

but got the following error while testing:

venv/lib/python3.10/site-packages/deepeval/evaluate.py:376: in assert_test
    test_result = loop.run_until_complete(
/usr/lib/python3.10/asyncio/base_events.py:649: in run_until_complete
    return future.result()
venv/lib/python3.10/site-packages/deepeval/evaluate.py:302: in a_execute_test_cases
    await measure_metrics_with_indicator(
venv/lib/python3.10/site-packages/deepeval/metrics/indicator.py:150: in measure_metrics_with_indicator
    await asyncio.gather(*tasks)
venv/lib/python3.10/site-packages/deepeval/metrics/indicator.py:89: in measure_metric_task
    await metric.a_measure(tc, _show_indicator=False)
venv/lib/python3.10/site-packages/deepeval/metrics/bias/bias.py:89: in a_measure
    self.opinions: List[str] = await self._a_generate_opinions(
venv/lib/python3.10/site-packages/deepeval/metrics/bias/bias.py:180: in _a_generate_opinions
    res = await self.model.a_generate(prompt)
test_deepeval.py:76: in a_generate
    res = await chat_model.ainvoke(prompt)
venv/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py:191: in ainvoke
    llm_result = await self.agenerate_prompt(
venv/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py:609: in agenerate_prompt
    return await self.agenerate(
venv/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py:569: in agenerate
    raise exceptions[0]
venv/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py:754: in _agenerate_with_cache
    result = await self._agenerate(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = ChatGoogleGenerativeAI(verbose=True, model='models/gemini-1.5-flash-latest', google_api_key=SecretStr('**********'), t...ce.client.GenerativeServiceClient object at 0x7154b2e166e0>, default_metadata=(), convert_system_message_to_human=True)
messages = [HumanMessage(content='Based on the given text, please generate a list of OPINIONS. Claims, undisputed truths, are NOT...uery list. This configuration should work for your setup. Let me know if you need any further assistance.\n\nJSON:\n')]
stop = None, run_manager = <langchain_core.callbacks.manager.AsyncCallbackManagerForLLMRun object at 0x7154b0481fc0>, tools = None, functions = None, safety_settings = None, tool_config = None
generation_config = None, kwargs = {}

    async def _agenerate(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
        *,
        tools: Optional[Sequence[Union[ToolDict, GoogleTool]]] = None,
        functions: Optional[Sequence[FunctionDeclarationType]] = None,
        safety_settings: Optional[SafetySettingDict] = None,
        tool_config: Optional[Union[Dict, _ToolConfigDict]] = None,
        generation_config: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> ChatResult:
        if not self.async_client:
>           raise RuntimeError(
                "Initialize ChatGoogleGenerativeAI with a running event loop "
                "to use async methods."
            )
E           RuntimeError: Initialize ChatGoogleGenerativeAI with a running event loop to use async methods.

venv/lib/python3.10/site-packages/langchain_google_genai/chat_models.py:782: RuntimeError
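One possible workaround for this `RuntimeError` (a sketch, not DeepEval's or LangChain's official fix): avoid the native async client entirely and run the synchronous `invoke()` in a worker thread from `a_generate`. The `StubChatModel` below is a stand-in for `ChatGoogleGenerativeAI` so the sketch is self-contained; `ThreadedGoogleAIWrapper` is a hypothetical name.

```python
import asyncio
from types import SimpleNamespace

class StubChatModel:
    """Stand-in for ChatGoogleGenerativeAI; only the sync invoke() is used."""
    def invoke(self, prompt):
        return SimpleNamespace(content=f"echo: {prompt}")

class ThreadedGoogleAIWrapper:
    """Hypothetical DeepEvalBaseLLM-style wrapper that never touches the
    model's async client, so no running event loop is needed at init time."""
    def __init__(self, model):
        self.model = model

    def generate(self, prompt: str) -> str:
        return self.model.invoke(prompt).content

    async def a_generate(self, prompt: str) -> str:
        # Offload the blocking sync call to a worker thread so the event
        # loop is not blocked and no async client is required.
        return await asyncio.to_thread(self.generate, prompt)

wrapper = ThreadedGoogleAIWrapper(StubChatModel())
print(asyncio.run(wrapper.a_generate("hello")))  # echo: hello
```

The trade-off is that each async call occupies a thread, which is usually acceptable for evaluation workloads.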
penguine-ip commented 2 weeks ago

@CD-rajveer Are you running things in a notebook? Also, can you come to Discord? It's easier to debug there: https://discord.com/invite/a3K9c8GRGt

CD-rajveer commented 2 weeks ago

@penguine-ip I am running it in a Python file. I have already joined Discord.
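Since the error says the model must be initialized "with a running event loop", another hedged sketch is to defer model construction until a coroutine is actually running, so any async client the real class creates at construction time is bound to a live loop. `StubChatModel` and `LazyModelWrapper` below are placeholders, not real LangChain or DeepEval APIs.

```python
import asyncio
from types import SimpleNamespace

class StubChatModel:
    """Stand-in for ChatGoogleGenerativeAI with an async ainvoke()."""
    async def ainvoke(self, prompt):
        return SimpleNamespace(content=f"echo: {prompt}")

class LazyModelWrapper:
    """Hypothetical wrapper: constructs the model lazily, inside the loop."""
    def __init__(self, model_factory):
        self._factory = model_factory  # called on first a_generate()
        self._model = None

    async def a_generate(self, prompt: str) -> str:
        if self._model is None:
            # The event loop is running here, so construction-time async
            # client setup (if the real class does any) can succeed.
            self._model = self._factory()
        result = await self._model.ainvoke(prompt)
        return result.content

wrapper = LazyModelWrapper(StubChatModel)
print(asyncio.run(wrapper.a_generate("hi")))  # echo: hi
```

With the real `ChatGoogleGenerativeAI`, the factory would be a lambda that builds the model with the same kwargs shown earlier in this thread.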