confident-ai / deepeval

The LLM Evaluation Framework
https://docs.confident-ai.com/
Apache License 2.0

APIConnectionError: Connection error. #1094

Open paulacanva opened 3 weeks ago

paulacanva commented 3 weeks ago

Describe the bug

No matter what I try, I keep getting APIConnectionError: Connection error. This happens only when using deepeval.

I have a script that first generates predictions, calling OpenAI, for ~400 rows. I use asyncio + semaphores + backoff to handle issues. Everything runs fine and fast.

When I get to the deepeval part, I'm able to process only 3 of those rows before getting:

Traceback (most recent call last):
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/httpx/_transports/default.py", line 72, in map_httpcore_exceptions
    yield
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/httpx/_transports/default.py", line 377, in handle_async_request
    resp = await self._pool.handle_async_request(req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/httpcore/_async/connection_pool.py", line 216, in handle_async_request
    raise exc from None
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/httpcore/_async/connection_pool.py", line 196, in handle_async_request
    response = await connection.handle_async_request(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/httpcore/_async/connection.py", line 99, in handle_async_request
    raise exc
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/httpcore/_async/connection.py", line 76, in handle_async_request
    stream = await self._connect(request)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/httpcore/_async/connection.py", line 122, in _connect
    stream = await self._network_backend.connect_tcp(**kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/httpcore/_backends/auto.py", line 30, in connect_tcp
    return await self._backend.connect_tcp(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/httpcore/_backends/anyio.py", line 114, in connect_tcp
    with map_exceptions(exc_map):
  File "/opt/homebrew/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.ConnectError: All connection attempts failed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/openai/_base_client.py", line 1562, in _request
    response = await self._client.send(
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/httpx/_client.py", line 1674, in send
    response = await self._send_handling_auth(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/httpx/_client.py", line 1702, in _send_handling_auth
    response = await self._send_handling_redirects(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/httpx/_client.py", line 1739, in _send_handling_redirects
    response = await self._send_single_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/httpx/_client.py", line 1776, in _send_single_request
    response = await transport.handle_async_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/httpx/_transports/default.py", line 376, in handle_async_request
    with map_httpcore_exceptions():
  File "/opt/homebrew/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/httpx/_transports/default.py", line 89, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ConnectError: All connection attempts failed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/paulaceccon/work/paulaceccon/design_advice/src/main.py", line 121, in <module>
    cli()
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/work/paulaceccon/design_advice/src/main.py", line 110, in evaluate
    asyncio.run(evaluator.run_evaluation())
  File "/opt/homebrew/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/weave/trace/op.py", line 436, in wrapper
    res, _ = await _execute_call(wrapper, call, *args, **kwargs)  # type: ignore
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/weave/trace/op.py", line 253, in _call_async
    return handle_exception(e)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/weave/trace/op.py", line 251, in _call_async
    res = await func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/work/paulaceccon/design_advice/src/evaluator.py", line 441, in run_evaluation
    "advice_eval": await self.g_eval_score(
                   ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/backoff/_async.py", line 151, in retry
    ret = await target(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/work/paulaceccon/design_advice/src/evaluator.py", line 391, in g_eval_score
    await asyncio.gather(metric.a_measure(test_case))
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/deepeval/metrics/g_eval/g_eval.py", line 154, in a_measure
    await self._a_generate_evaluation_steps()
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/deepeval/metrics/g_eval/g_eval.py", line 187, in _a_generate_evaluation_steps
    res, cost = await self.model.a_generate(prompt)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/tenacity/asyncio/__init__.py", line 189, in async_wrapped
    return await copy(fn, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/tenacity/asyncio/__init__.py", line 111, in __call__
    do = await self.iter(retry_state=retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/tenacity/asyncio/__init__.py", line 153, in iter
    result = await action(retry_state)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/tenacity/_utils.py", line 99, in inner
    return call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/tenacity/__init__.py", line 398, in <lambda>
    self._add_action_func(lambda rs: rs.outcome.result())
                                     ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/tenacity/asyncio/__init__.py", line 114, in __call__
    result = await fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/deepeval/models/gpt_model.py", line 182, in a_generate
    res = await chat_model.ainvoke(prompt)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 305, in ainvoke
    llm_result = await self.agenerate_prompt(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 794, in agenerate_prompt
    return await self.agenerate(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 754, in agenerate
    raise exceptions[0]
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 930, in _agenerate_with_cache
    result = await self._agenerate(
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/langchain_openai/chat_models/base.py", line 827, in _agenerate
    response = await self.async_client.create(**payload)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/weave/trace/op.py", line 436, in wrapper
    res, _ = await _execute_call(wrapper, call, *args, **kwargs)  # type: ignore
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/weave/trace/op.py", line 253, in _call_async
    return handle_exception(e)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/weave/trace/op.py", line 251, in _call_async
    res = await func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/weave/integrations/openai/openai_sdk.py", line 288, in _wrapper
    return await fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/openai/resources/chat/completions.py", line 1412, in create
    return await self._post(
           ^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/openai/_base_client.py", line 1829, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/openai/_base_client.py", line 1523, in request
    return await self._request(
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulaceccon/Library/Caches/pypoetry/virtualenvs/design-advice-vbsnYNek-py3.12/lib/python3.12/site-packages/openai/_base_client.py", line 1596, in _request
    raise APIConnectionError(request=request) from err
openai.APIConnectionError: Connection error.

I'm running deepeval through:

    @backoff.on_exception(
        backoff.expo,
        exception=(
            openai.APIConnectionError,
            openai.APIError,
            openai.RateLimitError,
            openai.InternalServerError,
            openai.APITimeoutError,
            ClientError,
            BotoCoreError,
            EndpointConnectionError,
        ),
        max_tries=5,
        max_time=60,
        factor=2,
    )  # Not sure how to better handle errors with deepeval
    async def g_eval_score(
        self,
        expected: str,
        actual: str,
        metric_name: str,
        criteria: str | None = None,
        evaluation_steps: list[str] | None = None,
        threshold: float | None = None,
        model: str = "gpt-4o",
        verbose_mode: bool = False,
        **kwargs,
    ) -> dict[str, Any]:
        """
        Generic function to measure the G-Eval score for a given metric.

        :param verbose_mode: Whether to print verbose logs.
        :param expected: Ground truth (e.g., principles, advice, category).
        :param actual: Model output (e.g., principles, advice, category).
        :param metric_name: Name of the metric.
        :param threshold: Threshold for the G-Eval metric.
        :param criteria: Criteria for the G-Eval metric.
        :param evaluation_steps: Evaluation steps for the G-Eval metric.
        :param model: Model.value (default: "gpt-4o").
        :return: A dictionary with the metric score and reason.

        Notes:
        - Either criteria or evaluation_steps must be provided.
        - For accurate and valid results, only the parameters that are mentioned in criteria should
            be included as a member of evaluation_params.
        """
        if not criteria and not evaluation_steps:
            raise ValueError("Either criteria or evaluation_steps must be provided.")

        evaluation_params = kwargs.get(
            "evaluation_params",
            [LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
        )

        # Base test_case_params with mandatory `input` and `actual_output`
        test_case_params = {"input": expected, "actual_output": actual}

        if LLMTestCaseParams.EXPECTED_OUTPUT in evaluation_params:
            test_case_params["expected_output"] = expected

        if LLMTestCaseParams.INPUT in evaluation_params and "input" in kwargs:
            test_case_params["input"] = kwargs["input"]

        # Dynamically create the test case using the built params
        test_case = LLMTestCase(**test_case_params)

        if criteria:
            metric = GEval(
                name=metric_name,
                criteria=criteria,
                evaluation_params=evaluation_params,
                model=model,
                threshold=threshold or self._eval_config.g_eval_threshold,
                verbose_mode=verbose_mode,
            )
        else:
            metric = GEval(
                name=metric_name,
                evaluation_steps=evaluation_steps,
                evaluation_params=evaluation_params,
                model=model,
                threshold=threshold or self._eval_config.g_eval_threshold,
                verbose_mode=verbose_mode,
            )

        await asyncio.gather(metric.a_measure(test_case))

        return {"score": metric.score, "reason": metric.reason}

I'm not sure why this issue happens only with deepeval, or how to solve it.

jyyogi commented 2 weeks ago

I'm seeing this error, too. It's unfortunate because we process thousands of records in a batch, and if one call fails the whole thing goes down and we lose the data.

penguine-ip commented 2 weeks ago

Hey @paulacanva, there are a few anti-patterns in your code. When you want to run many test cases on one metric, you should use the evaluate function: https://docs.confident-ai.com/docs/evaluation-introduction#evaluating-without-pytest

We have throttling options, and you can even ignore errors so that you don't lose your data when one call fails, as @jyyogi described. You can also cache your results, so even if a run errors out you don't have to rerun everything the next time.
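
To make the two ideas in this comment concrete, here is a minimal plain-asyncio sketch of throttling plus error-tolerant batching. It is not deepeval's implementation and does not call deepeval at all: `score_one`, `score_all`, `max_concurrent`, and the failing row are hypothetical names chosen for illustration.

```python
import asyncio


async def score_one(row: int) -> float:
    """Hypothetical stand-in for one metric call; row 3 always fails."""
    await asyncio.sleep(0)  # yield control, as a network call would
    if row == 3:
        raise ConnectionError(f"row {row}: connection error")
    return row / 10


async def score_all(rows, max_concurrent=2):
    # The semaphore caps how many calls are in flight at once (throttling).
    sem = asyncio.Semaphore(max_concurrent)

    async def guarded(row):
        async with sem:
            return await score_one(row)

    # return_exceptions=True returns failures as values, so one bad call
    # does not discard the results of the others.
    results = await asyncio.gather(
        *(guarded(r) for r in rows), return_exceptions=True
    )
    scores = {r: v for r, v in zip(rows, results) if not isinstance(v, BaseException)}
    errors = {r: v for r, v in zip(rows, results) if isinstance(v, BaseException)}
    return scores, errors


scores, errors = asyncio.run(score_all([1, 2, 3, 4]))
```

Here `scores` holds the three rows that succeeded and `errors` holds the exception for row 3, which can then be logged or retried separately.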

The way you're using asyncio.gather will introduce a lot of bugs since each metric is stateful. In the evaluate function we handle everything so there's no overwriting of values.
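
The general hazard of sharing a stateful object across concurrent tasks can be sketched as follows. `FakeMetric` is a hypothetical class, not deepeval's GEval, and this is an illustration of the overwriting problem in general rather than a diagnosis of the code posted above.

```python
import asyncio


class FakeMetric:
    """Hypothetical stateful metric: the result is stored on the instance."""

    def __init__(self) -> None:
        self.score = None

    async def a_measure(self, value: float) -> None:
        self.score = value      # state is written before the await
        await asyncio.sleep(0)  # yields control, as an LLM call would


async def measure_shared(values):
    metric = FakeMetric()  # one instance reused by every task

    async def run(value):
        await metric.a_measure(value)
        return metric.score  # may have been overwritten by another task

    return await asyncio.gather(*(run(v) for v in values))


async def measure_isolated(values):
    async def run(value):
        metric = FakeMetric()  # a fresh instance per task
        await metric.a_measure(value)
        return metric.score

    return await asyncio.gather(*(run(v) for v in values))


shared = asyncio.run(measure_shared([0.9, 0.1]))
isolated = asyncio.run(measure_isolated([0.9, 0.1]))
```

With the shared instance both tasks report the last value written, while the isolated version keeps each task's own score.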

jyyogi commented 2 weeks ago

Hey @penguine-ip - thanks for the quick reply here! I'm using evaluate and somehow glossed over the fact that we can cache our results as a way to get around failed calls. That feature is super nifty!

Do you have an example of how ignore_errors + write_cache / use_cache behave?

This is extremely useful if you're running large amounts of test cases. For example, let's say you're running 1000 test cases using deepeval test run, but you encounter an error on the 999th test case. The cache functionality would allow you to skip all the previously evaluated 999 test cases, and just evaluate the remaining one.

How does this work? How would I retry?
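
One way to picture the skip-what-is-cached behaviour quoted above is the plain-asyncio sketch below. It is an assumption about the general idea, not deepeval's actual cache: `evaluate_case`, `run_with_cache`, the in-memory `cache` dict, and the `fail_on` argument are all hypothetical.

```python
import asyncio

calls = []  # records which cases were actually evaluated


async def evaluate_case(case_id, fail_on):
    """Hypothetical stand-in for evaluating one test case."""
    calls.append(case_id)
    await asyncio.sleep(0)
    if case_id in fail_on:
        raise ConnectionError(f"case {case_id} failed")
    return 1.0


async def run_with_cache(case_ids, cache, fail_on=frozenset()):
    # Skip anything that already has a cached result from an earlier run.
    pending = [c for c in case_ids if c not in cache]
    results = await asyncio.gather(
        *(evaluate_case(c, fail_on) for c in pending), return_exceptions=True
    )
    for case_id, result in zip(pending, results):
        if not isinstance(result, BaseException):
            cache[case_id] = result  # only successes are cached
    # Whatever is still missing is what a retry needs to cover.
    return [c for c in case_ids if c not in cache]


cache = {}
cases = list(range(1, 6))
failed_first = asyncio.run(run_with_cache(cases, cache, fail_on={4}))
calls.clear()
failed_second = asyncio.run(run_with_cache(cases, cache))  # the retry
```

In this sketch the retry is simply a second run over the same cases: only case 4 is evaluated again because the other four are already in the cache.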