BerriAI / litellm

Python SDK, Proxy Server to call 100+ LLM APIs using the OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Feature]: Minimal output for health endpoint #4728

Closed · fgreinacher closed this 1 month ago

fgreinacher commented 1 month ago

The Feature

The /health endpoint provides a lot of detailed information in the healthy_endpoints/unhealthy_endpoints properties. We usually expose health endpoints without authentication and therefore prefer to keep the response minimal.

Would you be open to a new setting like minimal_health_response to reduce the amount of data to what people can observe from the outside?

{
  "healthy_endpoints": [],
  "unhealthy_endpoints": [
    {
      "model": "openai/mistral-7b-instruct"
    },
    {
      "model": "openai/mistral-7b-instruct"
    },
    {
      "model": "text-completion-openai/starcoder2-3b"
    },
    {
      "model": "openai/bge-m3"
    }
  ],
  "healthy_count": 0,
  "unhealthy_count": 4
}
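
For illustration, a rough sketch of the kind of filtering such a setting could apply (the flag name and helper below are placeholders, not existing LiteLLM code):

# Hypothetical helper, not part of LiteLLM today: reduce a full health-check
# result to the minimal shape proposed above when the flag is enabled.
from typing import Any


def minimize_health_response(full: dict[str, Any], minimal: bool = False) -> dict[str, Any]:
    if not minimal:
        return full
    # Keep only the model name per endpoint, plus the counts.
    return {
        "healthy_endpoints": [
            {"model": e.get("model")} for e in full.get("healthy_endpoints", [])
        ],
        "unhealthy_endpoints": [
            {"model": e.get("model")} for e in full.get("unhealthy_endpoints", [])
        ],
        "healthy_count": full.get("healthy_count", 0),
        "unhealthy_count": full.get("unhealthy_count", 0),
    }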

Motivation, pitch

This is too much for our taste (in this case it is api_base and error, but other LiteLLM params could end up exposed as well):

{
  "healthy_endpoints": [],
  "unhealthy_endpoints": [
    {
      "api_base": "https://llm-01.internal",
      "model": "openai/mistral-7b-instruct",
      "stream_timeout": 5,
      "error": "Request timed out. stack trace: Traceback (most recent call last):\n  File \"/opt/llm/.venv/lib/python3.11/site-packages/httpx/_transports/default.py\", line 69, in map_httpcore_exceptions\n    yield\n  File \"/opt/llm/.venv/lib/python3.11/site-packages/httpx/_transports/default.py\", line 373, in handle_async_request\n    resp = await self._pool.handle_async_request(req)\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/opt/llm/.venv/lib/python3.11/site-packages/httpcore/_async/connection_pool.py\", line 216, in handle_async_request\n    raise exc from None\n  File \"/opt/llm/.venv/lib/python3.11/site-packages/httpcore/_async/connection_pool.py\", line 196, in handle_async_request\n    response = await connection.handle_async_request(\n               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/opt/llm/.venv/lib/python3.11/site-packages/httpcore/_async/connection.py\", line 99, in handle_async_request\n    raise exc\n  File \"/opt/llm/.venv/lib/python3.11/site-packages/httpcore/_async/connection.py\", line 76, in handle"
    },
    {
      "api_base": "https://llm-02..internal",
      "model": "openai/mistral-7b-instruct",
      "stream_timeout": 5,
      "error": "Request timed out. stack trace: Traceback (most recent call last):\n  File \"/opt/llm/.venv/lib/python3.11/site-packages/httpx/_transports/default.py\", line 69, in map_httpcore_exceptions\n    yield\n  File \"/opt/llm/.venv/lib/python3.11/site-packages/httpx/_transports/default.py\", line 373, in handle_async_request\n    resp = await self._pool.handle_async_request(req)\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/opt/llm/.venv/lib/python3.11/site-packages/httpcore/_async/connection_pool.py\", line 216, in handle_async_request\n    raise exc from None\n  File \"/opt/llm/.venv/lib/python3.11/site-packages/httpcore/_async/connection_pool.py\", line 196, in handle_async_request\n    response = await connection.handle_async_request(\n               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/opt/llm/.venv/lib/python3.11/site-packages/httpcore/_async/connection.py\", line 99, in handle_async_request\n    raise exc\n  File \"/opt/llm/.venv/lib/python3.11/site-packages/httpcore/_async/connection.py\", line 76, in handle"
    },
    {
      "api_base": "https://llm-03..internal",
      "model": "text-completion-openai/starcoder2-3b",
      "stream_timeout": 5,
      "error": "Request timed out. stack trace: Traceback (most recent call last):\n  File \"/opt/llm/.venv/lib/python3.11/site-packages/httpx/_transports/default.py\", line 69, in map_httpcore_exceptions\n    yield\n  File \"/opt/llm/.venv/lib/python3.11/site-packages/httpx/_transports/default.py\", line 373, in handle_async_request\n    resp = await self._pool.handle_async_request(req)\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/opt/llm/.venv/lib/python3.11/site-packages/httpcore/_async/connection_pool.py\", line 216, in handle_async_request\n    raise exc from None\n  File \"/opt/llm/.venv/lib/python3.11/site-packages/httpcore/_async/connection_pool.py\", line 196, in handle_async_request\n    response = await connection.handle_async_request(\n               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/opt/llm/.venv/lib/python3.11/site-packages/httpcore/_async/connection.py\", line 99, in handle_async_request\n    raise exc\n  File \"/opt/llm/.venv/lib/python3.11/site-packages/httpcore/_async/connection.py\", line 76, in handle"
    },
    {
      "model": "openai/bge-m3",
      "api_base": "https://llm-04..internal",
      "error": "Request timed out. stack trace: Traceback (most recent call last):\n  File \"/opt/llm/.venv/lib/python3.11/site-packages/httpx/_transports/default.py\", line 69, in map_httpcore_exceptions\n    yield\n  File \"/opt/llm/.venv/lib/python3.11/site-packages/httpx/_transports/default.py\", line 373, in handle_async_request\n    resp = await self._pool.handle_async_request(req)\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/opt/llm/.venv/lib/python3.11/site-packages/httpcore/_async/connection_pool.py\", line 216, in handle_async_request\n    raise exc from None\n  File \"/opt/llm/.venv/lib/python3.11/site-packages/httpcore/_async/connection_pool.py\", line 196, in handle_async_request\n    response = await connection.handle_async_request(\n               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/opt/llm/.venv/lib/python3.11/site-packages/httpcore/_async/connection.py\", line 99, in handle_async_request\n    raise exc\n  File \"/opt/llm/.venv/lib/python3.11/site-packages/httpcore/_async/connection.py\", line 76, in handle"
    }
  ],
  "healthy_count": 0,
  "unhealthy_count": 4
}

fgreinacher commented 1 month ago

BTW: Would be happy to contribute this if the maintainers agree.

krrishdholakia commented 1 month ago

Hey @fgreinacher, we currently provide this info to help debug the error.

Do you want to control this verbosity with a flag? Open to suggestions on this.

krrishdholakia commented 1 month ago

"We usually expose health endpoints"

Are you trying to expose the health of the proxy or the health of the LLMs?

For the health of the proxy, you could use the /health/readiness or /health/liveliness endpoints:

https://docs.litellm.ai/docs/proxy/health#summary
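
For reference, a quick way to compare the two kinds of checks from a script (the proxy URL and key below are placeholders, and whether /health needs auth depends on your proxy config):

import httpx

PROXY_URL = "http://localhost:4000"  # placeholder proxy address

# Proxy-level health: does the proxy process itself respond?
print(httpx.get(f"{PROXY_URL}/health/liveliness").json())
print(httpx.get(f"{PROXY_URL}/health/readiness").json())

# Model-level health: checks each configured deployment, which is where the
# verbose healthy_endpoints/unhealthy_endpoints payload comes from.
print(httpx.get(
    f"{PROXY_URL}/health",
    headers={"Authorization": "Bearer sk-1234"},  # placeholder key
).json())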

fgreinacher commented 1 month ago

"Are you trying to expose the health of the proxy or the health of the LLMs?"

We want to use this for monitoring the health of the downstream LLMs.

"For the health of the proxy, you could use the /health/readiness or /health/liveliness endpoints"

Yes, we already use these for Kubernetes probes 👍

"Do you want to control this verbosity with a flag? Open to suggestions on this."

Yes, that would be the idea. I'll draft a PR and ping you!

krrishdholakia commented 1 month ago

great