BerriAI / litellm

Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)
https://docs.litellm.ai/docs/

[Bug]: [cloudflare] - config.yaml / how can I config CLOUDFLARE_ACCOUNT_ID ? #3721

Open tuanlv14 opened 1 month ago

tuanlv14 commented 1 month ago

What happened?

My config:

This config deploys successfully, but when I send a request to the model, the response is empty. For api_base, I referred to https://developers.cloudflare.com/workers-ai/configuration/open-ai-compatibility/
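For illustration, a minimal config.yaml entry of this kind (assuming the OpenAI-compatibility endpoint from the link above; every value here is a placeholder, not the config actually used) might look like:

  - model_name: workers-ai-llama
    litellm_params:
      # assumption: route through Cloudflare's OpenAI-compatible endpoint,
      # so the account ID is embedded in the api_base path
      model: openai/@cf/meta/llama-2-7b-chat-int8
      api_base: "https://api.cloudflare.com/client/v4/accounts/<CLOUDFLARE_ACCOUNT_ID>/ai/v1"
      api_key: os.environ/CLOUDFLARE_API_KEY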

Relevant log output

No log.

Twitter / LinkedIn details

No response

krrishdholakia commented 1 month ago

What do your logs say when you run the proxy with --detailed_debug? @tuanlv14

shuther commented 1 month ago

Going back to the documentation at https://docs.litellm.ai/docs/providers/cloudflare_workers:

  1. Could we get an example for the case where litellm is used as a proxy?
  2. It would help if the debug output showed the URL being tried, at least when debug=true.

I tried with:

  - model_name: llama-2-7b-chat
    litellm_params:
# fake openai
      model: cloudflare/@cf/meta/llama-2-7b-chat-int8
      api_base: "https://api.cloudflare.com/client/v4/accounts/xxx/ai/run/"
      api_key: "dummy"
      headers: {
#        "HTTP-Referer": "litellm.ai",
        "X-Auth-Key": "yyy-4I6En"
      }

logs:


litellm         | 09:54:12 - LiteLLM Router:INFO: router.py:658 - litellm.acompletion(model=cloudflare/@cf/meta/llama-2-7b-chat-int8) Exception Cloudflare Exception - Cloudflare Exception - {"success":false,"errors":[{"code":10000,"message":"Authentication error"}]}
litellm         |
litellm         | 09:54:12 - LiteLLM:DEBUG: main.py:4300 - initial list of deployments: [{'model_name': 'llama-2-7b-chat', 'litellm_params': {'api_key': 'dummy', 'api_base': 'https://api.cloudflare.com/client/v4/accounts/xxx/ai/run/', 'model': 'cloudflare/@cf/meta/llama-2-7b-chat-int8', 'headers': {'X-Auth-Key': 'yyy-4I6En'}}, 'model_info': {'id': 'xxx', 'db_model': False}}]
litellm         | 09:54:12 - LiteLLM:DEBUG: caching.py:22 - async get cache: cache key: 09-54:cooldown_models; local_only: False
litellm         | 09:54:12 - LiteLLM:DEBUG: caching.py:22 - in_memory_result: None
litellm         | 09:54:12 - LiteLLM:DEBUG: caching.py:22 - Get Async Redis Cache: key: 09-54:cooldown_models
litellm         | 09:54:12 - LiteLLM:DEBUG: caching.py:22 - Got Async Redis Cache: key: 09-54:cooldown_models, cached_response None
litellm         | 09:54:12 - LiteLLM:DEBUG: caching.py:22 - get cache: cache result: None
litellm         | 09:54:12 - LiteLLM Router:DEBUG: router.py:2347 - retrieve cooldown models: []
litellm         | 09:54:12 - LiteLLM Router:DEBUG: router.py:1670 - TracebackTraceback (most recent call last):
litellm         |   File "/usr/local/lib/python3.11/site-packages/litellm/main.py", line 2273, in completion
litellm         |     response = cloudflare.completion(
litellm         |                ^^^^^^^^^^^^^^^^^^^^^^
litellm         |   File "/usr/local/lib/python3.11/site-packages/litellm/llms/cloudflare.py", line 145, in completion
litellm         |     raise CloudflareError(
litellm         | litellm.llms.cloudflare.CloudflareError: {"success":false,"errors":[{"code":10000,"message":"Authentication error"}]}
litellm         |
litellm         |
litellm         | During handling of the above exception, another exception occurred:
litellm         |
litellm         | Traceback (most recent call last):
litellm         |   File "/usr/local/lib/python3.11/site-packages/litellm/main.py", line 356, in acompletion
litellm         |     response = await loop.run_in_executor(None, func_with_context)  # type: ignore
litellm         |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
litellm         |   File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
litellm         |     result = self.fn(*self.args, **self.kwargs)
litellm         |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
litellm         |   File "/usr/local/lib/python3.11/site-packages/litellm/utils.py", line 3102, in wrapper
litellm         |     result = original_function(*args, **kwargs)
litellm         |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
litellm         |   File "/usr/local/lib/python3.11/site-packages/litellm/main.py", line 2435, in completion
litellm         |     raise exception_type(
litellm         |           ^^^^^^^^^^^^^^^
litellm         |   File "/usr/local/lib/python3.11/site-packages/litellm/utils.py", line 9976, in exception_type
litellm         |     raise e
litellm         |   File "/usr/local/lib/python3.11/site-packages/litellm/utils.py", line 9248, in exception_type
litellm         |     raise AuthenticationError(
litellm         | litellm.exceptions.AuthenticationError: Cloudflare Exception - {"success":false,"errors":[{"code":10000,"message":"Authentication error"}]}

I used to make this work with an older version of litellm:

  - model_name: gpt-3.5-turbo
    litellm_params:
# fake openai
      model: gpt-3.5-turbo
      api_base: "https://api.cloudflare.com/client/v4/accounts/yyy/ai/run/@cf/meta/llama-2-7b-chat-int8"
      api_key: "dummy"
      headers: {
#        "HTTP-Referer": "litellm.ai",
        "X-Auth-Key": "xxx-4I6En"
      }