BerriAI / litellm

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Help]: LiteLLM proxy - langfuse tracking not working for "gemini-pro" (google vertex ai) streaming #1811

Closed. jinnotgin closed this issue 9 months ago.

jinnotgin commented 9 months ago

What happened?

I currently have two models set up: a) Google Gemini Pro (Vertex AI) and b) a vLLM endpoint that is OpenAI API-compatible.

litellm_settings:
  vertex_project: "<GCP_PROJECT_ID>"
  vertex_location: "us-central1" # location for gemini

model_list:
  - model_name: gemini-pro
    litellm_params:
      model: gemini-pro
      rpm: 60  # requests per minute
  - model_name: code-instruct
    litellm_params:
      model: openai/TheBloke/deepseek-coder-33B-instruct-AWQ
      api_base: http://<VLLM HOST>/v1 
      api_key: "<API_KEY>"
      rpm: 120

litellm_settings: # module level litellm settings - https://github.com/BerriAI/litellm/blob/main/litellm/__init__.py
  drop_params: True
  set_verbose: True
  success_callback: ["langfuse"]

general_settings:
  master_key: <MASTER_KEY>
  database_url: <DATABASE_URL>
  max_parallel_requests: 10 # max parallel requests for a user
  budget_duration: 30d # (str) frequency of reset - You can set duration as seconds ("30s"), minutes ("30m"), hours ("30h"), days ("30d").

However, after making inference calls to both endpoints/models, only those to the OpenAI-compatible endpoint (vLLM deepseek) are tracked in Langfuse, while the calls to Gemini Pro are ignored. See below:

[screenshot: Langfuse traces]

I'm not sure why the Gemini Pro calls are not tracked. Is it a misconfiguration somewhere, or a bug?
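
For reference, the kind of request involved can be reproduced with a minimal sketch like the one below (not from the original report). It assumes the proxy is listening on http://localhost:4000 and uses the OpenAI Python SDK against the proxy's OpenAI-compatible endpoint, with a placeholder key:

```python
# Minimal repro sketch (illustrative, not from the issue): call both models through
# the LiteLLM proxy's OpenAI-compatible endpoint. The base_url and api_key below
# are assumptions for this sketch.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="<PROXY_KEY>")

# Streaming call to gemini-pro -- these are the calls that were not showing up in Langfuse.
stream = client.chat.completions.create(
    model="gemini-pro",
    messages=[{"role": "user", "content": "Say hello"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

# Call to the vLLM-backed model -- these were tracked in Langfuse as expected.
response = client.chat.completions.create(
    model="code-instruct",
    messages=[{"role": "user", "content": "Write hello world in Python"}],
)
print(response.choices[0].message.content)
```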

Relevant log output

No response

Twitter / LinkedIn details

No response

krrishdholakia commented 9 months ago

Hey @jinnotgin I'll look into this.

Btw I noticed you set a budget duration but no budget. Why's that?

krrishdholakia commented 9 months ago

Also you seem to be setting litellm_settings twice in your config. Can you consolidate them and let me know if the issue persists?

jinnotgin commented 9 months ago

Ah, apologies - the config was a little mixed up as I was trying out different feature sets over time.

Here's my updated config:

litellm_settings:
  vertex_project: "<GCP_PROJECT_ID>"
  vertex_location: "us-central1" # location for gemini
  drop_params: True
  set_verbose: True
  success_callback: ["langfuse"]

model_list:
  - model_name: gemini-pro
    litellm_params:
      model: gemini-pro
      rpm: 60  # requests per minute
  - model_name: code-instruct
    litellm_params:
      model: openai/TheBloke/deepseek-coder-33B-instruct-AWQ
      api_base: http://<VLLM HOST>/v1 
      api_key: "<API_KEY>"
      rpm: 120

general_settings:
  master_key: <MASTER_KEY>
  database_url: <DATABASE_URL>
  max_parallel_requests: 10 # max parallel requests for a user

Unfortunately, I'm still facing the same issue.

krrishdholakia commented 9 months ago

ok, thanks for testing this. I'll look into it today @jinnotgin

Curious - what're you using the proxy for?

krrishdholakia commented 9 months ago

Also DM'ed you on LinkedIn to start a support channel.

krrishdholakia commented 9 months ago

@jinnotgin Unable to repro

[screenshot: 2024-02-05 at 3:02:22 PM]

Can you run with --detailed_debug and share any related logs?

litellm --config /path/to/config.yaml --detailed_debug

jinnotgin commented 9 months ago

That's odd. Hmm, so here's the output I have right now (with some info redacted), when there are 3 workers set up:

Attachment: langfuse.log

All in all, the logs look all right, actually? (Unless I'm missing something.) I still don't quite get why it works for one model and not the other.

I've also tried setting it as:


  - model_name: gemini-pro
    litellm_params:
      model: vertex_ai/gemini-pro
      rpm: 60  # requests per minute

But that didn't help either, though it did seem to help with another issue I was facing with the LiteLLM Router: LiteLLM Router:DEBUG: {'gemini-pro': ["<class 'litellm.exceptions.APIConnectionError'> Status: 500 Message: None 'float' object cannot be interpreted as an integer Full exception: 'float' object cannot be interpreted as an integer"]}. I'm still unsure about that one, so I'll do more testing and raise a separate issue if needed.

For context, I'm trying to set up a LiteLLM proxy that issues virtual keys for developers to use with the Continue.dev IDE plugin, for a Copilot-like experience while using Gemini Pro on Google Vertex AI.
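
As an aside on the virtual-key setup mentioned above, issuing a developer key through the proxy would look roughly like the sketch below. It is based on the proxy's documented /key/generate endpoint; the URL and the field values shown are illustrative assumptions.

```python
# Sketch only: request a virtual key from the LiteLLM proxy so a developer can
# point Continue.dev at the proxy without seeing the master key. Assumes the proxy
# runs at http://localhost:4000; <MASTER_KEY> matches general_settings.master_key.
import requests

resp = requests.post(
    "http://localhost:4000/key/generate",
    headers={"Authorization": "Bearer <MASTER_KEY>"},
    json={"models": ["gemini-pro", "code-instruct"], "duration": "30d"},
)
resp.raise_for_status()
virtual_key = resp.json()["key"]  # hand this key to the developer / Continue.dev config
print(virtual_key)
```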

krrishdholakia commented 9 months ago

Oh, I can see the issue.

krrishdholakia commented 9 months ago

Caused by the lack of a finish_reason in the last chunk of the stream.
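
To illustrate the failure mode generically (this is not LiteLLM's actual logging code): a success handler that treats a chunk's finish_reason as the end-of-stream signal never fires if the provider's final chunk omits that field, so nothing gets sent to Langfuse.

```python
# Illustrative sketch, not LiteLLM internals: a stream logger that only calls its
# success callback once it sees a chunk with a finish_reason.
def log_stream(chunks, on_success):
    buffered = []
    for chunk in chunks:
        buffered.append(chunk.get("content", ""))
        if chunk.get("finish_reason") is not None:
            on_success("".join(buffered))  # only reached when finish_reason is present
            return
    # Stream ended without any finish_reason: the success callback is never invoked.

# A stream whose chunks never carry finish_reason prints nothing here,
# mirroring the missing Langfuse traces in this issue.
log_stream(
    [{"content": "Hello"}, {"content": " world"}],
    on_success=lambda text: print("logged to Langfuse:", text),
)
```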

jinnotgin commented 9 months ago

Great! It tracks as expected in Langfuse now. :) Closing this issue!