Closed: jinnotgin closed this issue 9 months ago
Hey @jinnotgin I'll look into this.
Btw I noticed you set a budget duration but no budget. Why's that?
Also, you seem to be setting litellm_settings twice in your config. Can you consolidate them and let me know if the issue persists?
Ah, apologies - the config was a little mixed up as I was trying out different feature sets over time.
Here's my updated config:
litellm_settings:
  vertex_project: "<GCP_PROJECT_ID>"
  vertex_location: "us-central1"  # location for gemini
  drop_params: True
  set_verbose: True
  success_callback: ["langfuse"]

model_list:
  - model_name: gemini-pro
    litellm_params:
      model: gemini-pro
      rpm: 60  # requests per minute
  - model_name: code-instruct
    litellm_params:
      model: openai/TheBloke/deepseek-coder-33B-instruct-AWQ
      api_base: http://<VLLM HOST>/v1
      api_key: "<API_KEY>"
      rpm: 120

general_settings:
  master_key: <MASTER_KEY>
  database_url: <DATABASE_URL>
  max_parallel_requests: 10  # max parallel requests per user
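For reference, the calls that should show up in Langfuse are plain OpenAI-style chat completions against the proxy. Here's a minimal sketch of an equivalent test call (placeholder host and key, using the OpenAI Python client):

# Sketch only: exercise both models through the LiteLLM proxy, so both
# requests should be reported to Langfuse via the success_callback above.
from openai import OpenAI

client = OpenAI(base_url="http://<LITELLM_PROXY_HOST>", api_key="<PROXY_KEY>")

for model in ("gemini-pro", "code-instruct"):
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hello"}],
    )
    print(model, resp.choices[0].message.content)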
Unfortunately, I'm still facing the same issue
ok, thanks for testing this. I'll look into it today @jinnotgin
Curious - what're you using the proxy for?
Also DM'ed on linkedin to start a support channel
@jinnotgin Unable to repro.
Can you run with --detailed_debug and share any related logs?

litellm --config /path/to/config.yaml --detailed_debug
That's odd. Hmm, so here's the output I have right now (with some info redacted), when there are 3 workers set up:
All in all, the logs look all right actually? (Unless I'm missing something.) I still don't quite get why it works for one model and not the other.
I've also tried setting it as:
- model_name: gemini-pro
  litellm_params:
    model: vertex_ai/gemini-pro
    rpm: 60  # requests per minute
But that didn't help either, though it did seem to help with another issue I was facing, where the LiteLLM Router logged:

LiteLLM Router:DEBUG: {'gemini-pro': ["<class 'litellm.exceptions.APIConnectionError'> Status: 500, Message: None, 'float' object cannot be interpreted as an integer. Full exception: 'float' object cannot be interpreted as an integer"]}

Still unsure about that one, so I'll do more testing and raise a separate issue if needed.
For context, I'm trying to set up a LiteLLM proxy with virtual_keys for developers to use with the Continue.dev IDE plugin, for a co-pilot-like experience while using Gemini Pro under Google Vertex AI.
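For the virtual keys, I'm generating them via the proxy's /key/generate endpoint. A rough sketch of that step (placeholder host and values; the model scoping and duration are just examples):

# Sketch only: create a scoped virtual key for a developer using the master key.
import requests

PROXY = "http://<LITELLM_PROXY_HOST>"
resp = requests.post(
    f"{PROXY}/key/generate",
    headers={"Authorization": "Bearer <MASTER_KEY>"},
    json={"models": ["gemini-pro"], "duration": "30d"},
)
print(resp.json()["key"])  # virtual key handed to the developer for Continue.dev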
Oh, I can see the issue. It's caused by the lack of a finish reason in the last chunk.
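For context, the last chunk of an OpenAI-style stream is expected to carry a finish_reason. Illustrative shape only (not actual proxy output):

# Intermediate chunks carry deltas with finish_reason = None; the final chunk
# should set it (e.g. "stop"). Its absence here is what broke the Langfuse tracking.
intermediate_chunk = {"choices": [{"index": 0, "delta": {"content": "Hi"}, "finish_reason": None}]}
final_chunk = {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}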
great! it tracks as expected on langfuse now. :) closing this issue!
What happened?
I currently have 2 models set up: a) Google Gemini Pro (Vertex AI) and b) a vLLM endpoint that is OpenAI API compatible.
However, after making inference calls to both endpoints/models, only the calls to the OpenAI-compatible endpoint (vLLM DeepSeek) are tracked in Langfuse, while the calls to Gemini Pro are ignored.
Not sure why the Gemini Pro calls are not tracked. Is it a misconfig somewhere, or a bug?
Relevant log output
No response
Twitter / LinkedIn details
No response