BerriAI / litellm

Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)
https://docs.litellm.ai/docs/

[Feature]: Support setting custom `api_base` for `vertex_ai_beta` models #4317

Closed · Manouchehri closed this issue 1 week ago

Manouchehri commented 1 week ago

The Feature

It would be nice to be able to set a custom `api_base` for `vertex_ai_beta` models.

Motivation, pitch

This is so I can use Cloudflare AI Gateway.

Twitter / LinkedIn details

https://www.linkedin.com/in/davidmanouchehri/

krrishdholakia commented 1 week ago

> custom api_base for frequency_penalty

@Manouchehri you want to set a custom base to just use frequency_penalty?

Manouchehri commented 1 week ago

Sorry, copy-paste fail. Fixed.

Manouchehri commented 1 week ago

@krrishdholakia d3a31461550f9c980aadd7de2bfb3699b07d0509 doesn't seem to do anything.

```yaml
model_list:
  - model_name: gemini-experimental
    litellm_params:
      model: vertex_ai_beta/gemini-experimental
      vertex_project: litellm-epic
      vertex_location: us-central1
      api_base: https://us-central1-aiplatform.googleapis.com/v1/projects/litellm-epic/locations/us-central1/publishers/google/models/gemini-experimental
```

Is there a mistake in the async path maybe...?
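For context, the `api_base` in the config above is just the default regional Vertex AI endpoint, assembled from the same `vertex_project`, `vertex_location`, and model values. A minimal sketch of that pattern (the helper name is hypothetical, not LiteLLM's internal API):

```python
# Sketch: build the standard regional Vertex AI publisher-model URL from
# the config values above. This mirrors the api_base shown in the YAML;
# default_vertex_endpoint is a hypothetical helper, not part of LiteLLM.

def default_vertex_endpoint(project: str, location: str, model: str) -> str:
    """Return the default regional Vertex AI endpoint for a Google publisher model."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1"
        f"/projects/{project}/locations/{location}"
        f"/publishers/google/models/{model}"
    )

print(default_vertex_endpoint("litellm-epic", "us-central1", "gemini-experimental"))
```

Setting `api_base` to this exact URL is a no-op, which is why pointing it at a gateway is the interesting case.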

Manouchehri commented 1 week ago

`extra_headers` support would also be nice to have, if you're already poking around in this code.

krrishdholakia commented 1 week ago

@Manouchehri what is your raw request?

I can see custom-api-base being passed for the async call

krrishdholakia commented 1 week ago
[Screenshot: 2024-06-20 at 5:16:33 PM]
Manouchehri commented 1 week ago
```shell
curl -v "${OPENAI_API_BASE}/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gemini-experimental",
    "max_tokens": 10,
    "messages": [
      {
        "role": "user",
        "content": "what is 1 plus 1?"
      }
    ],
    "cache": {
      "no-cache": true
    }
  }'
```

I can clearly see the wrong endpoint is used:

```
< x-litellm-model-api-base: https://us-central1-aiplatform.googleapis.com/v1/projects/litellm-epic/locations/us-central1/publishers/google/models/gemini-experimental
```

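One quick way to confirm where a request was actually routed is to compare the `x-litellm-model-api-base` response header against the gateway prefix you expect. A small sketch (the helper is illustrative; the header value is the one from the response above):

```python
# Sketch: check whether the endpoint reported by LiteLLM's
# x-litellm-model-api-base response header points at the configured
# gateway or still at the default Google endpoint. routed_via is a
# hypothetical helper, just a prefix comparison.

def routed_via(api_base_header: str, expected_prefix: str) -> bool:
    """True if the reported api_base starts with the expected gateway prefix."""
    return api_base_header.startswith(expected_prefix)

header = (
    "https://us-central1-aiplatform.googleapis.com/v1/projects/litellm-epic"
    "/locations/us-central1/publishers/google/models/gemini-experimental"
)
print(routed_via(header, "https://gateway.ai.cloudflare.com/"))  # False: still the default endpoint
```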
krrishdholakia commented 1 week ago

This works for me:

[Screenshot: 2024-06-20 at 5:21:34 PM]

```yaml
- model_name: gemini-experimental
  litellm_params:
    model: vertex_ai_beta/gemini-experimental
    vertex_project: litellm-epic
    vertex_location: us-central1
    api_base: https://gateway.ai.cloudflare.com/v1/fa4cdcab1f32b95ca3b53fd36043d691/test/google-vertex-ai
```

Curl

```shell
curl -X POST 'http://localhost:4000/chat/completions' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer sk-1234' \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "ping"
      }
    ],
    "model": "gemini-experimental",
    "max_tokens": 10,
    "cache": {
      "no-cache": true
    }
  }'
```
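The Cloudflare AI Gateway `api_base` in the working config above follows a fixed URL pattern. A sketch that assembles it (the account ID and gateway name are taken from the example config; substitute your own, and note the helper itself is hypothetical):

```python
# Sketch: assemble a Cloudflare AI Gateway base URL for Google Vertex AI,
# following the pattern seen in the working config above. The account_id
# and gateway name below come from the example; replace with your own.

def cloudflare_vertex_gateway(account_id: str, gateway: str) -> str:
    """Return the Cloudflare AI Gateway base URL for the google-vertex-ai provider."""
    return f"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway}/google-vertex-ai"

print(cloudflare_vertex_gateway("fa4cdcab1f32b95ca3b53fd36043d691", "test"))
```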
krrishdholakia commented 1 week ago

Closing, as this is live.

I suspect there might be a local version / caching issue on your end, @Manouchehri.

Once we have a release out with this change, let's see if the issue persists for you.

Manouchehri commented 1 week ago

Oops. I copy-pasted the original Google URL instead of the Cloudflare one... my bad.

Manouchehri commented 1 week ago

Thanks, confirmed working!