BerriAI / litellm

Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)
https://docs.litellm.ai/docs/

[Feature]: Add NVIDIA NIM API provider #3896

Open tuanlv14 opened 2 weeks ago

tuanlv14 commented 2 weeks ago

The Feature

Please add a method & proxy support for the NVIDIA API. Their example code:

from openai import OpenAI

client = OpenAI(
  base_url = "https://integrate.api.nvidia.com/v1",
  api_key = "$API_KEY_REQUIRED_IF_EXECUTING_OUTSIDE_NGC"
)

completion = client.chat.completions.create(
  model="meta/llama3-70b-instruct",
  messages=[{"role":"user","content":""}],
  temperature=0.5,
  top_p=1,
  max_tokens=1024,
  stream=True
)

for chunk in completion:
  if chunk.choices[0].delta.content is not None:
    print(chunk.choices[0].delta.content, end="")
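
For reference, a minimal sketch of what the same request might look like through litellm's existing OpenAI-compatible path (the openai/ model prefix and api_base routing are assumptions based on litellm's generic OpenAI-compatible support, not a dedicated NIM provider):

import os
from litellm import completion

# Assumption: reuse litellm's generic OpenAI-compatible provider by pointing
# api_base at the NVIDIA endpoint; this is not a dedicated NIM integration.
os.environ["OPENAI_API_KEY"] = "nvapi-..."  # NVIDIA API key

response = completion(
    model="openai/meta/llama3-70b-instruct",  # "openai/" prefix selects the OpenAI-compatible path
    api_base="https://integrate.api.nvidia.com/v1",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    temperature=0.5,
    top_p=1,
    max_tokens=1024,
    stream=True,
)

for chunk in response:
    delta = chunk.choices[0].delta.content
    if delta is not None:
        print(delta, end="")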

Motivation, pitch

The NVIDIA API is still in free trial and offers good speed.

Twitter / LinkedIn details

No response

tuanlv14 commented 2 weeks ago

My Python code:

import os
from litellm import completion

## set ENV variables
os.environ["OPENAI_API_KEY"] = "My_KEY"  # NVIDIA API key; not used when routing through the proxy

messages = [{"content": "Hello, how are you?", "role": "user"}]

response = completion(
    model="openai/mistralai/mistral-large",
    messages=messages,
    api_base="https://integrate.api.nvidia.com/v1",
    # custom_llm_provider="openai"  # litellm will use openai.ChatCompletion to make the request
)
print(response)

Response: ModelResponse(id='chatcmpl-8bea69f9-ccfa-4d59-9912-47d7c579c9a3', choices=[Choices(finish_reason='stop', index=0, message=Message(content=" Hello! I'm just a computer program, so I don't have", role='assistant'), logprobs={'content': None, 'text_offset': [], 'token_logprobs': [0.0, 0.0], 'tokens': [], 'top_logprobs': []})], created=1717000089, model='mistralai/mistral-large', object='chat.completion', system_fingerprint=None, usage=Usage(completion_tokens=16, prompt_tokens=9, total_tokens=25))

But when I tried to configure the proxy the same way, it did not work. Please help me check and fix the error with the LiteLLM proxy.

krrishdholakia commented 2 weeks ago

what's the error you see with the proxy? @tuanlv14

shuther commented 5 days ago

I was able to make it work using the proxy approach and the .yaml file below:

model_list:
  - model_name: llama-nvidia
    litellm_params:
      model: openai/meta/llama3-70b-instruct
      api_base: https://integrate.api.nvidia.com/v1
      api_key: nvapi-Dxxx

but there is another NVIDIA endpoint with a different format, and I am not sure how to configure it; see: https://ai.api.nvidia.com/v1/vlm/microsoft/phi-3-vision-128k-instruct
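
For completeness, a sketch of calling the proxied model defined in the config above once the proxy is running (the localhost:4000 address and the sk-1234 key are assumptions; substitute whatever host/port and master key your proxy is started with):

from openai import OpenAI

# Assumption: LiteLLM proxy started with the config above and listening on
# its default port 4000; "sk-1234" stands in for the proxy's master/virtual key.
client = OpenAI(
    base_url="http://localhost:4000",
    api_key="sk-1234",
)

resp = client.chat.completions.create(
    model="llama-nvidia",  # the model_name defined in the proxy config
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
print(resp.choices[0].message.content)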