suresiva opened this issue 1 week ago
@suresiva we already support Vertex AI Llama on Model Garden. Please look at the relevant docs - https://docs.litellm.ai/docs/providers/vertex#llama-3-api
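For the MaaS route, the usage from that docs page looks roughly like this (a minimal sketch; the project id is a placeholder and the exact model string should be taken from the docs):

```python
import litellm

# Placeholder project/region, for illustration only.
litellm.vertex_project = "my-gcp-project"
litellm.vertex_location = "us-central1"

# Model string as listed on the linked Llama 3 API docs page.
response = litellm.completion(
    model="vertex_ai/meta/llama3-405b-instruct-maas",
    messages=[{"role": "user", "content": "What is machine learning?"}],
)
print(response.choices[0].message.content)
```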
@krrishdholakia, there are two ways we can deploy Llama 3.1 on Vertex AI: the managed Llama 3.1 API (MaaS) that the linked docs cover, and a self-deployed endpoint from Model Garden.
We are currently facing the error posted on this thread while using the second option (a self-deployed LLM endpoint in Model Garden). Please let us know how we can resolve it.
If you self-deploy, is it the same API spec? @suresiva
If so, it seems like we just need to let you specify this distinction - i.e. "this model follows the vertex/meta spec".
@krrishdholakia, the self-deployed Llama 3.1 model follows a different request/response spec.
Request:
{ "instances": [{"prompt": "What is machine learning?", "max_tokens": 100}] }
Response:
{
"predictions": [
"Prompt:\nWhat is machine learning?\nOutput:\n A broad introduction\nMachine learning is..."
],
"deployedModelId": "xxxx",
"model": "projects/xxxx/locations/us-central1/models/llama-3-1-8b-instruct-172858156xxxx",
"modelDisplayName": "llama-3-1-8b-instruct-172858156xxxx",
"modelVersionId": "1"
}
Behind the scenes, this self-deployed Llama 3.1 model is actually served through the vllm.entrypoints.api_server entrypoint, which does not use the OpenAI spec.
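For reference, calling the self-deployed endpoint directly with the Vertex AI SDK looks roughly like this (a minimal sketch; the project and endpoint ids are placeholders):

```python
from google.cloud import aiplatform

# Placeholder project / endpoint ids, for illustration only.
aiplatform.init(project="my-gcp-project", location="us-central1")
endpoint = aiplatform.Endpoint("1234567890123456789")

# The vLLM api_server behind this endpoint expects raw "instances" with a
# prompt string, not OpenAI-style chat messages.
prediction = endpoint.predict(
    instances=[{"prompt": "What is machine learning?", "max_tokens": 100}]
)
print(prediction.predictions[0])  # plain text, e.g. "Prompt:\n...\nOutput:\n..."
```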
What happened?
We have a Llama 3.1 8B model deployed from the Vertex AI Model Garden and made available for inference through a model endpoint. It takes a JSON request and returns a response in the same format as the examples shown earlier in this thread.
We are using LiteLLM v1.50.0-stable, and we configured the above deployed Llama 3.1 model on LiteLLM roughly as shown below.
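A rough sketch of the equivalent SDK-level call (assuming the `vertex_ai/<endpoint-id>` model form from the LiteLLM Model Garden docs; the project and endpoint ids are placeholders, not our actual values):

```python
import litellm

# Placeholder project/region, for illustration only.
litellm.vertex_project = "my-gcp-project"
litellm.vertex_location = "us-central1"

# Placeholder numeric endpoint id of the self-deployed Llama 3.1 model.
response = litellm.completion(
    model="vertex_ai/1234567890123456789",
    messages=[{"role": "user", "content": "What is machine learning?"}],
)
print(response)
```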
While making a completion call with a typical payload, we get an HTTP 500 error response from LiteLLM, as given below.
While analyzing the Vertex AI model endpoint logs, we found the error trace below.
Relevant log output
No response