BerriAI / litellm

Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)
https://docs.litellm.ai/docs/

[Bug]: Use alt=sse for Vertex AI streaming #4459

Closed Manouchehri closed 2 days ago

Manouchehri commented 2 days ago

What happened?

https://cloud.google.com/vertex-ai/generative-ai/docs/learn/streaming#rest-sse

For very long or slow prompts, SSE seems better for streaming; for example, some proxies buffer responses that are not text/event-stream.

With SSE:

< HTTP/2 200
< date: Fri, 28 Jun 2024 15:52:03 GMT
< content-type: text/event-stream
< cf-ray: 89aed7772b770ca6-EWR
< cf-cache-status: DYNAMIC
< content-disposition: attachment
< vary: Origin, X-Origin, Referer
< cf-aig-cache-status: MISS
< x-content-type-options: nosniff
< x-frame-options: SAMEORIGIN
< x-xss-protection: 0
< server: cloudflare
<
data: {"nonce": "73a63c62", "candidates": [{"content": {"role": "model","parts": [{"text": "A"}]}}]}

data: {"nonce": "a9e713", "candidates": [{"content": {"role": "model","parts": [{"text": " snail is determined to win a prestigious car race. He shows up at the starting"}]},"safetyRatings": [{"category": "HARM_CATEGORY_HATE_SPEECH","probability": "NEGLIGIBLE","probabilityScore": 0.16735472,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.10356715},{"category": "HARM_CATEGORY_DANGEROUS_CONTENT","probability": "NEGLIGIBLE","probabilityScore": 0.09877259,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.09602549},{"category": "HARM_CATEGORY_HARASSMENT","probability": "NEGLIGIBLE","probabilityScore": 0.28716773,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.09982066},{"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT","probability": "NEGLIGIBLE","probabilityScore": 0.14511536,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.14010079}]}]}

data: {"nonce": "a49b8cef2e", "candidates": [{"content": {"role": "model","parts": [{"text": " line, much to the amusement of the other racers in their sleek, powerful vehicles"}]},"safetyRatings": [{"category": "HARM_CATEGORY_HATE_SPEECH","probability": "NEGLIGIBLE","probabilityScore": 0.11636176,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.10212548},{"category": "HARM_CATEGORY_DANGEROUS_CONTENT","probability": "NEGLIGIBLE","probabilityScore": 0.10613343,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.07906799},{"category": "HARM_CATEGORY_HARASSMENT","probability": "NEGLIGIBLE","probabilityScore": 0.28537196,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.066934206},{"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT","probability": "NEGLIGIBLE","probabilityScore": 0.16451646,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.14139993}]}]}

data: {"nonce": "4afc131503", "candidates": [{"content": {"role": "model","parts": [{"text": ". \n\nThe flag drops, the engines roar, and the snail... well, the snail starts sliming his way forward as fast as he can. The"}]},"safetyRatings": [{"category": "HARM_CATEGORY_HATE_SPEECH","probability": "NEGLIGIBLE","probabilityScore": 0.109324835,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.08079154},{"category": "HARM_CATEGORY_DANGEROUS_CONTENT","probability": "NEGLIGIBLE","probabilityScore": 0.14657521,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.082401514},{"category": "HARM_CATEGORY_HARASSMENT","probability": "NEGLIGIBLE","probabilityScore": 0.23039988,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.064535476},{"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT","probability": "NEGLIGIBLE","probabilityScore": 0.16398026,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.14426936}]}]}

data: {"nonce": "5e7f", "candidates": [{"content": {"role": "model","parts": [{"text": " crowd erupts in laughter, but the snail remains focused.\n\nDays turn into weeks, and the race continues. The other racers, initially dismissive, start"}]},"safetyRatings": [{"category": "HARM_CATEGORY_HATE_SPEECH","probability": "NEGLIGIBLE","probabilityScore": 0.115560874,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.078078166},{"category": "HARM_CATEGORY_DANGEROUS_CONTENT","probability": "NEGLIGIBLE","probabilityScore": 0.09982066,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.049130786},{"category": "HARM_CATEGORY_HARASSMENT","probability": "NEGLIGIBLE","probabilityScore": 0.25813892,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.0631349},{"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT","probability": "NEGLIGIBLE","probabilityScore": 0.13386749,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.1187934}]}]}

data: {"nonce": "ccfc6528ce9a", "candidates": [{"content": {"role": "model","parts": [{"text": " to get nervous. This snail, while slow, is relentless. He never stops, never sleeps, just keeps inching forward.\n\nMonths pass, and the finish line is in sight. The lead racer, a cocky hare in a flashy"}]},"safetyRatings": [{"category": "HARM_CATEGORY_HATE_SPEECH","probability": "NEGLIGIBLE","probabilityScore": 0.15266281,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.15405758},{"category": "HARM_CATEGORY_DANGEROUS_CONTENT","probability": "NEGLIGIBLE","probabilityScore": 0.13307686,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.062674366},{"category": "HARM_CATEGORY_HARASSMENT","probability": "NEGLIGIBLE","probabilityScore": 0.41418782,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.121685736},{"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT","probability": "NEGLIGIBLE","probabilityScore": 0.15203226,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.16331196}]}]}

data: {"nonce": "a7ae3892", "candidates": [{"content": {"role": "model","parts": [{"text": " sports car, sees the snail gaining on him. Panicked, he pushes his car to the limit, but it's no use. The snail, with a final, slimy lunge, crosses the finish line first!\n\nThe crowd is"}]},"safetyRatings": [{"category": "HARM_CATEGORY_HATE_SPEECH","probability": "NEGLIGIBLE","probabilityScore": 0.12033541,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.1301748},{"category": "HARM_CATEGORY_DANGEROUS_CONTENT","probability": "NEGLIGIBLE","probabilityScore": 0.17738296,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.091544405},{"category": "HARM_CATEGORY_HARASSMENT","probability": "NEGLIGIBLE","probabilityScore": 0.28626898,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.104294725},{"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT","probability": "NEGLIGIBLE","probabilityScore": 0.15165494,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.17189366}]}]}

data: {"nonce": "fe03", "candidates": [{"content": {"role": "model","parts": [{"text": " stunned. The snail is hoisted onto the shoulders of his newfound fans and showered with champagne. As he basks in the glory, a reporter approaches him for an interview.\n\n\"Mr. Snail,\" the reporter asks, \"how did you manage to win this incredible race?\"\n\nThe snail, still catching his breath,"}]},"safetyRatings": [{"category": "HARM_CATEGORY_HATE_SPEECH","probability": "NEGLIGIBLE","probabilityScore": 0.09825223,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.10447732},{"category": "HARM_CATEGORY_DANGEROUS_CONTENT","probability": "NEGLIGIBLE","probabilityScore": 0.08929565,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.05623635},{"category": "HARM_CATEGORY_HARASSMENT","probability": "NEGLIGIBLE","probabilityScore": 0.25460163,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.08284563},{"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT","probability": "NEGLIGIBLE","probabilityScore": 0.14282866,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.1522842}]}]}

data: {"nonce": "55b6fc241d9030", "candidates": [{"content": {"role": "model","parts": [{"text": " looks at the reporter with a twinkle in his eye and whispers, \"You know, it's funny... I still don't know what all the fuss is about.  It's not like it was a close race or anything.\" \n"}]},"safetyRatings": [{"category": "HARM_CATEGORY_HATE_SPEECH","probability": "NEGLIGIBLE","probabilityScore": 0.09912086,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.11008788},{"category": "HARM_CATEGORY_DANGEROUS_CONTENT","probability": "NEGLIGIBLE","probabilityScore": 0.0715912,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.04560694},{"category": "HARM_CATEGORY_HARASSMENT","probability": "NEGLIGIBLE","probabilityScore": 0.25739157,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.0894546},{"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT","probability": "NEGLIGIBLE","probabilityScore": 0.13892843,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.16858302}]}]}

data: {"nonce": "7be24a43065e11", "candidates": [{"content": {"role": "model","parts": [{"text": ""}]},"finishReason": "STOP"}],"usageMetadata": {"promptTokenCount": 6,"candidatesTokenCount": 308,"totalTokenCount": 314}}
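For reference, a client could consume a stream like the one above roughly as follows. This is a minimal sketch, assuming each event is a single `data:` line holding one JSON object, as in the capture; `iter_sse_json` and `collect_text` are hypothetical helper names, not LiteLLM functions.

```python
import json


def iter_sse_json(lines):
    """Yield the decoded JSON payload of each `data:` line in an SSE stream."""
    for line in lines:
        line = line.strip()
        # Skip blank separators and any non-data fields (comments, event:, id:).
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload:
            yield json.loads(payload)


def collect_text(events):
    """Concatenate the text parts of every candidate, in arrival order."""
    pieces = []
    for event in events:
        for candidate in event.get("candidates", []):
            for part in candidate.get("content", {}).get("parts", []):
                pieces.append(part.get("text", ""))
    return "".join(pieces)


# Shortened stand-in for the captured events (safety ratings omitted).
sample = [
    'data: {"candidates": [{"content": {"role": "model", "parts": [{"text": "A"}]}}]}',
    "",
    'data: {"candidates": [{"content": {"role": "model", "parts": [{"text": " snail"}]}}]}',
    "",
    'data: {"candidates": [{"content": {"role": "model", "parts": [{"text": ""}]}, "finishReason": "STOP"}]}',
]
story = collect_text(iter_sse_json(sample))
```

Because every `data:` line is a complete JSON document, each event can be decoded as soon as it arrives, with no state carried between events.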

Without SSE:

[{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "A"
          }
        ]
      }
    }
  ]
}
,
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": " snail is determined to visit a far-off garden known for its delicious lettuce."
          }
        ]
      },
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.09301681,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.078925885
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.12126887,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.10212548
        },
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.24653332,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.06500873
        },
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.20866229,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.13488984
        }
      ]
    }
  ]
}
,
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": " He sets out on his journey, his pace, as you might imagine, quite"
          }
        ]
      },
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.07558479,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.049130786
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.10989668,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.054499872
        },
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.23057307,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.045352582
        },
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.20561504,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.10123348
        }
      ]
    }
  ]
}
,
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": " slow.\n\nAfter a week of relentless slithering, he finally reaches the edge of a busy highway. Cars whiz by, making the snail's journey"
          }
        ]
      },
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.06966823,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.051653776
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.17567945,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.058131594
        },
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.19147882,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.03501263
        },
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.17064616,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.11899801
        }
      ]
    }
  ]
}
,
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": " seem even more impossible.\n\nJust then, a friendly turtle ambles up to him. \"Hey there, little buddy,\" the turtle says, \"Where you"
          }
        ]
      },
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.06816437,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.05281402
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.1436676,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.04698664
        },
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.18639107,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.040313438
        },
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.1480472,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.10248422
        }
      ]
    }
  ]
}
,
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": " headed in such a hurry?\"\n\nThe snail, exhausted but resolute, explains his quest for the legendary lettuce. The turtle, known for his helpful nature, offers him a ride. \"Hop on my back,\" he says, \"I'"
          }
        ]
      },
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.063948415,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.060640547
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.10123348,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.044348583
        },
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.14199379,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.037397135
        },
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.14057204,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.08404062
        }
      ]
    }
  ]
}
,
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "ll get you there in a jiffy.\"\n\nOverjoyed, the snail climbs onto the turtle's shell. They set off across the highway, dodging traffic with the turtle's surprising agility.\n\nAs they approach the other side,"
          }
        ]
      },
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.06489011,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.06548521
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.113776386,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.0586686
        },
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.15791446,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.040313438
        },
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.11143445,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.0894546
        }
      ]
    }
  ]
}
,
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": " the snail, filled with gratitude, shouts, \"Thank you so much, Mr. Turtle! You're a lifesaver!\"\n\nThe turtle, pleased with himself, gives a little nod.\n\nSuddenly, a voice booms from the side of the road, \"Hey! You two!\"\n\nThe turtle and the"
          }
        ]
      },
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.06966823,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.071202725
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.071202725,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.033844035
        },
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.15791446,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.04232459
        },
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.09551807,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.0726367
        }
      ]
    }
  ]
}
,
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": " snail freeze. They look over to see a police officer approaching, his face stern.\n\n\"Didn't you hear me?\" the officer bellows, \"Pull over right now!\"\n\nThe turtle, confused, slowly guides them to the side of the road. The officer walks up, taps on the turtle's shell"
          }
        ]
      },
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.08787644,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.08929565
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.13683891,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.060086653
        },
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.1816984,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.054499872
        },
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.11636176,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.08064661
        }
      ]
    }
  ]
}
,
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": " with his nightstick, and says, \"Sir, do you have any idea how fast you were going back there?\"\n\nThe turtle, flustered, stammers, \"Officer, I assure you, I wasn't speeding. I'm just a turtle, I can't go that fast!\"\n\n"
          }
        ]
      },
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.09384396,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.09877259
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.13184245,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.06348236
        },
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.19775413,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.062445287
        },
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.10521054,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.08269734
        }
      ]
    }
  ]
}
,
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "The officer leans down, peers at the turtle suspiciously, then notices the snail clinging to the shell. He raises an eyebrow and says, \"Then what's this about you shouting 'Weeeeeee!' all the way across the highway?\" \n"
          }
        ]
      },
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.10123348,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.10539454
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.14511536,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.06903793
        },
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.20118472,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.068040416
        },
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.10800066,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.093678
        }
      ]
    }
  ]
}
,
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": ""
          }
        ]
      },
      "finishReason": "STOP"
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 6,
    "candidatesTokenCount": 435,
    "totalTokenCount": 441
  }
}

Relevant log output

No response

Twitter / LinkedIn details

https://www.linkedin.com/in/davidmanouchehri/

krrishdholakia commented 2 days ago

@Manouchehri can you please rewrite this issue to point to the litellm-specific changes needed here?

From what I can tell, our streaming Vertex calls should already be SSE.

Manouchehri commented 2 days ago

> can you please rewrite this issue to point to the litellm-specific changes needed here.

https://github.com/BerriAI/litellm/blob/6b14cf765708376490c5d88d3e54edc173c343b6/litellm/llms/vertex_httpx.py#L830

You can easily see there is not a single alt=sse anywhere in LiteLLM.

Manouchehri commented 2 days ago

It is also really obvious there's no way the current code would handle SSE, as it's expecting JSON only here.

https://github.com/BerriAI/litellm/blob/6b14cf765708376490c5d88d3e54edc173c343b6/litellm/llms/vertex_httpx.py#L556

krrishdholakia commented 2 days ago

@Manouchehri see here - https://github.com/BerriAI/litellm/blob/6b14cf765708376490c5d88d3e54edc173c343b6/litellm/llms/vertex_httpx.py#L1358

We iterate through the received chunks and parse the JSON from each one.

That parsed JSON is what is then returned for the streaming call.

krrishdholakia commented 2 days ago

Streaming vertex calls are made to a separate endpoint

https://github.com/BerriAI/litellm/blob/6b14cf765708376490c5d88d3e54edc173c343b6/litellm/llms/vertex_httpx.py#L829

They are also called with stream=True in the httpx call https://github.com/BerriAI/litellm/blob/6b14cf765708376490c5d88d3e54edc173c343b6/litellm/llms/vertex_httpx.py#L457


Closing issue as vertex ai streaming on litellm is already a streaming call.

If you can share a test case where the behaviour is not as expected please do so. Will help us understand the gaps.

Manouchehri commented 2 days ago

> Streaming vertex calls are made to a separate endpoint

https://github.com/BerriAI/litellm/blob/6b14cf765708376490c5d88d3e54edc173c343b6/litellm/llms/vertex_httpx.py#L830

I still see no alt=sse...?

> Closing issue as vertex ai streaming on litellm is already a streaming call.

It's not an SSE streaming call, though.

krrishdholakia commented 2 days ago

You're looking at a deprecated endpoint. alt=sse is for PaLM models, not for Gemini.

krrishdholakia commented 2 days ago

This is streaming on Gemini: https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference#streaming

krrishdholakia commented 2 days ago

> It's not a SSE streaming call though..

Why do you say that?

The response I get back from the httpx call is a set of chunked events.

Is there something I'm missing from our call? I don't think it's alt=sse, since it's not in their Gemini docs, but I can test it to confirm.

Manouchehri commented 2 days ago

Both response payloads I shared in https://github.com/BerriAI/litellm/issues/4459#issue-2380775205 were done with Gemini 1.5 Pro less than 60 minutes ago.

Manouchehri commented 2 days ago

> why do you say that?

I can add ?alt=sse to base_url manually to confirm it doesn't work if you'd like. Give me a few minutes.
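As an illustration of that manual test, the query parameter can be appended without disturbing anything already on the URL. This is a sketch only: `with_alt_sse` is a hypothetical helper, and the example URLs are placeholders rather than the endpoint LiteLLM actually builds.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_alt_sse(url: str) -> str:
    """Return `url` with the query parameter alt=sse set exactly once."""
    parts = urlsplit(url)
    # Keep any existing query parameters and overwrite/insert `alt`.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["alt"] = "sse"
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder URLs, assumed for illustration only.
plain = with_alt_sse("https://example.invalid/v1/some-model:streamGenerateContent")
keyed = with_alt_sse("https://example.invalid/x?key=abc")
```

Going through `urlsplit` rather than concatenating `"?alt=sse"` avoids producing a second `?` when the base URL already carries a query string.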

krrishdholakia commented 2 days ago

I can repro this via curl.

This is so weird; this is not in their Gemini streaming docs.

My bad. Thanks for raising this, @Manouchehri.

Manouchehri commented 2 days ago

You probably already know this by now, but yeah the current LiteLLM code does not handle SSE for Vertex AI.

krrishdholakia commented 2 days ago

@Manouchehri we do get the response back as SSE.

alt=sse changes the response so that each event arrives as a complete JSON chunk.

Without it, the response arrives as partial JSON chunks, which is why we need ijson to handle it correctly.

Working on a fix to use their alt=sse param.
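To make the "partial JSON chunks" problem concrete: without alt=sse the body is one JSON array whose elements can be split at arbitrary byte boundaries, so the client has to buffer and parse incrementally. The sketch below uses the standard library's `json.JSONDecoder.raw_decode` as a stand-in for ijson; `ArrayStreamParser` is a hypothetical class, not LiteLLM's implementation, and it assumes every array element is a JSON object.

```python
import json


class ArrayStreamParser:
    """Incrementally pull complete objects out of a streamed JSON array."""

    def __init__(self):
        self._buffer = ""
        self._decoder = json.JSONDecoder()

    def feed(self, chunk: str):
        """Add a chunk of text and return every object completed so far."""
        self._buffer += chunk
        completed = []
        while True:
            # Drop array punctuation and whitespace between elements.
            self._buffer = self._buffer.lstrip(" \t\r\n[,]")
            if not self._buffer:
                break
            try:
                obj, end = self._decoder.raw_decode(self._buffer)
            except json.JSONDecodeError:
                break  # element is still incomplete; wait for more data
            completed.append(obj)
            self._buffer = self._buffer[end:]
        return completed


parser = ArrayStreamParser()
chunks = [
    '[{"candidates": [{"content": {"parts": [{"te',
    'xt": "A"}]}}]}\n,\n{"candidates": [{"content": {"parts": [{"text": " snail"}]}',
    '}]}]',
]
objects = []
for chunk in chunks:
    objects.extend(parser.feed(chunk))
```

With alt=sse none of this buffering is needed, since each `data:` line can be handed straight to `json.loads`.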

Manouchehri commented 2 days ago

I think there's a bug in the new code; responses seem to be cut off sometimes.

Manouchehri commented 2 days ago
data: {"nonce": "f9cc5f30da5975", "candidates": [{"content": {"role": "model","parts": [{"text": "You"}]}}]}

data: {"nonce": "15ccc4a0ae", "candidates": [{"content": {"role": "model","parts": [{"text": "'re right, I have been a bit glitchy lately! I apologize if"}]},"safetyRatings": [{"category": "HARM_CATEGORY_HATE_SPEECH","probability": "NEGLIGIBLE","probabilityScore": 0.08288509,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.10827419},{"category": "HARM_CATEGORY_DANGEROUS_CONTENT","probability": "NEGLIGIBLE","probabilityScore": 0.041721944,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.031586528},{"category": "HARM_CATEGORY_HARASSMENT","probability": "NEGLIGIBLE","probabilityScore": 0.22201821,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.12040904},{"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT","probability": "NEGLIGIBLE","probabilityScore": 0.14810015,"severity": "HARM_SEVERITY_LOW","severityScore": 0.22374786}]}]}

data: {"nonce": "ac160d0aed", "candidates": [{"content": {"role": "model","parts": [{"text": " my responses have been interrupted. I'm still under development and learning to be"}]},"safetyRatings": [{"category": "HARM_CATEGORY_HATE_SPEECH","probability": "NEGLIGIBLE","probabilityScore": 0.043272704,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.0524832},{"category": "HARM_CATEGORY_DANGEROUS_CONTENT","probability": "NEGLIGIBLE","probabilityScore": 0.022395115,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.019897413},{"category": "HARM_CATEGORY_HARASSMENT","probability": "NEGLIGIBLE","probabilityScore": 0.08864924,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.0388167},{"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT","probability": "NEGLIGIBLE","probabilityScore": 0.11435278,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.117107384}]}]}

data: {"nonce": "cf76b0540b", "candidates": [{"content": {"role": "model","parts": [{"text": " the best language model I can be. \n\nIs there anything in particular you noticed me cutting off during? I'd love to know more so I can"}]},"safetyRatings": [{"category": "HARM_CATEGORY_HATE_SPEECH","probability": "NEGLIGIBLE","probabilityScore": 0.03871872,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.04228497},{"category": "HARM_CATEGORY_DANGEROUS_CONTENT","probability": "NEGLIGIBLE","probabilityScore": 0.06860357,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.073587105},{"category": "HARM_CATEGORY_HARASSMENT","probability": "NEGLIGIBLE","probabilityScore": 0.083837815,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.05601694},{"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT","probability": "NEGLIGIBLE","probabilityScore": 0.21430598,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.18258592}]}]}

data: {"nonce": "b47e01a2226b", "candidates": [{"content": {"role": "model","parts": [{"text": " improve.  😊 \n"}]},"safetyRatings": [{"category": "HARM_CATEGORY_HATE_SPEECH","probability": "NEGLIGIBLE","probabilityScore": 0.036266077,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.038282175},{"category": "HARM_CATEGORY_DANGEROUS_CONTENT","probability": "NEGLIGIBLE","probabilityScore": 0.059497934,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.07504185},{"category": "HARM_CATEGORY_HARASSMENT","probability": "NEGLIGIBLE","probabilityScore": 0.07208697,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.04733565},{"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT","probability": "NEGLIGIBLE","probabilityScore": 0.18794174,"severity": "HARM_SEVERITY_NEGLIGIBLE","severityScore": 0.1387987}]}]}

data: {"nonce": "461b9ac9", "candidates": [{"content": {"role": "model","parts": [{"text": ""}]},"finishReason": "STOP"}],"usageMetadata": {"promptTokenCount": 197,"candidatesTokenCount": 71,"totalTokenCount": 268}}
krrishdholakia commented 2 days ago

@Manouchehri your chunk stream looks fine to me:

data: {"nonce": "461b9ac9", "candidates": [{"content": {"role": "model","parts": [{"text": ""}]},"finishReason": "STOP"}],"usageMetadata": {"promptTokenCount": 197,"candidatesTokenCount": 71,"totalTokenCount": 268}}

I also don't see this when making a regular curl request to the proxy:

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-1234' \
--data '{
  "model": "gemini-1.5-flash-gemini",
  "messages": [
    {
      "role": "user",
      "content": "I think you'\''re getting cut off sometimes"
    }
  ],
  "stream": true
}
'
krrishdholakia commented 2 days ago

Can you share a curl command that triggers the error, for repro?

Manouchehri commented 2 days ago

Oh it's really odd to trigger, you have to have multiple messages in the thread. (Not at my laptop until later this weekend, otherwise I'd give you an exact curl command.)

It didn't happen on the first one or two messages for me. Only later/longer convos.

krrishdholakia commented 2 days ago

Unable to repro @Manouchehri

Request

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-1234' \
--data '{
  "model": "gemini-1.5-flash-gemini",
  "messages": [
        {"role": "user", "content": "Hey, how'\''s it going?"},
        {
            "role": "assistant",
            "content": "I'\''m doing well. Would like to hear the rest of the story?"
        },
        {"role": "user", "content": "Na"},
        {
            "role": "assistant",
            "content": "No problem, is there anything else i can help you with today?"
        },
        {
            "role": "user",
            "content": "I think you'\''re getting cut off sometimes"
        }
    ],
  "stream": true
}
'

Response:

data: {"id":"chatcmpl-48a2e8ff-0584-4e6d-ba12-f53099b21ae6","choices":[{"index":0,"delta":{"content":"You","role":"assistant"}}],"created":1719611255,"model":"gemini-1.5-flash","object":"chat.completion.chunk"}

data: {"id":"chatcmpl-48a2e8ff-0584-4e6d-ba12-f53099b21ae6","choices":[{"index":0,"delta":{"content":"'re right! I am a large language model, and sometimes my responses can"}}],"created":1719611255,"model":"gemini-1.5-flash","object":"chat.completion.chunk"}

data: {"id":"chatcmpl-48a2e8ff-0584-4e6d-ba12-f53099b21ae6","choices":[{"index":0,"delta":{"content":" get cut off.  It's likely due to limitations with the interface,"}}],"created":1719611255,"model":"gemini-1.5-flash","object":"chat.completion.chunk"}

data: {"id":"chatcmpl-48a2e8ff-0584-4e6d-ba12-f53099b21ae6","choices":[{"index":0,"delta":{"content":" or maybe there's a connection issue.  \n\nLet's try again. What would you like to talk about? \n"}}],"created":1719611255,"model":"gemini-1.5-flash","object":"chat.completion.chunk"}

data: {"id":"chatcmpl-48a2e8ff-0584-4e6d-ba12-f53099b21ae6","choices":[{"finish_reason":"stop","index":0,"delta":{}}],"created":1719611255,"model":"gemini-1.5-flash","object":"chat.completion.chunk"}

data: [DONE]

If you have a consistent repro, can you file a separate issue so we can track it there?
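One way to check a captured proxy response for truncation is to confirm that it carries both a `finish_reason` and the closing `[DONE]` sentinel. The following is a rough sketch for the OpenAI-format chunks shown above; `read_chat_stream` is a hypothetical helper and the sample lines are shortened stand-ins for the real output.

```python
import json


def read_chat_stream(lines):
    """Return (text, finish_reason, saw_done) for an OpenAI-format SSE stream."""
    text_parts = []
    finish_reason = None
    saw_done = False
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separator or non-data field
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            saw_done = True
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                text_parts.append(content)
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return "".join(text_parts), finish_reason, saw_done


sample = [
    'data: {"choices": [{"index": 0, "delta": {"content": "You", "role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "\'re right"}}]}',
    "",
    'data: {"choices": [{"finish_reason": "stop", "index": 0, "delta": {}}]}',
    "",
    "data: [DONE]",
]
text, finish_reason, saw_done = read_chat_stream(sample)
# A stream that stops early has neither a finish reason nor the sentinel.
cut_text, cut_reason, cut_done = read_chat_stream(sample[:3])
```

A response where `finish_reason` is `None` or `saw_done` is `False` ended before the model signalled completion, which is the symptom described in this thread.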

Manouchehri commented 2 days ago

Try using 1.5 Pro, and like half a dozen messages that are much longer. (I’ll try on Sunday too.)

Manouchehri commented 2 days ago

[Two screenshots attached.]

krrishdholakia commented 1 day ago

Unable to repro on my end @Manouchehri

Just ran the streaming call 10 times and it worked each time

What is your config?

krrishdholakia commented 1 day ago

Able to repro when going through a Cloudflare proxy.

krrishdholakia commented 1 day ago

fixed - https://github.com/BerriAI/litellm/commit/4f32f283a3442b4abe73469f250a6a85bc517c68

Manouchehri commented 1 day ago

Hero!