BerriAI / litellm

Python SDK, Proxy Server to call 100+ LLM APIs using the OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Bug]: Deepinfra, NLPCloud, streaming not working #918

Closed: krrishdholakia closed this issue 9 months ago

krrishdholakia commented 9 months ago

What happened?

Streaming is not working for DeepInfra and NLP Cloud.
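A minimal reproduction sketch, reconstructed from the partial cell visible in the traceback below; the model name and prompt are assumptions, not the original call:

```python
import litellm

# Assumed repro: requires a DeepInfra API key (DEEPINFRA_API_KEY);
# model and messages are placeholders. stream=True hits the failing path.
response = litellm.completion(
    model="deepinfra/meta-llama/Llama-2-70b-chat-hf",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
    stream=True,
)

for chunk in response:
    print(chunk)
```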

Relevant log output

TypeError                                 Traceback (most recent call last)
<ipython-input-3-745caaaf8f60> in <cell line: 11>()
      9 )
     10 
---> 11 for chunk in response:
     12     print(chunk)

14 frames
/usr/local/lib/python3.10/dist-packages/litellm/llms/openai.py in streaming(self, logging_obj, timeout, data, model, api_key, api_base)
    276     ):
    277         openai_client = OpenAI(api_key=api_key, base_url=api_base, http_client=litellm.client_session, timeout=timeout, max_retries=data.pop("max_retries", 2))
--> 278         response = openai_client.chat.completions.create(**data)
    279         streamwrapper = CustomStreamWrapper(completion_stream=response, model=model, custom_llm_provider="openai",logging_obj=logging_obj)
    280         for transformed_chunk in streamwrapper:

/usr/local/lib/python3.10/dist-packages/openai/_utils/_utils.py in wrapper(*args, **kwargs)
    297                         msg = f"Missing required argument: {quote(missing[0])}"
    298                 raise TypeError(msg)
--> 299             return func(*args, **kwargs)
    300 
    301         return wrapper  # type: ignore

/usr/local/lib/python3.10/dist-packages/openai/resources/chat/completions.py in create(self, messages, model, frequency_penalty, function_call, functions, logit_bias, max_tokens, n, presence_penalty, response_format, seed, stop, stream, temperature, tool_choice, tools, top_p, user, extra_headers, extra_query, extra_body, timeout)
    596         timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
    597     ) -> ChatCompletion | Stream[ChatCompletionChunk]:
--> 598         return self._post(
    599             "/chat/completions",
    600             body=maybe_transform(

/usr/local/lib/python3.10/dist-packages/openai/_base_client.py in post(self, path, cast_to, body, options, files, stream, stream_cls)
   1061             method="post", url=path, json_data=body, files=to_httpx_files(files), **options
   1062         )
-> 1063         return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
   1064 
   1065     def patch(

/usr/local/lib/python3.10/dist-packages/openai/_base_client.py in request(self, cast_to, options, remaining_retries, stream, stream_cls)
    840         stream_cls: type[_StreamT] | None = None,
    841     ) -> ResponseT | _StreamT:
--> 842         return self._request(
    843             cast_to=cast_to,
    844             options=options,

/usr/local/lib/python3.10/dist-packages/openai/_base_client.py in _request(self, cast_to, options, remaining_retries, stream, stream_cls)
    860 
    861         retries = self._remaining_retries(remaining_retries, options)
--> 862         request = self._build_request(options)
    863         self._prepare_request(request)
    864 

/usr/local/lib/python3.10/dist-packages/openai/_base_client.py in _build_request(self, options)
    459 
    460         # TODO: report this error to httpx
--> 461         return self._client.build_request(  # pyright: ignore[reportUnknownMemberType]
    462             headers=headers,
    463             timeout=self.timeout if isinstance(options.timeout, NotGiven) else options.timeout,

/usr/local/lib/python3.10/dist-packages/httpx/_client.py in build_request(self, method, url, content, data, files, json, params, headers, cookies, timeout, extensions)
    356             )
    357             extensions = dict(**extensions, timeout=timeout.as_dict())
--> 358         return Request(
    359             method,
    360             url,

/usr/local/lib/python3.10/dist-packages/httpx/_models.py in __init__(self, method, url, params, headers, cookies, content, data, files, json, stream, extensions)
    336         if stream is None:
    337             content_type: typing.Optional[str] = self.headers.get("content-type")
--> 338             headers, stream = encode_request(
    339                 content=content,
    340                 data=data,

/usr/local/lib/python3.10/dist-packages/httpx/_content.py in encode_request(content, data, files, json, boundary)
    212         return encode_urlencoded_data(data)
    213     elif json is not None:
--> 214         return encode_json(json)
    215 
    216     return {}, ByteStream(b"")

/usr/local/lib/python3.10/dist-packages/httpx/_content.py in encode_json(json)
    175 
    176 def encode_json(json: Any) -> Tuple[Dict[str, str], ByteStream]:
--> 177     body = json_dumps(json).encode("utf-8")
    178     content_length = str(len(body))
    179     content_type = "application/json"

/usr/lib/python3.10/json/__init__.py in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    229         cls is None and indent is None and separators is None and
    230         default is None and not sort_keys and not kw):
--> 231         return _default_encoder.encode(obj)
    232     if cls is None:
    233         cls = JSONEncoder

/usr/lib/python3.10/json/encoder.py in encode(self, o)
    197         # exceptions aren't as detailed.  The list call should be roughly
    198         # equivalent to the PySequence_Fast that ''.join() would do.
--> 199         chunks = self.iterencode(o, _one_shot=True)
    200         if not isinstance(chunks, (list, tuple)):
    201             chunks = list(chunks)

/usr/lib/python3.10/json/encoder.py in iterencode(self, o, _one_shot)
    255                 self.key_separator, self.item_separator, self.sort_keys,
    256                 self.skipkeys, _one_shot)
--> 257         return _iterencode(o, 0)
    258 
    259 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,

/usr/lib/python3.10/json/encoder.py in default(self, o)
    177 
    178         """
--> 179         raise TypeError(f'Object of type {o.__class__.__name__} '
    180                         f'is not JSON serializable')
    181 

TypeError: Object of type type is not JSON serializable
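For context, that final TypeError means a Python class object (a `type`) made it into the JSON request body, which matches the "typo in the optional params" fix below. A minimal illustration of the same failure, independent of litellm:

```python
import json

# A class object where a value belongs reproduces the exact error:
params = {"max_tokens": int}  # bug: the type `int`, not an int value
json.dumps(params)  # TypeError: Object of type type is not JSON serializable
```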


krrishdholakia commented 9 months ago

Fix pushed for DeepInfra: https://github.com/BerriAI/litellm/commit/30f47d3169a8587fd54062d57fd75965b2427001

There was a typo in the optional params. A test for this is also included in test_streaming.
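A hedged sketch of what such a regression test could look like; the function name and model are assumptions, and the actual tests live in test_streaming in the repo:

```python
import litellm

def test_completion_deepinfra_stream():
    # Assumed test shape: stream a short completion and verify that
    # iterating the chunks no longer raises the TypeError above.
    response = litellm.completion(
        model="deepinfra/meta-llama/Llama-2-70b-chat-hf",
        messages=[{"role": "user", "content": "Hello, how are you?"}],
        stream=True,
        max_tokens=10,
    )
    for chunk in response:
        print(chunk)
```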

krrishdholakia commented 9 months ago

Looks like NLP Cloud has changed their streaming format.

[Screenshot: NLP Cloud streaming response, 2023-11-25 12:46 PM]

krrishdholakia commented 9 months ago

Fix pushed + testing added to the pipeline. I recall NLP Cloud being fairly flaky though: https://github.com/BerriAI/litellm/commit/6d9f7b8f9d4efb1d238367ea1d4b5a8315aec8c8
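A hedged usage sketch for NLP Cloud streaming after the fix; the model name and the API key env var (NLP_CLOUD_API_KEY) are assumptions, so check the litellm NLP Cloud docs for supported models:

```python
import litellm

# Assumed model name; chunks come back in the OpenAI streaming format
# via litellm's CustomStreamWrapper.
response = litellm.completion(
    model="dolphin",
    custom_llm_provider="nlp_cloud",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in response:
    print(chunk)
```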

krrishdholakia commented 9 months ago

Will close this issue once the fix is in prod.

krrishdholakia commented 9 months ago

Live in v1.7.1 @toniengelhardt

toniengelhardt commented 9 months ago

Amazing, thanks for the quick fix!