This repository is for active development of the Azure SDK for .NET. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/dotnet/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-net.
MIT License
[FEATURE REQ] Azure.AI.OpenAI: Support HuggingFace chat completion streaming API #44135
The HuggingFace chat completion streaming API is designed to imitate the OpenAI streaming response. However, because of a couple of minor differences, the GetCompletionsStreaming method hangs indefinitely when the Azure SDK OpenAIClient is pointed at HuggingFace:
- HF doesn't terminate the stream with [DONE], so the while loop in SseAsyncEnumerator never breaks.
- HF doesn't accept a NucleusSamplingFactor (top_p) of 0.0 and returns the error {"error":"Input validation error: `top_p` must be > 0.0 and < 1.0","error_type":"validation"}. Unfortunately, the response status code is 200 OK, so no exception is raised. Users can work around this by passing 0.01 instead, but nothing tells them that the value needs to change.
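Until the client surfaces that error, a caller-side guard can keep the value inside the open interval that HF's validator reports (> 0.0 and < 1.0). The sketch below is written in Python for brevity; the same check would apply to NucleusSamplingFactor in the .NET client. The helper name safe_top_p is hypothetical, the 0.01 default mirrors the workaround described in the issue, and mapping 1.0 to 0.99 is an assumption based only on the "< 1.0" part of the error message.

```python
def safe_top_p(value: float, epsilon: float = 0.01) -> float:
    """Hypothetical guard: clamp a nucleus-sampling factor into the open interval (0, 1).

    Assumes the remote validator rejects top_p <= 0.0 and top_p >= 1.0,
    as stated in HF's "top_p must be > 0.0 and < 1.0" error message.
    """
    if value <= 0.0:
        return epsilon          # e.g. 0.0 becomes 0.01, the workaround from the issue
    if value >= 1.0:
        return 1.0 - epsilon    # e.g. 1.0 becomes 0.99 (assumption, see lead-in)
    return value                # already valid, pass through unchanged
```

A guard like this only hides the symptom; the underlying request in the issue is for the client itself to report the error payload.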
It would be great if the Azure AI SDK had a way to work around these issues, for instance:
- Detect non-deserializable responses; e.g., SseAsyncEnumerator<Completions> should throw an exception when deserializing {"error":"Input validation error: `top_p` must be > 0.0 and < 1.0","error_type":"validation"}.
- Detect when the remote endpoint stops sending data. As far as I know, the HTTP connection is closed, so the client could stop waiting instead of running until a Task timeout.
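Both suggestions can be sketched together as a small server-sent-events line reader. This is a minimal Python sketch, not the SDK's SseAsyncEnumerator. It assumes the response has already been split into lines, that HF delivers the error object as an ordinary data: line inside the 200 OK stream, and that each event carries a single data: line. The names StreamingError and read_sse_events are hypothetical.

```python
import json
from typing import Iterable, Iterator


class StreamingError(Exception):
    """Hypothetical exception for an error object received inside a 200 OK stream."""


def read_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield JSON payloads from 'data:' lines of a server-sent event stream.

    Stops on the OpenAI-style "[DONE]" sentinel OR when the line source is
    exhausted (i.e. the connection was closed), and raises StreamingError
    when a payload is an error object instead of a completion chunk.
    """
    for raw in lines:  # the loop ends on its own when the source runs out
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank event separators, comments, 'event:' lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # OpenAI ends the stream this way; HF may not send it
        payload = json.loads(data)  # malformed JSON raises json.JSONDecodeError
        if isinstance(payload, dict) and "error" in payload:
            kind = payload.get("error_type", "unknown")
            raise StreamingError(f"{kind}: {payload['error']}")
        yield payload
```

Because the for loop simply runs out when the line source is exhausted, a stream that ends without [DONE] (the HF case) terminates cleanly, while the OpenAI sentinel is still honored. An error object fails fast instead of being silently deserialized into an empty chunk.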
Library name
Azure.AI.OpenAI
See also https://github.com/huggingface/text-generation-inference/issues/1896 and https://github.com/microsoft/kernel-memory/issues/388