[FEATURE REQ] Azure.AI.OpenAI: Support HuggingFace chat completion streaming API

Library name

Azure.AI.OpenAI

Please describe the feature.

The HuggingFace chat completion streaming API is designed to imitate the OpenAI streaming response format. However, because of two minor differences, GetCompletionsStreaming hangs indefinitely when the Azure SDK OpenAIClient is pointed at a HuggingFace endpoint:

HF doesn't terminate the stream with [DONE], so the while loop in SseAsyncEnumerator never breaks.
HF doesn't support a NucleusSamplingFactor of 0.0 and returns the error {"error":"Input validation error: `top_p` must be > 0.0 and < 1.0","error_type":"validation"}. Unfortunately, the response status code is 200 OK, so no exception is thrown. Users can work around this by passing 0.01 instead, but nothing tells them that the value needs to change.

It would be great if the Azure AI SDK had a way to work around these issues, for instance:

Detect non-deserializable responses. For example, SseAsyncEnumerator<Completions> should throw an exception when asked to deserialize {"error":"Input validation error: `top_p` must be > 0.0 and < 1.0","error_type":"validation"}.
Detect when the remote endpoint stops sending data. As far as I know, the HTTP connection is closed at that point, so the client could stop waiting instead of running until a Task timeout.
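
To make the two requested behaviours concrete, here is a minimal, library-neutral sketch in Python of the loop logic I have in mind. It is not the SDK's code: `iter_sse_chunks` and `StreamErrorResponse` are hypothetical names, and I am assuming the error object may arrive either as a `data:` line or as a bare JSON line, since I have not confirmed which form HF uses.

```python
import io
import json
from typing import Iterator, TextIO


class StreamErrorResponse(Exception):
    """Hypothetical exception: the stream carried an error object or undecodable data."""


def iter_sse_chunks(stream: TextIO) -> Iterator[dict]:
    """Yield decoded payloads until [DONE] is seen or the stream ends."""
    # Iterating a text stream stops at EOF, so a server that closes the
    # connection without sending [DONE] still ends the loop.
    for raw_line in stream:
        line = raw_line.strip()
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
        elif line.startswith("{"):
            # Assumption: the error may be sent as a bare JSON body.
            payload = line
        else:
            continue  # blank separators, comments, other SSE fields
        if payload == "[DONE]":
            return
        try:
            chunk = json.loads(payload)
        except json.JSONDecodeError as exc:
            raise StreamErrorResponse(f"undecodable payload: {payload!r}") from exc
        # An error object arriving with 200 OK is surfaced as an exception
        # instead of being silently dropped.
        if isinstance(chunk, dict) and "error" in chunk:
            raise StreamErrorResponse(chunk["error"])
        yield chunk


if __name__ == "__main__":
    # Stream that ends without [DONE]: the generator simply finishes.
    body = io.StringIO('data: {"choices": [{"text": "Hi"}]}\n\n')
    print(list(iter_sse_chunks(body)))
```

The point of the sketch is only that end-of-stream and error payloads are both treated as terminal conditions, so the caller never waits for a [DONE] marker that is not coming.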

Azure / azure-sdk-for-net

[FEATURE REQ] Azure.AI.OpenAI: Support HuggingFace chat completion streaming API #44135
