Even Azure models that support streaming won't do it; the entire response is always returned in one chunk.
I have found no way to enable streaming using configuration, and from the code it doesn't seem possible. The problem appears to be with the `can_stream` property of the `llm.Model` class. Even if you define it using a `config.yaml`, it is ignored by the llm-azure plugin. `AzureChat` extends the OpenAI `Chat`, which in turn extends `llm.Model`. In `Chat`, `can_stream` is `True` by default, but this doesn't take effect because `AzureChat` doesn't call `super().__init__()`, so it becomes effectively `False` for all Azure models.
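A minimal standalone sketch of the attribute lookup being described, using stand-in classes rather than the actual llm / llm-azure source:

```python
# Stand-in classes illustrating the lookup behaviour, not the real library code.

class Model:                      # plays the role of llm.Model
    can_stream: bool = False      # class-level default

class Chat(Model):                # plays the role of the OpenAI Chat model
    def __init__(self, model_id, can_stream=True):
        self.model_id = model_id
        self.can_stream = can_stream      # set on the instance, normally True

class AzureChat(Chat):            # plays the role of the llm-azure model
    def __init__(self, model_id):
        self.model_id = model_id
        # no super().__init__() call, so self.can_stream is never assigned

print(Chat("gpt-4").can_stream)               # True
print(AzureChat("my-deployment").can_stream)  # False: falls back to Model.can_stream
```

Because the subclass skips `super().__init__()`, the instance never gets its own `can_stream` attribute and lookup falls back to the `False` default on the base class.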
I propose to check `config.yaml` for a `can_stream` key, use it if present, and assume `True` otherwise. I will submit a PR shortly.
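A hypothetical sketch of that behaviour, not the actual PR; only the `can_stream` key is part of the proposal, and the loader shape and `model_id` field are assumptions about the plugin's `config.yaml` layout:

```python
import yaml

def load_config(path="config.yaml"):
    # Assumed layout: a YAML list of model entries.
    with open(path) as f:
        return yaml.safe_load(f) or []

def can_stream_for(entry: dict) -> bool:
    # Use an explicit can_stream key if present, assume True otherwise.
    return bool(entry.get("can_stream", True))

for entry in load_config():
    print(entry["model_id"], "can_stream =", can_stream_for(entry))
```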