Closed · FrancescoMasaia closed this issue 1 year ago
@FrancescoMasaia Thanks for reaching out to us and reporting this issue. Could you please share your requirement and your use case? This will help us provide you a concrete answer or share possible alternatives. Awaiting your reply.
Sure. Right now, with the OpenAI chat completions API, usage is present at the end of the response (https://github.com/Azure/azure-rest-api-specs/blob/main/specification/cognitiveservices/OpenAI.Inference/examples/2023-07-01-preview/chat_completions.json):
"usage": {
"completion_tokens": 557,
"prompt_tokens": 33,
"total_tokens": 590
}
but this seems to work only for non-streamed responses; with stream=true, this usage structure is not populated. It would be very useful to have it working correctly. Right now I've implemented an approximation of this count on my side, but I don't know how accurate it is.
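For illustration, a minimal sketch of the difference against the data-plane REST API (the resource and deployment names are placeholders; the api-version is the one from the spec linked above):

```python
import json
import os

import requests

# Placeholder resource/deployment names; api-version from the linked spec.
ENDPOINT = "https://my-resource.openai.azure.com/openai/deployments/my-deployment/chat/completions"
PARAMS = {"api-version": "2023-07-01-preview"}
HEADERS = {"api-key": os.environ["AZURE_OPENAI_KEY"]}
BODY = {"messages": [{"role": "user", "content": "Hello"}]}

# Non-streamed: the response JSON carries the usage block shown above.
resp = requests.post(ENDPOINT, params=PARAMS, headers=HEADERS, json=BODY)
print(resp.json()["usage"])

# Streamed: each SSE chunk carries only deltas; at this API version,
# no chunk carries a populated usage block.
resp = requests.post(ENDPOINT, params=PARAMS, headers=HEADERS,
                     json={**BODY, "stream": True}, stream=True)
for line in resp.iter_lines():
    if line.startswith(b"data: ") and line != b"data: [DONE]":
        chunk = json.loads(line[len(b"data: "):])
        print(chunk.get("usage"))  # prints None for every chunk
```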
@FrancescoMasaia Thanks for getting back. I will check this and get back to you.
@FrancescoMasaia This is not a supported feature at this time: stream does NOT support usage tokens. We are discussing if this is something we will create in the future. No ETA as of now, so we will update this thread once there is any update about this.
I am currently in need of the functionality for precise token calculation in the streaming version of the Azure OpenAI API. This feature is critical for accurately calculating user tokens to ensure proper billing. Additionally, it would enable me to implement appropriate throttling mechanisms on my side. At the moment, I'm using a custom token calculator based on tiktoken, but managing a separate token calculation process for billing purposes is complex and prone to inaccuracies. The availability of an official token count for streamed responses would greatly streamline this process and enhance the reliability of billing and service management.
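For reference, a minimal sketch of this kind of tiktoken-based approximation (the +3 per-message and +3 reply-priming overheads are assumptions taken from OpenAI's published counting guidance for gpt-3.5/gpt-4-style chat models, and may drift from what the service actually bills):

```python
import tiktoken

# Approximation only: the per-message overheads below are heuristics and
# may not match the service's billed counts exactly.
def approx_prompt_tokens(messages, model="gpt-4"):
    enc = tiktoken.encoding_for_model(model)
    total = 3  # every reply is primed with <|start|>assistant<|message|>
    for message in messages:
        total += 3  # per-message wrapper tokens
        for value in message.values():
            total += len(enc.encode(value))
    return total

def approx_completion_tokens(streamed_text, model="gpt-4"):
    enc = tiktoken.encoding_for_model(model)
    return len(enc.encode(streamed_text))

# Example:
msgs = [{"role": "user", "content": "Hello, how are you?"}]
print(approx_prompt_tokens(msgs), approx_completion_tokens("Fine, thanks!"))
```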
Hi there! I think this is really important. Claude from Anthropic provides the token counts when stream is set to true, so that would be something expected from OpenAI, a leader in the field.
Still no update about this?
I just updated my feature request issue about this...
This option seems to be switched on by default in the new OpenAI client (pulled in by Azure OpenAI 2.0.*). However, it doesn't seem to work.
Issue on the OpenAI repo: https://github.com/openai/openai-dotnet/issues/103
I need this on the API; we are unnecessarily counting tokens on streamed responses for analytics purposes. Any chance soon?
Try the 2.0-beta version of the Azure.AI.OpenAI package. They dropped this one.
> Try the 2.0-beta version of the Azure.AI.OpenAI package. They dropped this one.

Actually, I am a REST API consumer, not using any packages, and it seems the relevant property is not supported on a direct API call.
@BlackGad are you saying you have usage working with the Azure 2.0 package? I cannot get it to work. And on the OpenAI package side they are saying it's not supported yet (comment here: https://github.com/openai/openai-dotnet/issues/103).
@laygir Hi, is this working for you now with the latest version of the REST API?
Azure OpenAI still does not support stream_options at the API level, even with the latest preview API. So there has still been no progress since May.
@BlackGad have you tried recently with a direct REST call?
> We are discussing if this is something we will create in the future.
This comment from @navba-MSFT makes no sense to me; I'm assuming it's just bad wording. In my mind it's not a case of "if". My expectation is that if OpenAI themselves release a feature, then it will be implemented by Azure OpenAI, AND within a reasonable amount of time. A year later and still nothing is not a reasonable amount of time!
> Azure OpenAI still does not support stream_options at the API level, even with the latest preview API. So there has still been no progress since May.
@BlackGad there are reports like this that this is working in some regions
> @BlackGad have you tried recently with a direct REST call?
Yes. Direct OpenAI returns proper token usage in stream mode when the request contains stream_options. The Azure counterpart returns a validation error saying this option is not known. Tomorrow I will upload the direct requests and responses. I will also test different regions.
I ran some basic REST calls against the inference API (latest gpt-4o and API version) and it seems usage is on by default (if not specified explicitly) and the usage data is returned. Tested against eastus and sweden global deployments.
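For anyone who wants to reproduce this, a minimal sketch of such a streaming call with stream_options (hypothetical resource/deployment names; the api-version that accepts the option appears to vary by region and model, per the reports in this thread):

```python
import json
import os

import requests

# Hypothetical resource/deployment; treat the api-version as a placeholder.
url = "https://my-resource.openai.azure.com/openai/deployments/gpt-4o/chat/completions"
resp = requests.post(
    url,
    params={"api-version": "2024-06-01"},
    headers={"api-key": os.environ["AZURE_OPENAI_KEY"]},
    json={
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": True,
        # Asks the server to append one final chunk with an empty
        # "choices" array and a populated "usage" block.
        "stream_options": {"include_usage": True},
    },
    stream=True,
)

usage = None
for line in resp.iter_lines():
    if not line.startswith(b"data: ") or line == b"data: [DONE]":
        continue
    chunk = json.loads(line[len(b"data: "):])
    usage = chunk.get("usage") or usage  # only the last chunk carries usage
print(usage)
```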
I also just tried a few REST calls in Switzerland North using gpt-4-32k and gpt-3.5-16k; both return usage when streaming with "stream_options": { "include_usage": true } in the body.
When using a gpt-4 vision deployment I received the error "1 validation error for Request\nbody -> stream_options\n extra fields not permitted (type=value_error.extra)".
For my use case, where I simply didn't want to count tokens myself when streaming, this is still not enough: if even one model still requires manual counting, it isn't really worth relying on the response usage counts.
Update: Tested with deployments in Sweden; gpt-4o-mini works when streaming with stream_options added in the body to include usage, while gpt-4o errored out just like gpt-4 vision in Switzerland.
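Since support seems to vary per model and region, one defensive pattern is to ask for include_usage and fall back to local counting only when the deployment rejects the option. A sketch under those assumptions (hypothetical names; the error check keys off the validation message quoted above, and the tiktoken encoding choice is an assumption to be picked per model):

```python
import json
import os

import requests
import tiktoken

HEADERS = {"api-key": os.environ["AZURE_OPENAI_KEY"]}
# Hypothetical resource; the api-version accepting stream_options varies.
URL = "https://my-resource.openai.azure.com/openai/deployments/{dep}/chat/completions"


def read_stream(resp):
    """Collect streamed delta text and the final usage chunk, if any."""
    text, usage = [], None
    for line in resp.iter_lines():
        if not line.startswith(b"data: ") or line == b"data: [DONE]":
            continue
        chunk = json.loads(line[len(b"data: "):])
        for choice in chunk.get("choices", []):
            text.append(choice.get("delta", {}).get("content") or "")
        usage = chunk.get("usage") or usage
    return "".join(text), usage


def stream_with_usage(dep, messages, api_version="2024-06-01"):
    body = {"messages": messages, "stream": True,
            "stream_options": {"include_usage": True}}
    url = URL.format(dep=dep)
    params = {"api-version": api_version}
    resp = requests.post(url, params=params, headers=HEADERS, json=body, stream=True)
    if resp.status_code == 400 and "stream_options" in resp.text:
        # Deployment rejects the option (the gpt-4 vision error above):
        # retry without it and approximate the count locally instead.
        body.pop("stream_options")
        resp = requests.post(url, params=params, headers=HEADERS, json=body, stream=True)
        text, _ = read_stream(resp)
        enc = tiktoken.get_encoding("cl100k_base")  # assumption; pick per model
        return text, {"completion_tokens": len(enc.encode(text)), "estimated": True}
    return read_stream(resp)
```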
I was wondering when an official token count will become available for the streaming version of the chat completions API.