Azure / azure-sdk-for-net

This repository is for active development of the Azure SDK for .NET. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/dotnet/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-net.
MIT License

[FEATURE REQ] Add access to CompletionsUsage in StreamingChatCompletions #38491

Open garbidge opened 10 months ago

garbidge commented 10 months ago

Library name

Azure.AI.OpenAI

Please describe the feature.

From what I can see, there is no way to get the CompletionsUsage of a request when using StreamingChatCompletions. It has private readonly IList<ChatCompletions> _baseChatCompletions; but I don't see anywhere this is exposed.

It would be nice if there was a way to check the token usage after streaming is complete.

(My apologies if I have missed somewhere that you can already do this when using StreamingChatCompletions)

Ref: https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/openai/Azure.AI.OpenAI/src/Custom/StreamingChatCompletions.cs

github-actions[bot] commented 10 months ago

Thank you for your feedback. This has been routed to the support team for assistance.

felix-lausch commented 10 months ago

I have been wondering the same for months. Tracking usage is trivially easy for the non-streaming version but seems impossible for streaming.

kwemou commented 9 months ago

Dear all, any chance to have this feature? Thank you in advance

Jean-Fischer commented 8 months ago

I need this as well. Is there any workaround to access the token usage? Thanks

felix-lausch commented 8 months ago

For now it seems like the only feasible option is to count the token usage yourself.

In my (limited) experiments, the combination of the following two methods has been 100% in line with the metrics I can see in the Azure portal for gpt-4.

Use this method to calculate the prompt tokens:

    /// <summary>
    /// Calculate the number of tokens that the messages would consume.
    /// Based on: https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
    /// </summary>
    /// <param name="messages">Messages to calculate token count for.</param>
    /// <returns>Number of tokens</returns>
    public int GetTokenCount(IEnumerable<Azure.AI.OpenAI.ChatMessage> messages)
    {
        const int TokensPerMessage = 3;
        const int TokensPerRole = 1;
        const int BaseTokens = 3;
        var disallowedSpecial = new HashSet<string>();

        var tokenCount = BaseTokens;

        var encoding = SharpToken.GptEncoding.GetEncoding("cl100k_base");
        foreach (var message in messages)
        {
            tokenCount += TokensPerMessage;
            tokenCount += TokensPerRole;
            tokenCount += encoding.Encode(message.Content, disallowedSpecial).Count;
        }

        return tokenCount;
    }

And simply count the number of messages that you receive when consuming the response stream:

    //...
    OpenAIClient client = new(new Uri(endpoint), new AzureKeyCredential(key));

    StreamingChatCompletions completions = await client.GetChatCompletionsStreamingAsync("gpt-4", input);

    StreamingChatChoice choice = await completions.GetChoicesStreaming().FirstAsync();

    // Each streamed message is counted as one completion token.
    int responseTokenCount = 0;
    await foreach (var message in choice.GetMessageStreaming())
    {
        responseTokenCount++;
        yield return message.Content;
    }
    //...

BlackGad commented 2 months ago

@felix-lausch This calculation leaves out the schema definitions for tools and functions, as well as the responses that use them.

Use the SharpToken.GptEncoding.CountTokens method. It is optimized for this case.
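
A minimal sketch of how the earlier GetTokenCount method might look with a counting callback instead of Encode(...).Count. The class name ChatTokenEstimator and the countTokens parameter are hypothetical, and the overhead constants are simply copied from the method posted above, so treat this as an estimate rather than the service's own accounting.

```csharp
using System;
using System.Collections.Generic;

public static class ChatTokenEstimator
{
    // Overheads copied from the GetTokenCount method earlier in this thread.
    private const int BaseTokens = 3;
    private const int TokensPerMessage = 3;
    private const int TokensPerRole = 1;

    // countTokens is a hypothetical tokenizer callback, for example a
    // SharpToken encoding's CountTokens method.
    public static int EstimatePromptTokens(
        IEnumerable<string> messageContents,
        Func<string, int> countTokens)
    {
        var total = BaseTokens;
        foreach (var content in messageContents)
        {
            total += TokensPerMessage + TokensPerRole;

            // Messages without text content contribute only the overhead here.
            if (!string.IsNullOrEmpty(content))
            {
                total += countTokens(content);
            }
        }

        return total;
    }
}
```

Assuming a SharpToken release that exposes CountTokens, the callback could be `text => encoding.CountTokens(text)` with `encoding` obtained from `GptEncoding.GetEncoding("cl100k_base")`. The same callback can be applied to the accumulated response text once streaming finishes. Tool and function schemas are still not counted.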

m-gug commented 1 month ago

It is now possible in the official OpenAI API to request token usage with streaming (https://community.openai.com/t/usage-stats-now-available-when-using-streaming-with-the-chat-completions-api-or-completions-api/738156). Can we implement this feature here as well?
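
For anyone reading the raw stream in the meantime, here is a rough sketch of pulling the usage numbers out of a server-sent event line. It assumes the request was sent with `stream_options` set to `{"include_usage": true}`, in which case the OpenAI API sends one extra chunk before `data: [DONE]` whose `usage` field is populated. StreamUsage and StreamUsageParser are hypothetical names and are not part of Azure.AI.OpenAI.

```csharp
using System;
using System.Text.Json;

// Hypothetical container for the three counters in the "usage" object.
public readonly record struct StreamUsage(int PromptTokens, int CompletionTokens, int TotalTokens);

public static class StreamUsageParser
{
    // Returns the usage carried by one SSE line, or null if the line has none.
    public static StreamUsage? TryParse(string sseLine)
    {
        const string Prefix = "data:";
        if (string.IsNullOrWhiteSpace(sseLine) ||
            !sseLine.StartsWith(Prefix, StringComparison.Ordinal))
        {
            return null;
        }

        var payload = sseLine.Substring(Prefix.Length).Trim();
        if (payload.Length == 0 || payload == "[DONE]")
        {
            return null;
        }

        using var doc = JsonDocument.Parse(payload);
        var root = doc.RootElement;

        // Ordinary content chunks carry "usage": null, so only an object counts.
        if (root.ValueKind != JsonValueKind.Object ||
            !root.TryGetProperty("usage", out var usage) ||
            usage.ValueKind != JsonValueKind.Object)
        {
            return null;
        }

        return new StreamUsage(
            usage.GetProperty("prompt_tokens").GetInt32(),
            usage.GetProperty("completion_tokens").GetInt32(),
            usage.GetProperty("total_tokens").GetInt32());
    }
}
```

Calling TryParse on every line and keeping the last non-null result should give the totals for the request. Whether a given Azure OpenAI deployment honors `stream_options` is something I have not verified.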