Mirascope / mirascope

LLM abstractions that aren't obstructions
https://docs.mirascope.io/
MIT License

Track costs for streaming with Cohere #218

Closed: brenkao closed this issue 1 month ago

brenkao commented 4 months ago

Is your feature request related to a problem? Please describe.

Many providers are starting to include token usage in their streaming responses. This makes it much easier for Mirascope to calculate cost.

Describe the solution you'd like

Add a total_cost property to CohereCallResponseChunk. Read the "event_type": "stream-end" event sent by the Cohere API and calculate cost using its token_count:

"token_count": {
    "prompt_tokens": ...,
    "response_tokens": ...,
    "total_tokens": ...,
    "billed_tokens": ...,
}

Update https://github.com/Mirascope/mirascope/blob/dev/mirascope/cohere/utils.py as necessary.
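As a rough illustration of the request, here is a minimal sketch of computing cost from a stream-end event. It assumes the event has already been parsed into a plain dict and that token_count sits at the top level of that dict (adjust the lookup if it is nested elsewhere). The function name cost_from_stream_end and the prices in EXAMPLE_PRICES are hypothetical, not Mirascope's implementation or real Cohere pricing.

```python
from typing import Any, Mapping, Optional

# Hypothetical prices in USD per one million tokens; not real Cohere pricing.
EXAMPLE_PRICES = {"prompt": 0.50, "completion": 1.50}


def cost_from_stream_end(
    event: Mapping[str, Any],
    prices: Mapping[str, float] = EXAMPLE_PRICES,
) -> Optional[float]:
    """Return the cost of a stream, or None if the event carries no usage."""
    # Only the final "stream-end" event is expected to report token usage.
    if event.get("event_type") != "stream-end":
        return None
    token_count = event.get("token_count")
    if not token_count:
        return None
    prompt_cost = token_count["prompt_tokens"] * prices["prompt"] / 1_000_000
    completion_cost = (
        token_count["response_tokens"] * prices["completion"] / 1_000_000
    )
    return prompt_cost + completion_cost
```

Returning None for every event other than stream-end lets a caller invoke this on each chunk and keep only the last non-None value.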

willbakst commented 4 months ago

See #214 since these are related.

Namely: https://github.com/Mirascope/mirascope/issues/214#issuecomment-2098893697

tvj15 commented 4 months ago

I am working on this, but the problem I am facing is that the event returned by co.chat_stream() is of type StreamedChatResponse, and its response property is of type NonStreamedChatResponse, which does not have a token_count property. I am not sure how to access token_count here.

willbakst commented 4 months ago

Doesn't the NonStreamedChatResponse type have response.meta.billed_units, which returns ApiMetaBilledUnits? We should be able to grab the same usage statistics from it that we do for the normal response. We can likely massage that data into the form we need to calculate cost, right?

tvj15 commented 4 months ago

Yes, it does, but according to the API docs, the streamed response has no meta.billed_units property. It does have token_count, though. I can look again at what is happening on the API side and update here.
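Since it is unclear which of the two fields the final streamed response actually carries, one option is to check both. The sketch below is only an assumption-laden illustration: it uses stand-in objects rather than the Cohere SDK types, the helper name extract_usage is hypothetical, and the input_tokens / output_tokens field names on billed_units are assumed.

```python
from types import SimpleNamespace
from typing import Any, Optional, Tuple


def extract_usage(response: Any) -> Optional[Tuple[float, float]]:
    """Return (input_tokens, output_tokens), or None when no usage is present."""
    # Prefer meta.billed_units when the response object exposes it.
    meta = getattr(response, "meta", None)
    billed = getattr(meta, "billed_units", None)
    if billed is not None:
        return billed.input_tokens, billed.output_tokens
    # Otherwise fall back to a token_count mapping, if there is one.
    token_count = getattr(response, "token_count", None)
    if token_count:
        return token_count["prompt_tokens"], token_count["response_tokens"]
    return None


# Stand-in objects for illustration; these are not Cohere SDK types.
with_billed = SimpleNamespace(
    meta=SimpleNamespace(
        billed_units=SimpleNamespace(input_tokens=12, output_tokens=34)
    )
)
with_token_count = SimpleNamespace(
    meta=None, token_count={"prompt_tokens": 5, "response_tokens": 7}
)
```

Checking with getattr keeps the helper tolerant of either response shape, so it does not need to change once the API behavior is confirmed.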

brenkao commented 3 months ago

This is partially implemented in #307, where Cohere chunks will contain input_tokens and output_tokens, which can be used to calculate cost. The only remaining work is to pass cost into CohereCallResponseChunk.
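To make the remaining step concrete, here is a minimal sketch of passing cost into a chunk once it carries input_tokens and output_tokens. ChunkSketch is a hypothetical stand-in, not Mirascope's CohereCallResponseChunk, and attach_cost and the compute_cost callback are illustrative names only.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, Iterator, Optional


@dataclass(frozen=True)
class ChunkSketch:
    """Hypothetical stand-in for a streamed chunk; not Mirascope's class."""

    content: str
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None
    cost: Optional[float] = None


def attach_cost(
    chunks: Iterable[ChunkSketch],
    compute_cost: Callable[[int, int], float],
) -> Iterator[ChunkSketch]:
    """Yield chunks unchanged, filling in cost where token usage is present."""
    for chunk in chunks:
        if chunk.input_tokens is not None and chunk.output_tokens is not None:
            cost = compute_cost(chunk.input_tokens, chunk.output_tokens)
            yield replace(chunk, cost=cost)
        else:
            # Chunks without usage (typically all but the last) pass through.
            yield chunk
```

Taking the pricing logic as a callback keeps the chunk type independent of any particular model's prices.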