PrefectHQ / marvin

✨ Build AI interfaces that spark joy
https://askmarvin.ai

Estimate token usage, cost #870

Open pengelbrecht opened 7 months ago

pengelbrecht commented 7 months ago

Discussed in https://github.com/PrefectHQ/marvin/discussions/546

Originally posted by **ww-jermaine** August 25, 2023

Hello, is there a way to estimate the token usage and cost per call of `ai_fn`, `ai_model`, etc.? Something like the callback from langchain:

```
Tokens Used: 0
Prompt Tokens: 0
Completion Tokens: 0
Successful Requests: 0
Total Cost (USD): $0.0
```
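For context, the langchain callback referenced above is used as a context manager, roughly like this (a sketch only; the import paths and call style vary across langchain versions, this assumes the classic `langchain.callbacks` / `langchain.chat_models` layout):

```python
from langchain.callbacks import get_openai_callback
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model_name="gpt-3.5-turbo")

# every OpenAI call made inside the block is tallied on `cb`
with get_openai_callback() as cb:
    llm.predict("Tell me a joke")

print(f"Tokens Used: {cb.total_tokens}")
print(f"  Prompt Tokens: {cb.prompt_tokens}")
print(f"  Completion Tokens: {cb.completion_tokens}")
print(f"Successful Requests: {cb.successful_requests}")
print(f"Total Cost (USD): ${cb.total_cost}")
```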
zzstoatzz commented 7 months ago

hi @pengelbrecht - thanks for the issue! let me know if something like this is what you're looking for and/or feel free to make a specific enhancement request

pengelbrecht commented 7 months ago

It seems like that would work. However, I'm primarily a hobby programmer who appreciates the simplicity of Marvin, so the subclassing approach might be a bit beyond my expertise. Ideally, I'd prefer something simpler and more in line with Marvin's design philosophy. Unfortunately, I'm not really qualified to suggest a specific alternative. Sorry.

zzstoatzz commented 7 months ago

thanks for the response @pengelbrecht - no worries.

if you don't mind, what would your ideal experience look like? people often have drastically different ideas about what they want token tracking to look like, but your perspective would help build a sense of what a common-sense / middle-of-the-road offering might look like

pengelbrecht commented 7 months ago

Here's how I do it today with direct OpenAI API use.

But returning a tuple doesn't feel very Marvinesque :)


```python
from typing import Optional, Tuple

from openai import AsyncOpenAI

client = AsyncOpenAI()

# module-level defaults referenced below; placeholder values stand in for ones not shown here
_default_system_prompt = "You are a helpful assistant."
_default_model = "gpt-4-turbo-preview"
_default_temperature = 0.7


def openai_cost_usd(
    model_name: str, prompt_tokens: int, completion_tokens: int
) -> Optional[float]:
    """Convert token counts to USD using per-million-token prices."""
    if model_name == "gpt-4-turbo-preview":
        return prompt_tokens * 10.0 / 1e6 + completion_tokens * 30.0 / 1e6
    elif model_name == "gpt-3.5-turbo":
        return prompt_tokens * 0.5 / 1e6 + completion_tokens * 1.5 / 1e6
    else:
        return None  # unknown model: no entry in the price table


async def fetch_chat_completion(
    user_message: str,
    system_prompt: str = _default_system_prompt,
    model_name: str = _default_model,
    temperature: float = _default_temperature,
) -> Tuple[str, int, Optional[float]]:
    """Fetch a single chat completion and return (message, total tokens, cost in USD)."""
    chat_completion = await client.chat.completions.create(
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        model=model_name,
        temperature=temperature,
    )
    response_message = chat_completion.choices[0].message.content
    prompt_tokens = chat_completion.usage.prompt_tokens
    completion_tokens = chat_completion.usage.completion_tokens
    total_tokens = prompt_tokens + completion_tokens
    cost = openai_cost_usd(model_name, prompt_tokens, completion_tokens)
    return response_message, total_tokens, cost
```
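Calling the helper then looks something like this (a hypothetical invocation just to show the returned tuple):

```python
import asyncio

async def main() -> None:
    message, total_tokens, cost = await fetch_chat_completion("Tell me a joke")
    print(message)
    print(f"total tokens: {total_tokens}, cost (USD): {cost}")

asyncio.run(main())
```
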
pengelbrecht commented 6 months ago

litellm's approach is wonderful: https://litellm.vercel.app/docs/completion/token_usage – but I guess there's no parallel to the completion object in Marvin's approach?
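For reference, the litellm pattern referenced there looks roughly like this (a sketch based on the linked docs; the OpenAI-style `usage` block on the response and the `completion_cost` helper are the relevant pieces):

```python
import litellm

# litellm returns an OpenAI-style response object that carries a `usage` block
response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
)

print(f"prompt tokens: {response.usage.prompt_tokens}")
print(f"completion tokens: {response.usage.completion_tokens}")

# map the usage to a dollar figure for models litellm knows about
cost = litellm.completion_cost(completion_response=response)
print(f"cost (USD): {cost:.6f}")
```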