pengelbrecht opened this issue 7 months ago
hi @pengelbrecht - thanks for the issue! let me know if something like this is what you're looking for and/or feel free to make a specific enhancement request
It seems like that would work. However, I'm primarily a hobby programmer who appreciates the simplicity of Marvin, so the subclassing approach might be a bit beyond my expertise. Ideally, I'd prefer something simpler and more in line with Marvin's design philosophy. Unfortunately, I'm not really qualified to suggest a specific alternative. Sorry.
thanks for the response @pengelbrecht - no worries.
if you don't mind, what would your ideal experience look like? people often have drastically different ideas about what they want token tracking to look like, but your perspective would help us build a sense of what a common-sense, middle-of-the-road offering might be
Here's how I do it today with direct OpenAI API use.
But returning a tuple doesn't feel very Marvinesque :)
```python
from typing import Optional, Tuple

from openai import AsyncOpenAI

# Placeholders -- in my real module these are defined elsewhere.
_default_system_prompt = "You are a helpful assistant."
_default_model = "gpt-4-turbo-preview"
_default_temperature = 0.7

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment


def openai_cost_usd(
    model_name: str, prompt_tokens: int, completion_tokens: int
) -> Optional[float]:
    """Estimate USD cost from hard-coded per-million-token prices."""
    if model_name == "gpt-4-turbo-preview":
        return prompt_tokens * 10.0 / 1e6 + completion_tokens * 30.0 / 1e6
    elif model_name == "gpt-3.5-turbo":
        return prompt_tokens * 0.5 / 1e6 + completion_tokens * 1.5 / 1e6
    else:
        return None  # unknown model, no price on file


async def fetch_chat_completion(
    user_message: str,
    system_prompt: str = _default_system_prompt,
    model_name: str = _default_model,
    temperature: float = _default_temperature,
) -> Tuple[str, int, Optional[float]]:
    """Fetch a single chat completion for a user message."""
    chat_completion = await client.chat.completions.create(
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        model=model_name,
        temperature=temperature,
    )
    response_message = chat_completion.choices[0].message.content
    prompt_tokens = chat_completion.usage.prompt_tokens
    completion_tokens = chat_completion.usage.completion_tokens
    total_tokens = prompt_tokens + completion_tokens
    cost = openai_cost_usd(model_name, prompt_tokens, completion_tokens)
    return response_message, total_tokens, cost
```
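For reference, calling it looks something like this (the prompt here is just an illustrative placeholder):

```python
import asyncio


async def main() -> None:
    message, total_tokens, cost = await fetch_chat_completion("Say hello in French")
    print(f"response: {message!r}")
    print(f"total tokens: {total_tokens}, estimated cost (USD): {cost}")


asyncio.run(main())
```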
litellm's approach is wonderful: https://litellm.vercel.app/docs/completion/token_usage – but I guess there's no parallel to the completion object in Marvin's approach?
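To make the comparison concrete, here's roughly what that pattern looks like (sketched from litellm's docs, not tested here):

```python
from litellm import completion, completion_cost

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say hello in French"}],
)

# token usage rides along on the response object itself
print(response.usage.total_tokens)

# and litellm can price it from its built-in model price table
print(completion_cost(completion_response=response))
```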
Discussed in https://github.com/PrefectHQ/marvin/discussions/546