Blaizzy / fastmlx

FastMLX is a high-performance, production-ready API for hosting MLX models.

Implement Basic Token Usage Tracking #8

Open · Blaizzy opened this issue 1 month ago

Blaizzy commented 1 month ago

Description:

We'd like to add a simple token usage tracking feature to our FastMLX application. This will help users understand how many tokens their requests are consuming.

Objective:

Implement a function that counts the number of tokens in the input and output of our AI models.

Tasks:

  1. Create a new function count_tokens(text: str) -> int in the utils.py file.
  2. Use the appropriate tokenizer from our AI model to count tokens.
  3. Integrate this function into the main request processing flow in main.py.
  4. Update the response structure to include token counts (see the sketch after this list).

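For tasks 3 and 4, here is a minimal sketch of how the counts could flow through a FastAPI handler. The endpoint path, the request/response models, and generate_response are hypothetical stand-ins, not FastMLX's actual handler; count_tokens is the helper from task 1.

from fastapi import FastAPI
from pydantic import BaseModel

from utils import count_tokens  # the helper from task 1

app = FastAPI()

class Usage(BaseModel):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

class ChatRequest(BaseModel):
    prompt: str

class ChatResponse(BaseModel):
    output: str
    usage: Usage

@app.post("/v1/chat", response_model=ChatResponse)
def chat(req: ChatRequest):
    # generate_response is a hypothetical stand-in for the server's generation call
    model_output = generate_response(req.prompt)
    input_tokens = count_tokens(req.prompt)
    output_tokens = count_tokens(model_output)
    return ChatResponse(
        output=model_output,
        usage=Usage(
            prompt_tokens=input_tokens,
            completion_tokens=output_tokens,
            total_tokens=input_tokens + output_tokens,
        ),
    )
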
Example Implementation:

from functools import lru_cache

from transformers import AutoTokenizer

@lru_cache(maxsize=1)
def get_tokenizer():
    # Load once and reuse; "gpt2" is a placeholder for our model's tokenizer
    return AutoTokenizer.from_pretrained("gpt2")

def count_tokens(text: str) -> int:
    """Count the tokens in `text` using the cached tokenizer."""
    return len(get_tokenizer().encode(text))

# In main request processing:
input_tokens = count_tokens(user_input)
output_tokens = count_tokens(model_output)
total_tokens = input_tokens + output_tokens

response = {
    "output": model_output,
    "usage": {
        "prompt_tokens": input_tokens,
        "completion_tokens": output_tokens,
        "total_tokens": total_tokens,
    },
}
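
Since FastMLX serves models through mlx-lm, one option is to reuse the tokenizer that mlx_lm.load already returns instead of downloading a separate gpt2 tokenizer. A rough sketch, where the model path is only an example and a running server would reuse the tokenizer it already holds rather than calling load again:

from mlx_lm import load

# load returns (model, tokenizer); the server already holds both after startup
model, tokenizer = load("mlx-community/Meta-Llama-3-8B-Instruct-4bit")

def count_tokens(text: str) -> int:
    # mlx-lm's tokenizer wrapper exposes the Hugging Face encode() interface
    return len(tokenizer.encode(text))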

Guidelines:

Resources:

Definition of Done:

We're excited to see your contribution! This feature will help our users better understand and manage their token usage. Good luck!

antunsz commented 1 month ago

Hi @Blaizzy. Perhaps this issue could be an opportunity to implement LLM tracking with AgentOps (https://github.com/AgentOps-AI/agentops), for example. Or do you see that as a future step, or not the right approach? What are your thoughts?