Blaizzy / fastmlx

FastMLX is a high-performance, production-ready API for hosting MLX models.

Implement Basic Token Usage Tracking #8

Open · Blaizzy opened this issue 1 month ago

Blaizzy commented 1 month ago

Description:

We'd like to add a simple token usage tracking feature to our FastMLX application. This will help users understand how many tokens their requests are consuming.

Objective:

Implement a function that counts the number of tokens in the input and output of our AI models.

Tasks:

  1. Create a new function count_tokens(text: str) -> int in the utils.py file.
  2. Use the appropriate tokenizer from our AI model to count tokens.
  3. Integrate this function into the main request processing flow in main.py.
  4. Update the response structure to include token counts (see the sketch after this list).

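For tasks 3 and 4, here is a minimal sketch of how the counts could flow through a FastAPI handler. The endpoint path, the request/response models, and generate_response are hypothetical stand-ins, not FastMLX's actual handler; count_tokens is the helper from task 1.

from fastapi import FastAPI
from pydantic import BaseModel

from utils import count_tokens  # the helper from task 1

app = FastAPI()

class Usage(BaseModel):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

class ChatRequest(BaseModel):
    prompt: str

class ChatResponse(BaseModel):
    output: str
    usage: Usage

@app.post("/v1/chat", response_model=ChatResponse)
def chat(req: ChatRequest):
    # generate_response is a hypothetical stand-in for the server's generation call
    model_output = generate_response(req.prompt)
    input_tokens = count_tokens(req.prompt)
    output_tokens = count_tokens(model_output)
    return ChatResponse(
        output=model_output,
        usage=Usage(
            prompt_tokens=input_tokens,
            completion_tokens=output_tokens,
            total_tokens=input_tokens + output_tokens,
        ),
    )
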
Example Implementation:

from functools import lru_cache

from transformers import AutoTokenizer

@lru_cache(maxsize=1)
def get_tokenizer():
    # Load once and reuse; "gpt2" is a placeholder for our model's tokenizer
    return AutoTokenizer.from_pretrained("gpt2")

def count_tokens(text: str) -> int:
    """Count the tokens in `text` using the cached tokenizer."""
    return len(get_tokenizer().encode(text))

# In main request processing:
input_tokens = count_tokens(user_input)
output_tokens = count_tokens(model_output)
total_tokens = input_tokens + output_tokens

response = {
    "output": model_output,
    "usage": {
        "prompt_tokens": input_tokens,
        "completion_tokens": output_tokens,
        "total_tokens": total_tokens,
    },
}
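
Since FastMLX serves models through mlx-lm, one option is to reuse the tokenizer that mlx_lm.load already returns instead of downloading a separate gpt2 tokenizer. A rough sketch, where the model path is only an example and a running server would reuse the tokenizer it already holds rather than calling load again:

from mlx_lm import load

# load returns (model, tokenizer); the server already holds both after startup
model, tokenizer = load("mlx-community/Meta-Llama-3-8B-Instruct-4bit")

def count_tokens(text: str) -> int:
    # mlx-lm's tokenizer wrapper exposes the Hugging Face encode() interface
    return len(tokenizer.encode(text))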

Guidelines:

Resources:

Definition of Done:

We're excited to see your contribution! This feature will help our users better understand and manage their token usage. Good luck!

antunsz commented 1 month ago

Hi @Blaizzy. Perhaps this issue could be an opportunity to implement LLM tracking with AgentOps (https://github.com/AgentOps-AI/agentops), for example. Or do you see that as a future step, or not the right approach? What are your thoughts?