Accurate token usage in evals

ErikBjare / gptme

Your agent in your terminal, equipped with local tools: writes code, uses the terminal, browses the web, vision.

https://gptme.org/docs/

MIT License

812 stars, 62 forks

Open ErikBjare opened 2 weeks ago

ErikBjare commented 2 weeks ago

Right now we just count the token length of the chat log; we should capture the actual spend instead.
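
To illustrate why the two numbers can differ, here is a rough sketch. It assumes the whole log is resent as input on every request, and it uses a whitespace split as a stand-in tokenizer. `Usage`, `count_tokens`, and `simulate_run` are hypothetical names for illustration, not existing gptme code; a real implementation would presumably read per-request counts from the provider's response instead.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Token counts for one request (hypothetical shape, not gptme's API)."""

    input_tokens: int = 0
    output_tokens: int = 0

    def __add__(self, other: "Usage") -> "Usage":
        return Usage(
            self.input_tokens + other.input_tokens,
            self.output_tokens + other.output_tokens,
        )

    @property
    def total(self) -> int:
        return self.input_tokens + self.output_tokens


def count_tokens(text: str) -> int:
    """Stand-in tokenizer (assumption): one token per whitespace-separated word."""
    return len(text.split())


def simulate_run(user_msgs, assistant_msgs):
    """Replay a conversation, resending the whole log on every request.

    Returns (token length of the final log, accumulated Usage).
    """
    log = []
    spend = Usage()
    for user, assistant in zip(user_msgs, assistant_msgs):
        log.append(user)
        # The input for this request is everything in the log so far.
        request = Usage(
            input_tokens=sum(count_tokens(m) for m in log),
            output_tokens=count_tokens(assistant),
        )
        spend = spend + request
        log.append(assistant)
    log_length = sum(count_tokens(m) for m in log)
    return log_length, spend


log_length, spend = simulate_run(
    ["write hello world", "now add tests"],
    ["print('hello world')", "assert True"],
)
print(log_length, spend.total)
```

Under these assumptions the final log is shorter than the accumulated spend, because earlier messages are counted again as input on each later request.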

Not sure this is very high priority, and it's not obvious how to do it the right way.

It might also be interesting to set a token/message limit for evals, to stop long runs that lead nowhere (these are not very common).
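
A limit like that could be sketched roughly as below. `RunBudget`, `BudgetExceeded`, and `run_eval` are hypothetical names, and both limits are made-up knobs rather than existing gptme options; the per-message token counts are passed in directly so the example stays self-contained.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when an eval run goes past its token or message limit."""


@dataclass
class RunBudget:
    # Both limits are hypothetical settings, not existing gptme options.
    max_tokens: int
    max_messages: int
    tokens_used: int = 0
    messages_used: int = 0

    def record(self, tokens: int) -> None:
        """Record one message and raise if either limit is now exceeded."""
        self.tokens_used += tokens
        self.messages_used += 1
        if self.tokens_used > self.max_tokens:
            raise BudgetExceeded(f"token limit {self.max_tokens} exceeded")
        if self.messages_used > self.max_messages:
            raise BudgetExceeded(f"message limit {self.max_messages} exceeded")


def run_eval(message_token_counts, budget: RunBudget) -> str:
    """Feed per-message token counts through the budget and report how the run ended."""
    try:
        for tokens in message_token_counts:
            budget.record(tokens)
    except BudgetExceeded as e:
        return f"stopped: {e}"
    return "completed"
```

For example, `run_eval([600, 600], RunBudget(max_tokens=1000, max_messages=10))` stops on the second message, while a run that stays under both limits completes normally. Raising an exception keeps the check in one place; the eval loop only has to decide how to report a stopped run.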