codelion / optillm

Optimizing inference proxy for LLMs
Apache License 2.0
1.6k stars 128 forks

token counting #52

Closed darkacorn closed 1 month ago

darkacorn commented 1 month ago

for stream and non-stream .. would be amazing .. especially for keeping track of i/o cost

easy enough for non-stream .. but streaming is a whole other can of worms
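for context, here is a rough sketch of the client-side workaround for the streaming case, using `tiktoken` to approximate counts by re-encoding the text (the base URL, API key, model name, and encoding choice are placeholder assumptions, not optillm's actual configuration):

```python
# Sketch: approximate token counting for a streamed completion against
# an OpenAI-compatible endpoint. All connection details are placeholders.
import tiktoken
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-placeholder")
enc = tiktoken.get_encoding("cl100k_base")  # encoding choice is an assumption

prompt = "Explain token counting in one sentence."
stream = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
    stream=True,
)

# Accumulate the streamed deltas, then count tokens over the full text.
output = "".join(
    chunk.choices[0].delta.content or ""
    for chunk in stream
    if chunk.choices
)
print("approx prompt tokens:", len(enc.encode(prompt)))
print("approx completion tokens:", len(enc.encode(output)))
```

servers that support it can also return exact usage in the final stream chunk via `stream_options={"include_usage": True}`, which avoids the client-side approximation entirely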

codelion commented 1 month ago

Token counting is already implemented here. The completion tokens returned by the proxy include the counts of all tokens used across the calls made during the approach.
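For example, a minimal non-streaming sketch of reading that aggregated usage through the standard OpenAI client (the base URL, API key, and model name below are illustrative assumptions):

```python
# Sketch: reading aggregated token usage from an OpenAI-compatible proxy.
# base_url, api_key, and model are placeholders, not verified defaults.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-placeholder")

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "2 + 2 = ?"}],
)

# Per the comment above, completion_tokens aggregates the tokens used
# across all internal calls the approach made, not just the final answer.
print("prompt tokens:", resp.usage.prompt_tokens)
print("completion tokens:", resp.usage.completion_tokens)
```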

We also recently tested various techniques on the AIME benchmark to check which ones are most effective on a per-token basis. Results for that are available here.

darkacorn commented 1 month ago

thank you, I did not see that the discussions are active here .. sorry for opening an issue for this