clp-research / clembench

A Framework for the Systematic Evaluation of Chat-Optimized Language Models as Conversational Agents and an Extensible Benchmark
MIT License

[analysis] LLM Calculator: inference parameters, like latency etc. - find the best models based on filters #100

Open davidschlangen opened 4 months ago

davidschlangen commented 4 months ago

We have all this data lying around, so we might as well use it: for the API-accessed models, we can compute latency, re-query rate, etc. We have timestamps for everything in the logs; we just need to parse them.

Inspiration: https://artificialanalysis.ai/models (Suggestion by SH)
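Computing latency from log timestamps could look like the sketch below. The ISO-8601 timestamp format and the function name are assumptions; the actual log format may differ.

```python
from datetime import datetime

def latency_seconds(start: str, end: str) -> float:
    """Latency between two ISO-8601 timestamps, in seconds.

    ISO-8601 format is an assumption; adjust the parser to whatever
    timestamp format the logs actually use.
    """
    return (datetime.fromisoformat(end) - datetime.fromisoformat(start)).total_seconds()

# e.g. latency_seconds("2024-05-01T12:00:00", "2024-05-01T12:00:02.500000")
```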

davidschlangen commented 2 months ago

So, we basically just need a script (maybe to live in evaluation/) that pulls this data out of results.csv.

davidschlangen commented 1 month ago

@kushal-10 is working on this.