codelion / optillm

Optimizing inference proxy for LLMs
Apache License 2.0

Scripts to reproduce benchmark results #63

Closed zhxieml closed 1 month ago

zhxieml commented 1 month ago

Congrats on the great repository! The reported results are very impressive.

Are there any scripts or detailed instructions available for reproducing the benchmark results in the README (e.g., the LiveCodeBench results)? That would be incredibly helpful.

codelion commented 1 month ago

@zhxieml I just used the official LiveCodeBench repository: https://github.com/LiveCodeBench/LiveCodeBench. I added one line to route requests through the optillm proxy, here: https://github.com/LiveCodeBench/LiveCodeBench/blob/f05cda286956b0a976df08afe2e2a323358d32d1/lcb_runner/runner/oai_runner.py#L17

```python
base_url="http://localhost:8000/v1"
```

Then use `plansearch-gpt-4o-mini` as the model name to run the benchmark.
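For anyone wanting to try the same setup outside LiveCodeBench: since optillm exposes an OpenAI-compatible endpoint, any client can talk to it by changing the base URL and prefixing the model name with the approach (here `plansearch-`). Below is a minimal stdlib-only sketch, assuming the proxy is running locally on port 8000 as in the comment above; the helper names `build_request` and `chat_completion` are hypothetical, not part of optillm or LiveCodeBench.

```python
import json
import urllib.request

def build_request(prompt, model="plansearch-gpt-4o-mini",
                  base_url="http://localhost:8000/v1", api_key="sk-dummy"):
    """Build an OpenAI-style chat completion request aimed at the local
    optillm proxy. The 'plansearch-' prefix on the model name tells optillm
    which optimization approach to apply before calling the underlying model."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

def chat_completion(prompt, **kwargs):
    """Send the request to the proxy and return the assistant's reply text.
    Requires a running optillm instance at the configured base_url."""
    req = build_request(prompt, **kwargs)
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]
```

The same effect is what the one-line edit in `oai_runner.py` achieves: the official OpenAI client is simply pointed at `http://localhost:8000/v1` instead of the default endpoint, and everything else in the benchmark runner stays unchanged.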