IST-DASLab / Mathador-LM

Code for the paper "Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on LLMs".
Apache License 2.0

Request to evaluate the new O1 models by OpenAI (O1-preview and O1-mini) #1

Open Belzedar94 opened 6 days ago

Belzedar94 commented 6 days ago

Request in title. Also, it would be amazing to see the leaderboard somewhere in the GitHub README. Love your work! :)

dalistarh commented 6 days ago

Thanks! We already looked into it, but API access to the O1 models is currently restricted to Tier 5 OpenAI developers. We will evaluate them as soon as access opens up. In early manual testing, O1 seems superior to GPT-4o, but we were surprised that it does not reach the maximum score on the few instances we tried.