IST-DASLab / Mathador-LM

Code for the EMNLP 2024 paper "Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on LLMs".
Apache License 2.0

Request to evaluate the new o1 models by OpenAI (o1-preview and o1-mini) #1

Open Belzedar94 opened 1 month ago

Belzedar94 commented 1 month ago

Request in title. Also, it would be amazing to see the leaderboard somewhere in the GitHub README. Love your work! :)

dalistarh commented 1 month ago

Thanks! We already looked into it, but API access to the o1 models is currently restricted to Tier 5 OpenAI developers. We will evaluate them as soon as access opens up. From early manual testing, o1 seems superior to GPT-4o, but we were surprised to see that it doesn't reach the maximum score on the few instances we tried.

Belzedar94 commented 1 day ago

Hey! Do you have an update on this? Also, I'd love to see a leaderboard somewhere. Thanks a lot and keep up the good work!!