GAIR-NLP / OlympicArena

This is the official repository of the paper "OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI"
https://gair-nlp.github.io/OlympicArena/

Request to evaluate the new O1 models by OpenAI (O1-preview and O1-mini) #4

Closed: Belzedar94 closed this issue 1 month ago

Belzedar94 commented 1 month ago

Request in title. Thanks :)

HuangZhen02 commented 1 month ago

Thank you for your suggestion! We have not yet evaluated o1-preview on the full benchmark, for three reasons: access to the model is tightly restricted, its internal reasoning tokens make it more expensive to run, and it does not currently support multimodal input. However, we have tested it on a subset (https://x.com/Z_Huang_02/status/1834634575345270898), and those results still support some qualitative conclusions.

At the same time, because o1-preview runs an additional internal reasoning process before answering, whether it is fair to compare it directly with other models is still debatable.