KbsdJames / Omni-MATH

The official repository of the Omni-MATH benchmark.

Incomplete model generations in example file #1

Open wedu-nvidia opened 1 month ago

wedu-nvidia commented 1 month ago

Hello, I am currently working on a benchmark using this dataset and have noticed discrepancies in the results for Llama-3.1-70b. The repository only contains 100 examples, and the model's outputs ('model-generation') appear to be incomplete even for those 100 examples. Could I kindly request the full set of 4,428 model generations by Llama-3.1-70b, including the complete outputs, so I can conduct a thorough comparison?

Thank you so much!

KbsdJames commented 1 week ago

Thank you very much for your attention and thorough exploration of our work. We apologize for the delayed response; we have been busy writing the technical report.

Your observations are correct. We later identified issues in the inference code for several models regarding the setting of max_new_tokens: the limit was too small, so generations were cut off before completion. After increasing max_new_tokens to 2048, we re-evaluated the models. In response to your request, we have made all outputs and GPT evaluation files for Qwen2.5-MATH-72b-Instruct and Llama-3.1-70b-Instruct available for your reference. We have also released our technical report; for any further questions, please refer to https://arxiv.org/abs/2410.07985
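For anyone running into the same issue, truncated generations like those described above can often be flagged heuristically before re-running inference. Below is a minimal sketch (not part of the Omni-MATH codebase): it assumes the model reports how many tokens it generated and that a complete answer ends with a `\boxed{...}` marker, which is a common convention for math benchmarks but should be adjusted to your own setup.

```python
# Heuristic check for truncated model generations: an output cut off by a
# too-small max_new_tokens limit typically exhausts the full token budget
# and never reaches a final boxed answer.
# NOTE: the `\boxed{` marker and the token-count argument are assumptions
# about the evaluation setup, not part of the Omni-MATH repository.

MAX_NEW_TOKENS = 2048  # the limit mentioned in this thread


def looks_truncated(generation: str, num_generated_tokens: int,
                    limit: int = MAX_NEW_TOKENS) -> bool:
    """Flag a generation as likely truncated if it used the entire
    token budget or never produced a final boxed answer."""
    hit_token_limit = num_generated_tokens >= limit
    has_final_answer = "\\boxed{" in generation
    return hit_token_limit or not has_final_answer


# Example: a complete answer vs. one that stops mid-derivation.
complete = "Therefore the answer is \\boxed{42}."
cut_off = "Expanding the product, we obtain 4x^2 +"
```

Generations flagged this way can then be re-run with a larger max_new_tokens rather than re-running the whole benchmark.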

Thank you once again for your interest and valuable feedback.