dvlab-research / Step-DPO

Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"
Evaluation scripts for AIME and Odyssey-MATH #14

Open bmanczak opened 3 months ago

bmanczak commented 3 months ago

Amazing paper, congratulations!

Would it be possible to release the eval scripts for AIME and Odyssey-MATH?

We're interested in building upon your work. Thanks in advance!