deepseek-ai / DeepSeek-Math

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
MIT License

miniF2F-Isabelle acc #30

Open wangzhihao-coder opened 1 month ago

wangzhihao-coder commented 1 month ago

I used the Docker image from the PISA repository and the prediction file from output.zip in your repository (path/outputs/DeepSeekMath-Base/miniF2F-Isabelle-test/results/cot/predictions.json). However, my accuracy is about 10%, compared to the reported 24.6%. What could explain this difference?

wyt2000 commented 1 week ago

Like @wangzhihao-coder, I tried to reproduce the results, but without using Docker. While following the PISA tutorial, I ran into a package version mismatch in SBT. After fixing it, I started the PISA server successfully. However, my evaluation results (miniF2F-Isabelle-test: 21.72, miniF2F-Isabelle-valid: 22.13) were also lower than those reported in the paper. Can anyone help?