Open wangzhihao-coder opened 2 months ago
Like @wangzhihao-coder, I tried to reproduce the paper's results, but without Docker. While following the PISA tutorial, I ran into a package version mismatch in SBT. After fixing it, the PISA server started successfully. However, my evaluation results (miniF2F-Isabelle-test: 21.72, miniF2F-Isabelle-valid: 22.13) were also worse than those reported in the paper. Can anyone help?
I used the Docker image from the PISA repository and the prediction file from output.zip in your repository (path/outputs/DeepSeekMath-Base/miniF2F-Isabelle-test/results/cot/predictions.json). However, my accuracy is only about 10%, compared with the 24.6% result. What could explain this difference?
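For a sense of scale, the percentages in this thread can be converted back to approximate problem counts. This is a minimal sketch, assuming each miniF2F split contains 244 problems (an assumption, not stated in this thread); `solved_count` is a hypothetical helper, not part of the repository's evaluation code.

```python
# Assumption: each miniF2F split has 244 problems (not stated in this thread).
PROBLEMS_PER_SPLIT = 244


def solved_count(accuracy_percent, total=PROBLEMS_PER_SPLIT):
    """Convert an accuracy percentage into an approximate count of solved problems."""
    return round(accuracy_percent / 100 * total)


# Accuracy figures quoted in this thread (percent).
reported = {
    "24.6% result (test)": 24.6,
    "without Docker (test)": 21.72,
    "without Docker (valid)": 22.13,
    "Docker image (test)": 10.0,
}

for label, acc in reported.items():
    print(f"{label}: {acc}% is roughly {solved_count(acc)} of {PROBLEMS_PER_SPLIT} problems")
```

Under that assumption, the non-Docker run falls a handful of problems short of the 24.6% figure, while the Docker run falls short by several dozen, which suggests the two discrepancies may have different causes.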