deepseek-ai / DeepSeek-Math

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
MIT License

Question about the evaluation of deepseek-math-rl #13

Closed · ChengpengLi1003 closed this 6 months ago

ChengpengLi1003 commented 6 months ago

I git cloned this repo and ran submit_eval_jobs.py with 8 GPUs. However, the tool-based result on the MATH test set is 0.5786, with iter=4 and vllm version 0.2.0 as recommended. That leaves a gap with the reported 58.8; do you have any suggestions?

ZhihongShao commented 6 months ago

All results in our paper were obtained using 4 × 8 = 32 GPUs for evaluation. We have also provided the model outputs in this repo. One potential factor that may explain the gap is the batch size, which can affect vllm inference.
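(For context: vllm's continuous batching means the per-step batch composition depends on how many prompts are in flight, which can perturb floating-point reductions and occasionally change a generated token even at temperature 0. A minimal sketch of how one might probe this, assuming vllm is installed; the prompt list is hypothetical, while `LLM`/`SamplingParams` are vllm's public API:)

```python
# Sketch: compare greedy vllm outputs under two different batch sizes.
# "deepseek-ai/deepseek-math-7b-rl" is the public HF model id; the
# prompts here are placeholders, not the MATH test set.
from vllm import LLM, SamplingParams

prompts = ["Compute 1 + 1.", "What is the derivative of x^2?"] * 64

llm = LLM(model="deepseek-ai/deepseek-math-7b-rl", tensor_parallel_size=8)
greedy = SamplingParams(temperature=0.0, max_tokens=512)

# One large batch: vllm schedules all prompts together.
out_full = llm.generate(prompts, greedy)

# Same prompts in small batches: different in-flight composition.
out_small = []
for i in range(0, len(prompts), 8):
    out_small.extend(llm.generate(prompts[i:i + 8], greedy))

# Count prompts whose greedy completion differs between the two runs.
diffs = sum(a.outputs[0].text != b.outputs[0].text
            for a, b in zip(out_full, out_small))
print(f"{diffs}/{len(prompts)} completions differ across batch sizes")
```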

ChengpengLi1003 commented 6 months ago

Thank you for your very quick response. I tried 32 GPUs by setting n-gpus to 32 and running submit_eval_jobs.py on 4 × 8 A100s. However, there is an error saying "No CUDA GPUs are available" in 8.log through 31.log on node 1, and the same error in 40.log through 63.log on node 2, and so on. Is my way of using 32 GPUs wrong? Thanks a lot!
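(That failure pattern, with only the first 8 logs per node succeeding, is consistent with global GPU indices such as 8-31 being handed to CUDA_VISIBLE_DEVICES on machines that only expose local devices 0-7. A hedged sketch of the kind of remapping that would be needed, assuming one worker per global GPU index; the function and variable names are hypothetical, not from submit_eval_jobs.py:)

```python
import os

GPUS_PER_NODE = 8  # assumption: each node has 8 A100s


def pin_local_gpu(global_gpu_index: int) -> None:
    """Map a global GPU index (0..31 across 4 nodes) to the local
    device index (0..7) visible on the current node.

    Passing the raw global index (e.g. "8") to CUDA_VISIBLE_DEVICES on
    node 1 names a device that does not exist locally, which is one
    plausible cause of "No CUDA GPUs are available".
    """
    local_index = global_gpu_index % GPUS_PER_NODE
    os.environ["CUDA_VISIBLE_DEVICES"] = str(local_index)


# e.g. the worker writing 9.log would run on node 1, local GPU 1:
pin_local_gpu(9)
```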
