Cheungki opened 6 months ago
Thx for your nice work!
As you mentioned here, deepseek-coder models show SOTA performance on APPS, but you don't seem to report the exact scores or release the evaluation code for the APPS benchmark. Will you share the evaluation scripts for APPS?