SalesforceAIResearch / CodeChain

Official code for the paper "CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules"
Apache License 2.0

Reproducing the WizardCoder 15B results #2

Open sh0416 opened 1 month ago

sh0416 commented 1 month ago

I am trying to reproduce the result reported in your paper, but I get an APPS introductory pass@1 of about 0.18, well below the reported 0.26.

I use the checkpoint WizardLMTeam/WizardCoder-15B-V1.0.

Could anyone help me reproduce this result?

sh0416 commented 1 month ago
Difficulty level: introductory, pass@k: {1: 0.17739999999999997}, num of examples: 1000, total_cost: 0
Difficulty level: interview, pass@k: {1: nan}, num of examples: 0, total_cost: 0
Difficulty level: competition, pass@k: {1: nan}, num of examples: 0, total_cost: 0
Number of outputs distributions:
/data2/seonghyeon/repositories/CodeChain/src/evaluate.py:111: RuntimeWarning: Precision loss occurred in moment calculation due to catastrophic cancellation. This occurs when the data are nearly identical. Results may be unreliable.
  print(describe(num_outputs))
/data2/seonghyeon/anaconda3/envs/codefeedbackbench/lib/python3.10/site-packages/scipy/stats/_stats_py.py:1418: RuntimeWarning: Precision loss occurred in moment calculation due to catastrophic cancellation. This occurs when the data are nearly identical. Results may be unreliable.
  sk = skew(a, axis, bias=bias)
/data2/seonghyeon/anaconda3/envs/codefeedbackbench/lib/python3.10/site-packages/scipy/stats/_stats_py.py:1419: RuntimeWarning: Precision loss occurred in moment calculation due to catastrophic cancellation. This occurs when the data are nearly identical. Results may be unreliable.
  kurt = kurtosis(a, axis, bias=bias)
DescribeResult(nobs=1000, minmax=(5, 5), mean=5.0, variance=0.0, skewness=nan, kurtosis=nan)
pass@1 0.17739999999999997 nan nan

Here is the result.
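
For context on how to read the number above: the log shows 5 outputs per problem (`minmax=(5, 5)`) over 1000 introductory problems. A minimal sketch of the commonly used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k) averaged over problems, is below. It is an assumption that `src/evaluate.py` computes pass@k this way; the function names and the per-problem counts are hypothetical and only illustrate the calculation.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem with n samples, c of which pass."""
    if k > n:
        raise ValueError("k must not exceed the number of samples n")
    if n - c < k:
        # Fewer than k failing samples: any draw of k contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(num_correct, n: int, k: int) -> float:
    """Average the per-problem estimate over all problems."""
    scores = [pass_at_k(n, c, k) for c in num_correct]
    return sum(scores) / len(scores)


# Hypothetical counts of passing samples for four problems, 5 samples each.
counts = [0, 1, 5, 2]
print(mean_pass_at_k(counts, n=5, k=1))
```

For k = 1 this reduces to the mean of c / n, so if the evaluation follows this estimator, a pass@1 of 0.1774 with 5 samples on each of 1000 problems would correspond to 887 passing samples out of 5000.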

sh0416 commented 1 month ago

With greedy decoding, the score increased to 0.213, but that is still far from the reported 0.26.