Open Alex-HaochenLi opened 8 months ago
Hello, I am very excited to read CodeChain. However, I have a question about the evaluation process in this repo.
It seems that during evaluation you test the code on the test cases from test_example_tests.pkl, but not on the private test cases from APPS. Are the pass@1 results reported in the paper based on the private test cases? Thank you for your clarification.
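For reference, here is roughly how I compared the two sources of tests on my side. This is only a sketch: I assume the pickle is a dict of per-problem tests (the actual layout in this repo may differ), and I load the private tests from the input_output field of the codeparrot/apps test split.

```python
import json
import pickle

from datasets import load_dataset

# Private test cases live in the `input_output` field of the APPS test split
# (depending on your `datasets` version, trust_remote_code=True may be needed).
apps_test = load_dataset("codeparrot/apps", split="test")
io_field = apps_test[0]["input_output"]
private_tests = json.loads(io_field) if io_field else {}
print("private inputs for problem 0:", len(private_tests.get("inputs", [])))

# Example tests used during evaluation; I assume the pickle is a dict keyed by
# problem index -- adjust the path and layout to whatever the repo actually uses.
with open("test_example_tests.pkl", "rb") as f:
    example_tests = pickle.load(f)
print("example-test entries:", len(example_tests))
```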
@Alex-HaochenLi
Same question. Have you received clarification?
Not yet :)
@Alex-HaochenLi
Thank you. In fact, when I was inspecting the evaluation_codechain.sh file, I noticed the following:
# Test by hidden test cases
python src/evaluate.py --save_gen_path $output_path --eval_split $split
In src/evaluate.py, when example_test_path is not specified, it doesn't load the {split}_example_tests.pkl file. Instead, it ultimately enters utils_evaluate.py and uses example['input_output'] as the test cases. In the test split of codeparrot/apps, input_output represents the private test cases.
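To make sure I'm reading the code correctly, the fallback I have in mind looks roughly like this. It is only a simplified sketch, not the repo's exact code, and the pickle layout keyed by problem_id is my assumption.

```python
import json
import pickle


def load_test_cases(example, example_test_path=None):
    # Simplified sketch of the fallback described above, not the repo's exact code.
    if example_test_path:
        # Example tests, e.g. {split}_example_tests.pkl; I assume a dict keyed
        # by problem_id here, which may differ from the actual pickle layout.
        with open(example_test_path, "rb") as f:
            example_tests = pickle.load(f)
        return example_tests[example["problem_id"]]
    # No example-test path given: fall back to the dataset's own field.
    # For the codeparrot/apps test split, input_output holds the private tests.
    return json.loads(example["input_output"]) if example["input_output"] else {}
```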