Lower test passing rates compared to original results.

FudanSELab / ClassEval

Benchmark ClassEval for class-level code generation.

MIT License

108 stars 5 forks source link

Hi, Thank you very much for sharing this benchmark and all the hard work! I have a question regarding the test passing rates on the generated code. I followed the steps indicated for the evaluation and just executed the tests on the predicted code from the dataset but I found that the test passing rates are much lower than the clarified ones. For example, the test passing rates (class level) on the code generated in GPT-4-Turbo_class_H_greedy is only 7% on my side while the original one is 38%. I wonder whether do I need to configure something (e.g., environments) for the test cases to be successfully executed. I would appreciate it if you can shed a light on this :-)

FudanSELab / ClassEval

Lower test passing rates compared to original results. #7