SparksofAGI / MHPP

https://sparksofagi.github.io/MHPP/

Categorization of HumanEval problems #3

Closed wasiahmad closed 2 months ago

wasiahmad commented 2 months ago

The paper groups the code generation challenges in HumanEval into 7 categories. How was the categorization done? Can you share the category labels for each HumanEval problem?

1e0ndavid commented 2 months ago

Sure, I have uploaded the category labels for all HumanEval problems. Please check the annotations directory.

Regarding the categorization procedure: as mentioned in our paper, we first evaluated four models (GPT-4, GPT-3.5, DeepSeekCoder, and WizardCoder) on HumanEval and derived an initial set of categories by analyzing the errors they made. However, categories obtained this way reflect the models' weaknesses rather than the challenges inherent in the dataset. Our team of four annotators therefore revisited the entire HumanEval benchmark to make the categorization as model-agnostic as possible. Each annotator labeled the problems independently, and we then discussed and consolidated the results for each problem. The categorization may not be perfect, but we have made every effort to keep it consistent.
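
For readers who want to repeat a similar process on their own labels, here is a minimal sketch of the "independent annotation, then discussion" step. It is not the script used for the paper: the function name `consolidate`, the input layout (one dict per annotator mapping problem ID to label), and the placeholder labels such as `category_A` are all assumptions made for illustration. It accepts labels that every annotator agrees on and sets aside the rest for manual discussion.

```python
from collections import Counter

def consolidate(labels_by_annotator):
    """Split problems into unanimously labeled ones and ones needing discussion.

    labels_by_annotator: list of dicts, one per annotator, mapping a
    problem ID to that annotator's category label (hypothetical layout).
    """
    problem_ids = set().union(*(a.keys() for a in labels_by_annotator))
    agreed, disputed = {}, {}
    for pid in sorted(problem_ids):
        # Collect the label each annotator gave this problem, if any.
        votes = [a[pid] for a in labels_by_annotator if pid in a]
        label, count = Counter(votes).most_common(1)[0]
        if count == len(votes):
            agreed[pid] = label      # every annotator chose the same label
        else:
            disputed[pid] = votes    # keep all votes for the discussion round
    return agreed, disputed

# Placeholder data: four annotators, two problems, made-up category names.
annotations = [
    {"HumanEval/0": "category_A", "HumanEval/1": "category_B"},
    {"HumanEval/0": "category_A", "HumanEval/1": "category_C"},
    {"HumanEval/0": "category_A", "HumanEval/1": "category_B"},
    {"HumanEval/0": "category_A", "HumanEval/1": "category_B"},
]
agreed, disputed = consolidate(annotations)
```

With this toy input, `agreed` holds the one unanimous problem and `disputed` holds the other together with all four votes, which is the list a team would then go through together.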