bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

Add: learning performance-improving code edits 🥧 #65

Open SwayamInSync opened 1 year ago

SwayamInSync commented 1 year ago

PR to add the LEARNING PERFORMANCE-IMPROVING CODE EDITS task with the PIE dataset: few-shot evaluations for program performance improvement.
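As a rough illustration of the few-shot setup, a prompt can be built by concatenating (slow, fast) program pairs before the program to be optimized. This is a minimal sketch; the column names `input` and `target` are assumptions for illustration, not necessarily the actual PIE dataset schema:

```python
# Hypothetical sketch of a PIE-style few-shot prompt. The keys
# "input" (slow program) and "target" (optimized program) are
# assumed names, not necessarily the real PIE dataset columns.
def build_fewshot_prompt(examples, slow_program):
    parts = []
    for ex in examples:
        parts.append(f"# slower version:\n{ex['input']}\n")
        parts.append(f"# optimized version of the same code:\n{ex['target']}\n")
    # Finally, append the program we want the model to optimize.
    parts.append(f"# slower version:\n{slow_program}\n")
    parts.append("# optimized version of the same code:\n")
    return "\n".join(parts)
```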

Muennighoff commented 1 year ago

Nice! Some comments:

  • Remove .pyc, .DS_Store, and other files from the PR that are not important
  • Can't we upload all those .txt files to the Hub? / Do we even need them? This PR adds way too many files imo
  • Can you share any results you got?

SwayamInSync commented 1 year ago

@Muennighoff Those are test cases and are needed to evaluate the correctness of the generated program.
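For context, correctness checking of this kind usually means running the generated program on each test input and comparing its output against the expected output. A minimal sketch, assuming the test cases are plain (stdin, expected stdout) text pairs; a real harness would also sandbox the execution:

```python
import subprocess

def passes_test_cases(program_path, test_cases, timeout=10):
    """Run a Python program against (stdin_text, expected_stdout) pairs.

    `test_cases` is assumed to be a list of (input, expected output)
    string tuples; this sketch does no sandboxing.
    """
    for stdin_text, expected in test_cases:
        try:
            result = subprocess.run(
                ["python", program_path],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False  # treat a hang as a failure
        if result.stdout.strip() != expected.strip():
            return False
    return True
```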

Muennighoff commented 1 year ago

> @Muennighoff Those are test cases and are needed to evaluate the correctness of the generated program.

I think they should be uploaded to a dataset on the HF Hub that is then loaded like it's done for the other eval tasks.
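A sketch of what that could look like with the `datasets` library; the repo id and column names below are placeholders, not the actual upload:

```python
from datasets import load_dataset

# Placeholder repo id and column names; the real dataset would be
# whatever gets uploaded to the Hub for this task.
test_cases = load_dataset("bigcode/pie-test-cases", split="test")
for case in test_cases:
    stdin_text = case["input"]          # test input fed to the program
    expected = case["expected_output"]  # output to compare against
```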

SwayamInSync commented 1 year ago

> Nice! Some comments:
>
>   • Remove .pyc, .DS_Store, and other files from the PR that are not important
>   • Can't we upload all those .txt files to the Hub? / Do we even need them? This PR adds way too many files imo
>   • Can you share any results you got?

@Muennighoff

The authors evaluated on Python and C++, but for now we are only evaluating on Python, since the C++ data was not available. I am creating the dataset for C++ too; once it's done, I will push it to the Hub.
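Pushing such a dataset to the Hub can be done with `push_to_hub`; a minimal sketch with placeholder data and repo id:

```python
from datasets import Dataset

# Placeholder rows and repo id; this only illustrates the upload step.
cpp_pairs = Dataset.from_dict({
    "input": ["/* slow C++ */"],
    "target": ["/* optimized C++ */"],
})
cpp_pairs.push_to_hub("bigcode/pie-cpp")  # requires prior `huggingface-cli login`
```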