bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

Add CodeXGLUE-code-refinement (few-shot) setting #13

Open manandey opened 1 year ago

manandey commented 1 year ago

4 (cc. @loubnabnl)

loubnabnl commented 1 year ago

Hi @manandey, is this a WIP? We still need to add the task to the code (this could wait, as we are making some changes to the codebase this week). Also, can you provide some information on how you made the few-shot examples?

manandey commented 1 year ago

Hi @loubnabnl, sorry for not making any progress with this PR for a long time. I will try to close this within a day or two at the latest.

manandey commented 1 year ago

@loubnabnl I have finished adding the task. It would be great if you could take a look and suggest any changes that are required. The few-shot examples are taken from the dataset's train split. Thanks!
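
To illustrate what taking few-shot examples from a train split can look like, here is a minimal sketch, not the code in this PR. The helper name `build_few_shot_prompt`, the record field names `buggy` and `fixed`, the toy records, and the "Buggy code:" / "Fixed code:" prompt wording are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical stand-ins for records from the dataset's train split.
# The field names "buggy" and "fixed" are assumed for illustration only.
TOY_TRAIN: List[Dict[str, str]] = [
    {
        "buggy": "int add ( int a , int b ) { return a - b ; }",
        "fixed": "int add ( int a , int b ) { return a + b ; }",
    },
    {
        "buggy": "boolean isEmpty ( ) { return size > 0 ; }",
        "fixed": "boolean isEmpty ( ) { return size == 0 ; }",
    },
]


def build_few_shot_prompt(
    train_examples: List[Dict[str, str]], query_buggy: str, n_shots: int = 2
) -> str:
    """Prepend n_shots solved (buggy, fixed) pairs to an unsolved query."""
    if n_shots > len(train_examples):
        raise ValueError("n_shots exceeds the number of available examples")
    # One solved demonstration per block, in the order given.
    blocks = [
        f"Buggy code:\n{ex['buggy']}\nFixed code:\n{ex['fixed']}\n"
        for ex in train_examples[:n_shots]
    ]
    # The query block ends right after "Fixed code:" so the model completes it.
    blocks.append(f"Buggy code:\n{query_buggy}\nFixed code:\n")
    return "\n".join(blocks)


prompt = build_few_shot_prompt(TOY_TRAIN, "int one ( ) { return 2 ; }")
print(prompt)
```

Because the demonstrations come from the train split, they do not overlap with the test examples being scored.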

Running the following command:

python -m main --tasks codexglue_code_refinement-small --limit 5 --model Salesforce/codegen-350M-mono --n_samples 10 --batch_size 1

produced this output:

{
  "codexglue_code_refinement-small": 0.6492825221566064,
  "config": {
    "model": "Salesforce/codegen-350M-mono"
  }
}

loubnabnl commented 1 year ago

Thanks for working on this, Manan! I will take a look.

manandey commented 1 year ago

Hi @loubnabnl, I have updated the prompts as per your suggestions. The task now works on both CPU and GPU, so feel free to merge if it looks good. Also, regarding evaluation with an execution-based metric, it would be great if you could guide me a bit on how to approach it. I will open a separate PR for that. Thanks!