bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.

AATK process_results is missing #197

adiprasad opened this issue 4 months ago (Open)

adiprasad commented 4 months ago

The AATK task is missing a process_results implementation; see https://github.com/bigcode-project/bigcode-evaluation-harness/blob/astraios/bigcode_eval/tasks/aatk.py#L130-L131
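For context, process_results is the hook the harness calls to turn generations into metrics. Since AATK completions are scored by external static analysis (the benchmark authors analyze them with CodeQL) rather than by an in-harness metric, a minimal stub consistent with the harness's Task interface could look like the sketch below. This is an illustrative assumption, not the maintainers' implementation:

```python
# Hypothetical sketch, not the maintainers' implementation: what a minimal
# process_results for the AATK task could look like, assuming the harness's
# standard Task.process_results(generations, references) interface.
class AATKSketch:
    def process_results(self, generations, references):
        # AATK has no in-harness metric: completions are scored externally
        # (e.g. with CodeQL), so just hand the postprocessed generations
        # back for downstream analysis.
        return {"generations": generations}
```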

rpand002 commented 2 months ago

@loubnabnl @RaymondLi0: Can you provide some pointers on this? I see that you used this evaluation benchmark in the StarCoder2 paper. Thanks!

terryyz commented 2 months ago

Hi,

Just putting the answer here so that more people can see it.

There is an evaluation repo provided by the authors of the benchmark: https://github.com/moyix/AsleepKeyboardDataset

We mainly rely on the evaluation scripts they provide.

Cheers
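To make that workflow concrete: a common pattern is to generate completions with the harness (e.g. its --generation_only and --save_generations options) and then hand the saved JSON to the external scripts. The sketch below is a hypothetical glue step; the one-file-per-scenario output layout is an assumption, so check the AsleepKeyboardDataset repo for what its scripts actually expect:

```python
# Hypothetical glue script: split the harness's saved generations into one
# file per scenario for external evaluation. Assumes generations.json is
# the harness's save_generations output (a JSON list of lists of strings,
# one inner list per problem); the scenario_<i>/completion_<j>.py layout
# is illustrative only.
import json
from pathlib import Path

def split_generations(generations_path: str, out_dir: str) -> None:
    with open(generations_path) as f:
        generations = json.load(f)  # list[list[str]]
    root = Path(out_dir)
    for i, candidates in enumerate(generations):
        scenario_dir = root / f"scenario_{i}"
        scenario_dir.mkdir(parents=True, exist_ok=True)
        for j, code in enumerate(candidates):
            (scenario_dir / f"completion_{j}.py").write_text(code)

if __name__ == "__main__":
    split_generations("generations.json", "aatk_completions")
```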

mayank31398 commented 2 months ago

@terryyz can you share the exact command used to run the eval for AATK?

terryyz commented 2 months ago

> @terryyz can you share the exact command used to run the eval for AATK?

I just created a repo with the scripts used for the StarCoder2 eval: https://github.com/terryyz/AsleepKeyboardDataset. Please note that you may want to change stop_words when evaluating other models.
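For anyone adapting this, here is a sketch of what such a stop_words override could look like. The class name, import path, and extra tokens are illustrative assumptions, not the repo's actual code:

```python
# Hypothetical sketch of adapting stop_words for a different model.
# In bigcode-evaluation-harness, each task sets self.stop_words; the
# import path, class name, and tokens below are assumptions.
from bigcode_eval.tasks.aatk import AATK  # import path assumed

class AATKCustomStops(AATK):
    def __init__(self):
        super().__init__()
        # A chat/instruct model often needs its end-of-turn marker added,
        # otherwise generation runs past the intended completion.
        self.stop_words = list(self.stop_words) + ["<|endoftext|>", "<|im_end|>"]
```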