Open adiprasad opened 9 months ago
@loubnabnl @RaymondLi0: Can you provide some pointers on this? I see that you used this evaluation benchmark in the StarCoder2 paper. Thanks!
Hi,
Posting the answer here so that more people can see it.
The authors of the benchmark provide an evaluation repo: https://github.com/moyix/AsleepKeyboardDataset
We mainly rely on the evaluation scripts from that repo.
Cheers
@terryyz can you share the exact command used to run the eval for AATK?
I just created a repo with what was used for the StarCoder2 (SC2) eval 🙂 https://github.com/terryyz/AsleepKeyboardDataset Please note that you may want to change `stop_words` when evaluating other models.
https://github.com/bigcode-project/bigcode-evaluation-harness/blob/astraios/bigcode_eval/tasks/aatk.py#L130-L131
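For anyone unsure why `stop_words` matters: the usual idea is that a raw completion is cut at the first stop sequence, so a model that emits a different end-of-sequence token or keeps writing extra functions needs a different list. Below is a minimal sketch of that truncation step. It is not the harness code; `truncate_at_stop_words` and `EXAMPLE_STOP_WORDS` are hypothetical names and values chosen for illustration, so check the linked file for the list actually used.

```python
def truncate_at_stop_words(generation: str, stop_words: list[str]) -> str:
    """Cut `generation` at the earliest occurrence of any stop word."""
    cut = len(generation)
    for stop in stop_words:
        if not stop:
            # An empty stop word would match at index 0 and erase everything.
            continue
        idx = generation.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return generation[:cut]


# Hypothetical stop words (assumption): adjust these per model/tokenizer.
EXAMPLE_STOP_WORDS = ["\ndef ", "\nclass ", "<|endoftext|>"]

sample = "    return a + b\n\ndef unrelated():\n    pass"
print(repr(truncate_at_stop_words(sample, EXAMPLE_STOP_WORDS)))
```

If the list does not match what the model tends to produce after the target snippet, the extra text ends up in the completed file and can change what the security checks see.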