bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

Need some context for certain args for Instruct Human Eval #256

Open teknium1 opened 2 months ago

teknium1 commented 2 months ago

Hey all, what is the n_samples argument for Instruct HumanEval about?

The docs say 200 as if it's a fixed setting that should always be used, but I can't understand why.
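For context, my rough understanding (and I may be wrong here) is that n_samples is just how many completions get generated per problem to feed the Codex-style unbiased pass@k estimator, so 200 would be a low-variance default for pass@1/pass@10/pass@100 rather than a hard requirement. A minimal sketch of that estimator:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Codex paper): n generations per problem,
    of which c pass the unit tests."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# With only a handful of samples per problem the estimate is very noisy;
# 200 samples mainly buys stable pass@1 / pass@10 / pass@100 numbers.
print(pass_at_k(200, 37, 1), pass_at_k(200, 37, 100))
```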

Also, for the structured turns format, does this look right for ChatML?

--instruction_tokens "<|im_start|>user\n","<|im_end|>\n","<|im_start|>assistant\n"

Without quoting each string it gave an error, so I assume this is how the arg is meant to be used?

teknium1 commented 2 months ago

I ran the Instruct HumanEval benchmark and the eval's results.json shows this:

"eos": "<|endoftext|>",

whereas the actual EOS should be <|im_end|>. I don't really see why this is there.
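If I had to guess, <|endoftext|> is just the tokenizer's built-in eos_token being picked up by default, rather than the ChatML turn terminator, which is a separate added token the harness would need to be told about explicitly (e.g. as a stop sequence). A quick sanity check, with the model name as a placeholder:

```python
from transformers import AutoTokenizer

# Placeholder model name -- substitute the checkpoint actually being evaluated.
tok = AutoTokenizer.from_pretrained("my-org/my-chatml-model")

# The tokenizer's default EOS (often <|endoftext|> for GPT-style vocabularies)
# is what seems to end up in the eval config...
print(tok.eos_token)

# ...while the ChatML turn terminator is its own token and is not the eos_token.
print(tok.convert_tokens_to_ids("<|im_end|>"))
```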

Muennighoff commented 2 months ago

I recommend just running humanevalsynthesize from HumanEvalPack, which offers the same and more. Instructions for running are here: https://github.com/bigcode-project/octopack?tab=readme-ov-file#run and here: https://github.com/bigcode-project/bigcode-evaluation-harness/blob/main/docs/README.md#humanevalpack

You may just need to add your instruction format here: https://github.com/bigcode-project/bigcode-evaluation-harness/blob/0f3e95f0806e78a4f432056cdb1be93604a51d69/bigcode_eval/tasks/humanevalpack.py#L235
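As a rough, untested sketch of what a ChatML-style format could look like there (the helper name and arguments below are just illustrative; double-check them against the actual prompt-building code around that line):

```python
def chatml_prompt(instruction: str, prompt_base: str) -> str:
    # Hypothetical helper mirroring the existing prompt-format branches in
    # humanevalpack.py: wrap the instruction in ChatML user tags and leave the
    # assistant turn open so the model continues from the function stub.
    return (
        f"<|im_start|>user\n{instruction}<|im_end|>\n"
        f"<|im_start|>assistant\n{prompt_base}"
    )

print(chatml_prompt("Write a function that returns the sum of two numbers.",
                    "def add(a, b):\n"))
```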