Need some context for certain args for Instruct Human Eval

bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.

Apache License 2.0

781 stars 208 forks source link

Open teknium1 opened 2 months ago

teknium1 commented 2 months ago

Hey all, what is the n_samples for instruct human eval about?

The docs say 200 as if its a static setting that should be used, but I can't understand why.

Also, when the turns format is structured, does this look right for chatml?

without quoting each string it gave an error so I assume this is how to use this arg?

teknium1 commented 2 months ago

I ran the instruct humaneval benchmark and the eval results.json shows this:

"eos": "<|endoftext|>",

whereas the actual EOS should be <|im_end|> - don't really see why this is there

Muennighoff commented 2 months ago