bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.

Using custom prompts and postprocessing #245

Open anil-gurbuz opened 2 weeks ago

anil-gurbuz commented 2 weeks ago

Hi,

We want to make a submission to the leaderboard with our fine-tuned model, which is used on www.jdoodle.com.

It seems that for some of the HumanEval questions our model gets the logic right, but we are having issues with the format of the output, which could potentially be fixed by postprocessing and slightly different prompting (our fine-tuning process used a different prompt template than instruction prompting).

I was wondering: would it be a problem if we used custom postprocessing steps and/or a custom prompt template for generating responses? Would we still be eligible for a place on the leaderboard in that case?

Thanks!
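
For concreteness, the two customization points in question correspond to a task's `get_prompt` and `postprocess_generation` hooks in the harness. Below is a minimal standalone sketch of what a template-aware prompt and cleanup step might look like; the `<|user|>`/`<|assistant|>` tags, the stop-word list, and the function names are illustrative assumptions, not the harness's actual code:

```python
# Standalone sketch of the two customization points under discussion: how a
# HumanEval problem is wrapped into a prompt, and how the raw model output is
# cleaned up before the tests run. The template tags are placeholders.

STOP_WORDS = ["\nclass", "\ndef", "\n#", "\nif", "\nprint"]  # typical HumanEval stops


def build_prompt(question: str) -> str:
    # Wrap the problem in the template the model saw during fine-tuning.
    return f"<|user|>\n{question}\n<|assistant|>\n"


def postprocess(generation: str) -> str:
    # Keep only the text after the assistant tag, then truncate at the first
    # stop word so that only plain Python remains.
    completion = generation.split("<|assistant|>\n")[-1]
    for stop in STOP_WORDS:
        completion = completion.split(stop)[0]
    return completion
```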

loubnabnl commented 1 week ago

Hi, for Python you can use HumanEvalSynthesize and edit the prompt similarly to https://github.com/bigcode-project/bigcode-evaluation-harness/pull/219/files. I think the post-processing should work by default; what other changes did you want to introduce? For the other languages we're using plain MultiPL-E prompts in the leaderboard, even for the chat models; you can add another stop token here if that's what's messing up the post-processing.
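
To illustrate the stop-token suggestion: if the fine-tuned template emits an end-of-turn marker that the task's stop-word list does not know about, trailing chat text leaks into the candidate program. A standalone sketch (the `<|end|>` marker is a placeholder, and `truncate_at_stop_tokens` only mirrors the truncation the harness performs; it is not its code):

```python
def truncate_at_stop_tokens(generation: str, stop_words: list[str]) -> str:
    """Cut the generation before the earliest stop token."""
    for stop in stop_words:
        generation = generation.split(stop)[0]
    return generation


raw = "    return x + y\n<|end|>\nSome chat-template trailer"

# Without the template's end marker, the trailer leaks into the completion.
print(repr(truncate_at_stop_tokens(raw, ["\ndef", "\nclass"])))

# Registering the marker as an extra stop token truncates cleanly.
print(repr(truncate_at_stop_tokens(raw, ["\ndef", "\nclass", "<|end|>"])))
```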