bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0
710 stars 183 forks

StarCoderCommit Prompt #134

Closed awasthiabhijeet closed 10 months ago

awasthiabhijeet commented 10 months ago

Hi @Muennighoff, @loubnabnl,

The StarCoder paper describes the commit format as: <commit_before>code<commit_msg>text<commit_after>code<eos>
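A minimal sketch of that layout, for reference. It simply concatenates the pieces in the order the paper gives. The helper name `format_commit` is hypothetical. The literal `<eos>` string is taken from the paper's description, and the real tokenizer's end-of-sequence token may be spelled differently.

```python
# Special tokens as written in the paper's commit format.
COMMIT_BEFORE = "<commit_before>"
COMMIT_MSG = "<commit_msg>"
COMMIT_AFTER = "<commit_after>"
EOS = "<eos>"  # assumption: placeholder; the tokenizer's actual EOS string may differ


def format_commit(code_before: str, message: str, code_after: str) -> str:
    """Hypothetical helper: lay out one commit as
    <commit_before>code<commit_msg>text<commit_after>code<eos>."""
    return (
        f"{COMMIT_BEFORE}{code_before}"
        f"{COMMIT_MSG}{message}"
        f"{COMMIT_AFTER}{code_after}{EOS}"
    )


sample = format_commit("x = 1\n", "Increment x", "x = 2\n")
print(repr(sample))
```

The important point is that the code *before* the change sits between `<commit_before>` and `<commit_msg>`, and only the commit message text sits between `<commit_msg>` and `<commit_after>`.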

However, the StarCoderCommit prompt does not seem to have anything between the <commit_before> and <commit_msg> tokens.

As I understand it, both the instruction and the code are part of the commit message ({inp}) in the current version of the StarCoder prompt. The current prompt therefore differs slightly from the format used during pre-training.
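To make the difference concrete, here is a sketch of the two inference-time layouts (generation would continue after `<commit_after>`). Both function names are hypothetical, and joining instruction and code with a newline inside `{inp}` is an assumption for illustration, not necessarily how the harness builds its prompts.

```python
def prompt_all_in_msg(instruction: str, code: str) -> str:
    # Layout described in the question: nothing between <commit_before> and
    # <commit_msg>; instruction and code both go into the commit message ({inp}).
    inp = f"{instruction}\n{code}"  # assumption: newline-joined
    return f"<commit_before><commit_msg>{inp}<commit_after>"


def prompt_code_before(instruction: str, code: str) -> str:
    # Layout closer to pre-training: the existing code fills <commit_before>,
    # and only the instruction is used as the commit message.
    return f"<commit_before>{code}<commit_msg>{instruction}<commit_after>"


def segment(prompt: str, start: str, end: str) -> str:
    # Return the text between the first `start` token and the following `end` token.
    i = prompt.index(start) + len(start)
    return prompt[i:prompt.index(end, i)]


code = "def add(a, b):\n    return a - b\n"
instruction = "Fix bugs in add."
a = prompt_all_in_msg(instruction, code)
b = prompt_code_before(instruction, code)
print(repr(segment(a, "<commit_before>", "<commit_msg>")))  # empty in the first layout
print(repr(segment(b, "<commit_before>", "<commit_msg>")))  # the code in the second
```

Only the second layout places code where the pre-training data had it.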

Is my understanding correct?

awasthiabhijeet commented 10 months ago

Sorry, I should have been looking at the prompt here.

Muennighoff commented 10 months ago

Exactly, you need to look at the fix prompt.

The other part was for the synthesize and explain tasks, where we tried using text instead, as there is no code before. Results in that format were pretty bad and are not in the paper.