The StarCoder paper describes the commit format as:
<commit_before>code<commit_msg>text<commit_after>code<eos>
However, the StarCoderCommit prompt does not have anything between the <commit_before> and <commit_msg> tokens.
In my understanding, both the instruction and the code are part of the commit message ({inp}) in the current version of the StarCoder prompt.
Thus the current version of the StarCoder prompt differs slightly from what was used during pre-training.
The other part was for the synthesize and explain tasks, where we tried using text in that position, as there is no code before. Results were pretty bad in that format, so they are not in the paper.
Hi @Muennighoff, @loubnabnl,
The StarCoder paper describes the commit format as:
<commit_before>code<commit_msg>text<commit_after>code<eos>
However, the StarCoderCommit prompt does not have anything between the <commit_before> and <commit_msg> tokens.
In my understanding, both the instruction and the code are part of the commit message ({inp}) in the current version of the StarCoder prompt. Thus the current version of the StarCoder prompt differs slightly from what was used during pre-training.
Is my understanding correct?
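To make the difference concrete, here is a minimal sketch of the two layouts being compared. The function names are hypothetical and exist only for illustration; the second layout reflects the prompt as described in this thread, not a verified copy of the evaluation code.

```python
def pretraining_commit_prompt(code_before: str, message: str) -> str:
    """Layout from the StarCoder paper: the old code sits between
    <commit_before> and <commit_msg>, and the model generates the new code
    after <commit_after>."""
    return f"<commit_before>{code_before}<commit_msg>{message}<commit_after>"


def message_only_prompt(inp: str) -> str:
    """Layout as understood in this thread (an assumption): nothing follows
    <commit_before>, and both the instruction and the code are packed into
    the commit message slot via {inp}."""
    return f"<commit_before><commit_msg>{inp}<commit_after>"


if __name__ == "__main__":
    code = "def add(a, b):\n    return a - b\n"
    instruction = "Fix the bug in add"

    # Pre-training style: code and message occupy separate slots.
    print(pretraining_commit_prompt(code, instruction))

    # Thread's reading of the current prompt: everything in the message slot.
    print(message_only_prompt(instruction + "\n" + code))
```

In the second call the code ends up after <commit_msg> rather than after <commit_before>, which is the mismatch with the pre-training format that the question is about.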