Open ARIELDENG opened 9 months ago
I have the same question. I want to use the model for code completion, but the output never stops generating. Can you give me the exact prompt format?
It's the same as StarCoder: we apply FIM inside each file regardless of the repo structure (the file path at the beginning of a file is optional), so you can do:
<fim_prefix>prefix<fim_suffix>suffix<fim_middle>
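The PSM format above can be assembled as a plain string and sent to whatever inference backend you use. A minimal sketch (the helper name and the example prefix/suffix are illustrative, not from the thread):

```python
# Sketch of assembling a StarCoder2 FIM prompt in PSM order,
# using the special tokens quoted above.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Return a prompt asking the model to fill in the middle."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"

prompt = build_fim_prompt(
    prefix="def add(a, b):\n    ",
    suffix="\n    return result\n",
)
print(prompt)
```

The model then generates the "middle" tokens after `<fim_middle>`.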
Thanks for your attention, but the output won't stop when I apply this formatting, just like yours @xcxhy. However, it seems to follow a pattern, as shown in the picture above, so you may be able to fix it that way @xcxhy
Yes, <file_sep> is the token we use to separate files, so you can use it as a stop token. The <|endoftext|> token was used to separate repositories, since we now concatenate files from the same repo in one sample.
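If your inference backend does not support stop sequences natively, the same effect can be had by truncating the completion at the first <file_sep> client-side. A minimal sketch (the helper name is illustrative):

```python
# Sketch: post-process a completion by cutting it at the <file_sep>
# file-separator token, for backends without native stop-sequence support.
STOP_TOKEN = "<file_sep>"

def truncate_at_stop(generated: str, stop: str = STOP_TOKEN) -> str:
    """Cut the completion at the first occurrence of the stop token."""
    idx = generated.find(stop)
    return generated if idx == -1 else generated[:idx]

out = truncate_at_stop("result = a + b<file_sep>import os")
print(out)  # -> "result = a + b"
```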
Thank you so much, and the StarCoder series is really amazing! Recently I've been using the models for SFT to better match our users' habits and have seen a great improvement.
When using ollama, all you need to do is set <file_sep> as the stop sequence. When using https://github.com/huggingface/llm-vscode, it would be:
"llm.requestBody": {
"stream": true,
"options": {
"stop": [
"<file_sep>"
],
"temperature": 0,
}
},
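When calling ollama's REST API directly rather than through an editor extension, the same stop sequence goes in the options object of the /api/generate request body. A minimal sketch (the model tag "starcoder2" is an assumption; use whichever tag you have pulled):

```python
import json

# Sketch: request body for ollama's /api/generate endpoint with
# <file_sep> as the stop sequence. The model tag is an assumption.
payload = {
    "model": "starcoder2",
    "prompt": "<fim_prefix>def add(a, b):\n    <fim_suffix>\n<fim_middle>",
    "stream": False,
    "options": {
        "stop": ["<file_sep>"],
        "temperature": 0,
    },
}
body = json.dumps(payload)
print(body)
# POST this body to http://localhost:11434/api/generate
```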
StarCoder's format for inference in code completion is PSM: <fim_prefix> + prefix + <fim_suffix> + suffix + <fim_middle>
What's the equivalent for StarCoder2? From the paper, we could only see that: