In the paper, the token is written as <|fim_start|>. Note that the | character there is not the ASCII |, whereas the underscore _ is an ASCII character. In the GitHub repo readme.md, however, the underscore is replaced by ▁, as seen in <|fim▁begin|>. When I experimented with Ollama, using ▁ resulted in encoding errors.
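
Since these characters are hard to tell apart by eye, one way to check which ones a token string really contains is to list its non-ASCII code points. The following is only a minimal sketch: the helper name `non_ascii_chars` is hypothetical, and the vertical bars are typed as plain ASCII because this post does not identify which look-alike the paper uses. Paste the exact string from the paper or the readme to inspect the real characters.

```python
import unicodedata


def non_ascii_chars(text):
    """Return (index, char, code point, Unicode name) for each non-ASCII character."""
    return [
        (i, ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


# Illustrative strings only: the bars are plain ASCII here (an assumption),
# and \u2581 is the block character that stands in for the underscore.
paper_token = "<|fim_start|>"
readme_token = "<|fim\u2581begin|>"

for label, token in (("paper", paper_token), ("readme", readme_token)):
    print(label, token.encode("utf-8"), non_ascii_chars(token))
```

With these inputs, the paper-style string has no non-ASCII characters, while the readme-style string contains U+2581 (LOWER ONE EIGHTH BLOCK), which takes three bytes in UTF-8. This only identifies the characters; it does not by itself explain why Ollama reported an encoding error.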