abdeladim-s / pyllamacpp

Python bindings for llama.cpp
https://abdeladim-s.github.io/pyllamacpp/
MIT License
62 stars, 21 forks

BOS token will always be added to suffix #14

Open vvasily opened 1 year ago

vvasily commented 1 year ago

https://github.com/abdeladim-s/pyllamacpp/blob/6d487b904b93c48862cc1d8b29c7f3466ca6f6a5/pyllamacpp/model.py#LL111C92-L111C92

Calling pp.llama_tokenize with True as the last parameter adds a BOS token to the tokenized string, and the suffix is tokenized this way. Later, at line 202, input_tokens = self._prompt_prefix_tokens + pp.llama_tokenize(self._ctx, prompt, True) + self._prompt_suffix_tokens, so the input becomes [BOS]<prompt>[BOS]<suffix>. If the BOS token is not "", I think this is incorrect, because a second BOS ends up in the middle of the sequence.
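To make the effect concrete, here is a minimal sketch that does not call pyllamacpp at all. toy_tokenize is a hypothetical stand-in for pp.llama_tokenize(ctx, text, add_bos), the BOS id of 1 and the fake token ids are assumptions for illustration, and tokenizing the suffix without BOS is only one possible fix, not necessarily the one the maintainer will choose.

```python
from typing import List

BOS = 1  # assumed BOS id, for illustration only


def toy_tokenize(text: str, add_bos: bool) -> List[int]:
    """Hypothetical stand-in for pp.llama_tokenize(ctx, text, add_bos)."""
    # Fake token ids: offset by 1000 so they can never collide with BOS.
    tokens = [ord(ch) + 1000 for ch in text]
    return ([BOS] if add_bos else []) + tokens


def build_input(prompt: str, suffix: str, suffix_add_bos: bool) -> List[int]:
    """Mimics the concatenation: tokenized prompt followed by suffix tokens."""
    suffix_tokens = toy_tokenize(suffix, suffix_add_bos)
    return toy_tokenize(prompt, True) + suffix_tokens


# Current behaviour described above: the suffix is tokenized with add_bos=True.
buggy = build_input("Hi", "\n", suffix_add_bos=True)

# One possible fix: tokenize the suffix with add_bos=False.
fixed = build_input("Hi", "\n", suffix_add_bos=False)

print("BOS count (buggy):", buggy.count(BOS))
print("BOS count (fixed):", fixed.count(BOS))
```

With the suffix tokenized with add_bos=True, the sequence contains a BOS at the start and another right before the suffix. With add_bos=False, only the leading BOS remains.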

abdeladim-s commented 1 year ago

Yes, that's true. Thanks @vvasily for pointing that out. I will fix it quickly.