eli64s / readme-ai

README file generator, powered by large language model APIs 👾
MIT License

TikToken Special Character Conflict #88

Open jamesvillarrubia opened 8 months ago

jamesvillarrubia commented 8 months ago

When running in a basic JS application, I'm getting this error:

ERROR    [1:logger] [2024-01-29 15:51:28,449] Error in token encoding: Encountered text corresponding to disallowed special token '<|endoftext|>'.
If you want this text to be encoded as a special token, pass it to `allowed_special`, e.g. `allowed_special={'<|endoftext|>', ...}`.
If you want this text to be encoded as normal text, disable the check for this token by passing `disallowed_special=(enc.special_tokens_set - {'<|endoftext|>'})`.
To disable this check for all special tokens, pass `disallowed_special=()`.
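
The last option in the error message can be sketched in Python roughly as follows. This is only an illustration of tiktoken's `disallowed_special=()` argument, not necessarily how readme-ai calls the tokenizer; the helper names `encode_ignoring_special` and `count_tokens` are hypothetical, and the `cl100k_base` encoding name is an assumption, since the log does not say which encoding is in use.

```python
def encode_ignoring_special(encoder, text):
    """Encode text, treating strings such as '<|endoftext|>' as ordinary text.

    `encoder` is expected to behave like a tiktoken Encoding object.
    Passing disallowed_special=() turns off the special-token check,
    as the error message itself suggests.
    """
    return encoder.encode(text, disallowed_special=())


def count_tokens(text, encoding_name="cl100k_base"):
    """Hypothetical helper: count tokens without failing on special-token text.

    The encoding name is an assumption; use whichever encoding matches the model.
    """
    import tiktoken  # third-party package, imported lazily

    encoder = tiktoken.get_encoding(encoding_name)
    return len(encode_ignoring_special(encoder, text))
```

With this approach, a source file that happens to contain the literal string `<|endoftext|>` is tokenized as plain text rather than raising an error, which is usually what is wanted when the input is arbitrary repository content.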

Related to: https://github.com/langchain-ai/langchain/issues/923