Closed Gnurro closed 10 months ago
As a note: the issue occurs with the Llama2-chat tokenizer - it does not with the ~10 other tokenizers I've tested, hence the oversight. I changed it for the HF backend as well, to prevent this from occurring if a model/tokenizer added in the future has the same issue.
Fix to properly cull the prompt from the model output. The old code used

```python
tokenizer.apply_chat_template(messages, tokenize=False)
```

to get the chat-formatted prompt as a string - but this call does not return the actual decoded result; it returns a string with missing whitespace, which prevents the string match used to cull the prompt. Decoding the actual tokenized prompt instead solves this issue.
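The mismatch can be illustrated with plain strings. The exact values below are hypothetical stand-ins for what the Llama2-chat tokenizer produces; in real code, the "templated" string would come from `tokenizer.apply_chat_template(messages, tokenize=False)` and the "decoded" string from `tokenizer.decode(tokenizer.apply_chat_template(messages, tokenize=True))`:

```python
# Hypothetical values illustrating the whitespace discrepancy:
templated = "<s>[INST] Hi [/INST]"    # apply_chat_template(..., tokenize=False)
decoded = "<s> [INST] Hi [/INST]"     # decode(apply_chat_template(..., tokenize=True))

# The model output echoes the prompt as it was actually tokenized/decoded:
full_output = decoded + " Hello!"

# Old approach: the templated string is not a substring of the output,
# so replace() silently does nothing and the prompt is not culled.
old_result = full_output.replace(templated, "").strip()
assert old_result == full_output.strip()

# Fixed approach: decoding the actual tokenized prompt matches exactly,
# so the prompt is culled and only the response remains.
new_result = full_output.replace(decoded, "").strip()
assert new_result == "Hello!"
```

The demo shows why the bug only surfaces with tokenizers whose decoded output differs in whitespace from the untokenized template rendering.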