deepset-ai / haystack

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

feat: Use `tokenizer.apply_chat_template` in HuggingFace Invocation Layer #5919

Closed · sjrl closed this 9 months ago

sjrl commented 1 year ago

**Feature Request**

Transformers recently added a new feature:

encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

which automatically applies the correct chat formatting (special tokens and role markers) around the messages for the given model.

Using this could greatly improve the user experience of open-source LLMs since users would no longer have to manually add the correct tokens and formatting to prompts sent to the PromptNode.

Here is a full example of how to use this new function with the Mistral Instruct model:

from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

# Apply the model's chat template: the correct control tokens (for Mistral
# Instruct, [INST] ... [/INST] around user turns) are inserted automatically,
# and the result is returned as a tensor of token ids.
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)
model.to(device)

# Generate a completion from the templated conversation and decode it back to text.
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
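As a side note, you can inspect the formatting the template applies by rendering it to a plain string instead of token ids. A small sketch reusing the tokenizer and messages from above (tokenize=False is a documented parameter of apply_chat_template):

# Render the template as text rather than token ids to see exactly
# which control tokens get wrapped around each message.
prompt_text = tokenizer.apply_chat_template(messages, tokenize=False)
print(prompt_text)
# For Mistral Instruct, user turns come out wrapped in [INST] ... [/INST].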

**Alternative**

Rely on users to manually add the correct tokens to the prompt sent to the PromptNode.

vblagoje commented 1 year ago

Great find @sjrl, we need this for 1.x and 2.0 as well.

sjrl commented 1 year ago

Just as a heads-up, it looks like this feature might only be on main for now (at least according to their current docs: https://huggingface.co/docs/transformers/main/chat_templating#templates-for-chat-models), so we might need to wait on this.
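Until it is in a release, one option would be to guard the call at runtime. A hedged sketch of a possible fallback, not the planned integration:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

messages = [{"role": "user", "content": "Do you have mayonnaise recipes?"}]

# Only installs that already ship the feature expose apply_chat_template;
# older releases would need the manual prompt formatting we use today.
if hasattr(tokenizer, "apply_chat_template"):
    encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")
else:
    raise RuntimeError(
        "This transformers version does not include apply_chat_template; "
        "fall back to manually formatted prompts."
    )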

sebastianschramm commented 1 year ago

I am interested in using that feature with PromptNode. It has now been released in v4.34.0 (https://huggingface.co/docs/transformers/v4.34.0/en/chat_templating). What is the recommended way of using it?

Any updates on this @sjrl @vblagoje?
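In the meantime, one stopgap I can think of is to render the template myself and pass the resulting string to PromptNode. A rough sketch, assuming the 1.x PromptNode API that accepts a plain string prompt:

from haystack.nodes import PromptNode
from transformers import AutoTokenizer

model_name = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [{"role": "user", "content": "Do you have mayonnaise recipes?"}]

# Render the chat template to a string (add_generation_prompt appends the
# tokens that cue the model to answer), so Haystack receives a fully
# formatted prompt instead of raw user text.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

prompt_node = PromptNode(model_name_or_path=model_name)
result = prompt_node(prompt)
print(result)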

hasanradi93 commented 1 year ago

Can you give me the same example but for CPU?
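For reference, the example above should run on CPU with just the device string changed. A sketch; note that a 7B model in float32 needs roughly 28 GB of RAM and generation will be slow:

from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cpu"  # run everything on the CPU instead of a GPU

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

messages = [{"role": "user", "content": "Do you have mayonnaise recipes?"}]

encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)
model.to(device)

# Generation on CPU is slow for a 7B model; keep max_new_tokens modest.
generated_ids = model.generate(model_inputs, max_new_tokens=100, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])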