deepset-ai / haystack

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

feat: Use `tokenizer.apply_chat_template` in HuggingFace Invocation Layer #5919

Closed · sjrl closed this 9 months ago

sjrl commented 1 year ago

**Feature Request**

Transformers recently added a new feature:

encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

which automatically applies the correct chat formatting (special tokens and role markers) around the messages for the given model.

Using this could greatly improve the user experience of open-source LLMs since users would no longer have to manually add the correct tokens and formatting to prompts sent to the PromptNode.

Here is a full example of how to use this new function with the Mistral Instruct model:

from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

# Apply the model's chat template: the correct control tokens (for Mistral
# Instruct, [INST] ... [/INST] around user turns) are inserted automatically,
# and the result is returned as a tensor of token ids.
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)
model.to(device)

# Generate a completion from the templated conversation and decode it back to text.
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
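As a side note, you can inspect the formatting the template applies by rendering it to a plain string instead of token ids. A small sketch reusing the tokenizer and messages from above (tokenize=False is a documented parameter of apply_chat_template):

# Render the template as text rather than token ids to see exactly
# which control tokens get wrapped around each message.
prompt_text = tokenizer.apply_chat_template(messages, tokenize=False)
print(prompt_text)
# For Mistral Instruct, user turns come out wrapped in [INST] ... [/INST].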

**Alternative**

Rely on users to manually add the correct tokens to the prompt sent to the PromptNode.

vblagoje commented 1 year ago

Great find @sjrl, we need this for 1.x and 2.0 as well.

sjrl commented 1 year ago

Just as a heads-up, it looks like this feature might only be on main for now (at least according to their current docs: https://huggingface.co/docs/transformers/main/chat_templating#templates-for-chat-models), so we might need to wait on this.
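Until it is in a release, one option would be to guard the call at runtime. A hedged sketch of a possible fallback, not the planned integration:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

messages = [{"role": "user", "content": "Do you have mayonnaise recipes?"}]

# Only installs that already ship the feature expose apply_chat_template;
# older releases would need the manual prompt formatting we use today.
if hasattr(tokenizer, "apply_chat_template"):
    encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")
else:
    raise RuntimeError(
        "This transformers version does not include apply_chat_template; "
        "fall back to manually formatted prompts."
    )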

sebastianschramm commented 1 year ago

I am interested in using that feature with PromptNode. It has now been released in v4.34.0 (https://huggingface.co/docs/transformers/v4.34.0/en/chat_templating). What is the recommended way of using it?

Any updates on this @sjrl @vblagoje?
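In the meantime, one stopgap I can think of is to render the template myself and pass the resulting string to PromptNode. A rough sketch, assuming the 1.x PromptNode API that accepts a plain string prompt:

from haystack.nodes import PromptNode
from transformers import AutoTokenizer

model_name = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [{"role": "user", "content": "Do you have mayonnaise recipes?"}]

# Render the chat template to a string (add_generation_prompt appends the
# tokens that cue the model to answer), so Haystack receives a fully
# formatted prompt instead of raw user text.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

prompt_node = PromptNode(model_name_or_path=model_name)
result = prompt_node(prompt)
print(result)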

hasanradi93 commented 1 year ago

Can you give me the same example but for CPU?
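For reference, the example above should run on CPU with just the device string changed. A sketch; note that a 7B model in float32 needs roughly 28 GB of RAM and generation will be slow:

from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cpu"  # run everything on the CPU instead of a GPU

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

messages = [{"role": "user", "content": "Do you have mayonnaise recipes?"}]

encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)
model.to(device)

# Generation on CPU is slow for a 7B model; keep max_new_tokens modest.
generated_ids = model.generate(model_inputs, max_new_tokens=100, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])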