
chat_templates

This repository collects proper chat templates (or input formats) for instruction-tuned large language models (LLMs), to support transformers' `chat_template` feature. If you would like to add more chat templates, feel free to open a pull request.

If you find this repo useful, please kindly cite it:

```
@misc{zheng-2024-chat-templates,
  author = {Zheng, Chujie},
  title = {Chat Templates for HuggingFace Large Language Models},
  year = {2024},
  howpublished = {\url{https://github.com/chujiezheng/chat_templates}}
}
```

Updates

What Is Contained in This Repo?

Usage Examples

Important NOTE: As mentioned in this issue, the messages should contain at least one user message. Passing only a system message is strongly discouraged, as it may produce unexpected outputs (the models are not trained this way).
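As a lightweight safeguard (not part of this repo), you can check this condition before calling `apply_chat_template`; a minimal sketch:

```python
def ensure_user_message(messages):
    """Raise early if the conversation has no user turn, since most
    instruction-tuned models are not trained on system-only inputs."""
    if not any(m.get('role') == 'user' for m in messages):
        raise ValueError('messages should contain at least one user message')
    return messages
```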

**Example 1: Meta-Llama-3-8B-Instruct**

This example checks whether the jinja file is correctly implemented.

```python
from transformers import AutoTokenizer

toker = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct", token="YOUR_OWN_TOKEN")
messages = [
    {'role': 'system', 'content': 'This is a system prompt.'},
    {'role': 'user', 'content': 'This is the first user input.'},
    {'role': 'assistant', 'content': 'This is the first assistant response.'},
    {'role': 'user', 'content': 'This is the second user input.'},
]

print('###### Default (yet Correct) Chat Template ######')
print(toker.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))

print('###### Corrected Chat Template ######')
chat_template = open('./chat_templates/llama-3-instruct.jinja').read()
# Strip the indentation and newlines used for readability in the .jinja file
chat_template = chat_template.replace('    ', '').replace('\n', '')
toker.chat_template = chat_template
print(toker.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```

Expected output:

```
###### Default (yet Correct) Chat Template ######
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

This is a system prompt.<|eot_id|><|start_header_id|>user<|end_header_id|>

This is the first user input.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

This is the first assistant response.<|eot_id|><|start_header_id|>user<|end_header_id|>

This is the second user input.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

###### Corrected Chat Template ######
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

This is a system prompt.<|eot_id|><|start_header_id|>user<|end_header_id|>

This is the first user input.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

This is the first assistant response.<|eot_id|><|start_header_id|>user<|end_header_id|>

This is the second user input.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```
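Once `toker.chat_template` is set as above, recent versions of transformers serialize the template together with the tokenizer, so the fix only needs to be applied once. A sketch, continuing from Example 1 (the output directory name is an arbitrary choice):

```python
# save_pretrained writes the corrected template into tokenizer_config.json
# (transformers >= 4.34), so AutoTokenizer.from_pretrained picks it up later.
toker.save_pretrained('./llama-3-8b-instruct-fixed')
```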
**Example 2: Mistral-7B-Instruct-v0.2**

`mistral-instruct` (like `gemma-it`) does not natively support the `system` message, so passing a `system` message to the default template raises an error.

```python
from transformers import AutoTokenizer

toker = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
messages = [
    {'role': 'system', 'content': 'This is a system prompt.'},
    {'role': 'user', 'content': 'This is the first user input.'},
    {'role': 'assistant', 'content': 'This is the first assistant response.'},
    {'role': 'user', 'content': 'This is the second user input.'},
]

print('###### Default (but Error-Raising) Chat Template ######')
# The default template raises an error on the system message:
#print(toker.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))

print('###### Corrected Chat Template ######')
chat_template = open('./chat_templates/mistral-instruct.jinja').read()
# Strip the indentation and newlines used for readability in the .jinja file
chat_template = chat_template.replace('    ', '').replace('\n', '')
toker.chat_template = chat_template
print(toker.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```

Expected output:

```
###### Default (but Error-Raising) Chat Template ######
jinja2.exceptions.TemplateError: Conversation roles must alternate user/assistant/user/assistant/...
###### Corrected Chat Template ######
[INST] This is a system prompt.

This is the first user input. [/INST] This is the first assistant response. [INST] This is the second user input. [/INST]
```
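The corrected `mistral-instruct.jinja` performs this prepending inside the template itself. The same effect can be achieved in Python before applying the default template; a hypothetical helper (the `\n\n` separator is an assumption, not taken from the template):

```python
def fold_system_message(messages, sep='\n\n'):
    """Merge a leading system message into the first user turn, for
    templates (e.g. mistral-instruct, gemma-it) that reject 'system' roles.
    The separator is an assumption; adjust it to match the target template."""
    if messages and messages[0]['role'] == 'system':
        if len(messages) < 2 or messages[1]['role'] != 'user':
            raise ValueError('a user message must follow the system message')
        system, first_user, *rest = messages
        merged = {'role': 'user',
                  'content': system['content'] + sep + first_user['content']}
        return [merged, *rest]
    return list(messages)
```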
**Example 3: vicuna-7b-v1.5**

**NOTE:** In [fast-chat](https://github.com/lm-sys/FastChat/blob/d578599c69d060e6d40943f1b5b72af98956092a/fastchat/conversation.py#L287C3-L287C3), `vicuna` does not add linebreaks between roles' messages. However, I found that adding linebreaks leads to slightly better performance (especially for the v1.5 version). Also, `vicuna-7/13/33b-v1.3` may not work well when given a system message different from its default one, so I recommend using `vicuna-7/13b-v1.5` instead.

```python
from transformers import AutoTokenizer

toker = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")
messages = [
    {'role': 'system', 'content': 'This is a system prompt.'},
    {'role': 'user', 'content': 'This is the first user input.'},
    {'role': 'assistant', 'content': 'This is the first assistant response.'},
    {'role': 'user', 'content': 'This is the second user input.'},
]

print('###### Default (but Improper) Chat Template ######')
print(toker.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))

print('###### Corrected Chat Template ######')
chat_template = open('./chat_templates/vicuna.jinja').read()
# Strip the indentation and newlines used for readability in the .jinja file
chat_template = chat_template.replace('    ', '').replace('\n', '')
toker.chat_template = chat_template
print(toker.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```

Expected output:

```
###### Default (but Improper) Chat Template ######
[INST] <<SYS>>
This is a system prompt.
<</SYS>>

This is the first user input. [/INST] This is the first assistant response. [INST] This is the second user input. [/INST]
###### Corrected Chat Template ######
This is a system prompt.

USER: This is the first user input.
ASSISTANT: This is the first assistant response.
USER: This is the second user input.
ASSISTANT:
```
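The examples above print prompts for inspection only. For actual generation, the same patched tokenizer is used with `tokenize=True` (the default). A sketch, continuing from Example 3's `toker` and `messages` and assuming a GPU is available:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    'lmsys/vicuna-7b-v1.5', torch_dtype=torch.float16, device_map='auto')

# apply_chat_template tokenizes by default; add_generation_prompt appends
# the trailing 'ASSISTANT:' so the model continues as the assistant.
input_ids = toker.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors='pt').to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(toker.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```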

Supported Models

NOTE: The listed models are not exhaustive; each entry also covers other model sizes in the same family.

**Llama-3-Instruct, Llama-3.1-Instruct**
- Models: `meta-llama/Meta-Llama-3-8B-Instruct`, `meta-llama/Meta-Llama-3.1-8B-Instruct`
- Chat template: `chat_templates/llama-3-instruct.jinja`
- Generation config: `generation_configs/llama-3-instruct.json`
- Reference: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/blob/main/tokenizer_config.json#L2053

**Llama-2-Chat, CodeLlama-Instruct**
- Models: `meta-llama/Llama-2-7b-chat-hf`, `meta-llama/CodeLlama-7b-Instruct-hf`
- Chat template: `chat_templates/llama-2-chat.jinja`
- Generation config: `generation_configs/llama-2-chat.json`
- Reference: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/blob/main/tokenizer_config.json#L12

**Qwen2-Instruct, Qwen1.5-Chat**
- Models: `Qwen/Qwen2-7B-Instruct`, `Qwen/Qwen1.5-7B-Chat`
- Chat template: `chat_templates/chatml.jinja`
- Generation config: `generation_configs/qwen2-instruct.json`
- Reference: https://huggingface.co/Qwen/Qwen2-72B-Instruct/blob/main/tokenizer_config.json#L31

**Mistral-Instruct**
- Models: `mistralai/Mistral-7B-Instruct-v0.3`, `mistralai/Mixtral-8x7B-Instruct-v0.1`
- Chat template: `chat_templates/mistral-instruct.jinja`
- Generation config: `generation_configs/mistral-instruct.json`
- Reference: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3/blob/main/tokenizer_config.json#L42
- Comment: **System message is acceptable**, by prepending it before the first user input

**Phi-3-Instruct**
- Models: `microsoft/Phi-3-mini-4k-instruct`
- Chat template: `chat_templates/phi-3.jinja`
- Generation config: `generation_configs/phi-3.json`
- Reference: https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/blob/main/tokenizer_config.json#L338

**Yi-1.5-Chat, Yi-Chat**
- Models: `01-ai/Yi-1.5-6B-Chat`, `01-ai/Yi-6B-Chat`
- Chat template: `chat_templates/chatml.jinja`
- Generation config: `generation_configs/yi-chat.json`
- Reference: https://huggingface.co/01-ai/Yi-6B-Chat/blob/main/tokenizer_config.json#L60

**gemma-it, gemma-2-it**
- Models: `google/gemma-7b-it`, `google/gemma-2-9b-it`
- Chat template: `chat_templates/gemma-it.jinja`
- Generation config: `generation_configs/gemma-it.json`
- Reference: https://huggingface.co/google/gemma-7b-it/blob/main/tokenizer_config.json#L1507
- Comment: **System message is acceptable**

**Llama3-ChatQA-1.5**
- Models: `nvidia/Llama3-ChatQA-1.5-8B`
- Chat template: `chat_templates/chatqa.jinja`
- Generation config: `generation_configs/chatqa.json`
- Reference: https://huggingface.co/nvidia/Llama3-ChatQA-1.5-8B#when-context-is-available
- Comment: Context message is acceptable

**openchat-3.5, Starling-LM**
- Models: `openchat/openchat_3.5`, `berkeley-nest/Starling-LM-7B-alpha`
- Chat template: `chat_templates/openchat-3.5.jinja`
- Generation config: `generation_configs/openchat-3.5.json`
- Reference: https://huggingface.co/openchat/openchat_3.5/blob/main/tokenizer_config.json#L51

**zephyr**
- Models: `zephyr-7b-alpha`
- Chat template: `chat_templates/zephyr.jinja`
- Generation config: `generation_configs/zephyr.json`
- Reference: https://huggingface.co/HuggingFaceH4/zephyr-7b-beta/blob/main/tokenizer_config.json#L34

**vicuna**
- Models: `vicuna-7b-v1.5`, `vicuna-7b-v1.3`
- Chat template: `chat_templates/vicuna.jinja`
- Generation config: `generation_configs/vicuna.json`
- Reference: https://github.com/lm-sys/FastChat/blob/main/docs/vicuna_weights_version.md#prompt-template

**Orca-2**
- Models: `microsoft/Orca-2-7b`
- Chat template: `chat_templates/chatml.jinja`
- Generation config: `generation_configs/orca-2.json`
- Reference: https://huggingface.co/microsoft/Orca-2-7b

**falcon-instruct**
- Models: `tiiuae/falcon-7b-instruct`
- Chat template: `chat_templates/falcon-instruct.jinja`
- Reference: https://github.com/lm-sys/FastChat/blob/main/docs/vicuna_weights_version.md#prompt-template

**SOLAR-Instruct**
- Models: `upstage/SOLAR-10.7B-Instruct-v1.0`
- Chat template: `chat_templates/solar-instruct.jinja`
- Generation config: `generation_configs/solar-instruct.json`
- Reference: https://huggingface.co/upstage/SOLAR-10.7B-Instruct-v1.0/blob/main/tokenizer_config.json#L31

**Alpaca**
- Models: `tatsu-lab/alpaca-7b-wdiff`
- Chat template: `chat_templates/alpaca.jinja`
- Generation config: `generation_configs/alpaca.json`
- Reference: https://github.com/tatsu-lab/stanford_alpaca

**AmberChat**
- Models: `LLM360/AmberChat`, `LLM360/AmberSafe`
- Chat template: `chat_templates/amberchat.jinja`
- Generation config: `generation_configs/amberchat.json`
- Reference: https://huggingface.co/LLM360/AmberChat

**saiga**
- Models: `IlyaGusev/saiga_mistral_7b_lora`
- Chat template: `chat_templates/saiga.jinja`
- Generation config: `generation_configs/saiga.json`
- Reference: https://huggingface.co/IlyaGusev/saiga_mistral_7b_lora#saigamistral-7b-russian-mistral-based-chatbot
- Comment: A series of Russian models
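Several families above share one template (for example, Qwen2, Yi, and Orca-2 all use `chatml.jinja`). A small lookup table can keep the selection explicit in downstream code; the mapping below is a hypothetical sketch covering a few of the entries above, not part of this repo:

```python
# Hypothetical model-to-template mapping, derived from the list above.
TEMPLATE_FOR = {
    'meta-llama/Meta-Llama-3-8B-Instruct': 'chat_templates/llama-3-instruct.jinja',
    'mistralai/Mistral-7B-Instruct-v0.3': 'chat_templates/mistral-instruct.jinja',
    'Qwen/Qwen2-7B-Instruct': 'chat_templates/chatml.jinja',
    '01-ai/Yi-1.5-6B-Chat': 'chat_templates/chatml.jinja',
}

def load_template(model_id: str) -> str:
    with open(TEMPLATE_FOR[model_id]) as f:
        # Strip the indentation and newlines used for readability in the .jinja files
        return f.read().replace('    ', '').replace('\n', '')
```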
