character-ai / prompt-poet

Streamlines and simplifies prompt design for both developers and non-technical users with a low-code approach.
MIT License


Prompt Poet

Prompt Poet streamlines and simplifies prompt design for both developers and non-technical users with its low-code approach. Using a mix of YAML and Jinja2, Prompt Poet allows for flexible, dynamic prompt creation, improving the efficiency and quality of interactions with AI models. It saves time otherwise spent on string manipulation, so everyone can focus on crafting the best prompts for their users.


pip install prompt-poet

Basic Usage

import os
import getpass

from prompt_poet import Prompt
from langchain_openai import ChatOpenAI

# Uncomment if you need to set OPENAI_API_KEY.
# os.environ["OPENAI_API_KEY"] = getpass.getpass()

raw_template = """
- name: system instructions
  role: system
  content: |
    Your name is {{ character_name }} and you are meant to be helpful and never harmful to humans.

- name: user query
  role: user
  content: |
   {{ username}}: {{ user_query }}

- name: response
  role: user
  content: |
    {{ character_name }}:
"""

template_data = {
  "character_name": "Character Assistant",
  "username": "Jeff",
  "user_query": "Can you help me with my homework?"
}

prompt = Prompt(
    raw_template=raw_template,
    template_data=template_data
)

model = ChatOpenAI(model="gpt-4o-mini")
response = model.invoke(prompt.messages)

Prompt Templates

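Prompt Poet templates use a mix of YAML and Jinja2. Template processing occurs in two primary stages: first, Jinja2 renders the template against the input data, executing control flow and function calls; then the rendered output is loaded as YAML into a list of prompt parts, each with a name, content, and optionally a role and truncation priority.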
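A minimal sketch of the two stages, using the jinja2 and PyYAML packages directly rather than Prompt Poet itself. The shortened template and the names `rendered` and `parts` are illustrative assumptions; the library's internals may differ.

```python
import jinja2
import yaml

raw_template = """
- name: system instructions
  role: system
  content: |
    Your name is {{ character_name }}.

- name: user query
  role: user
  content: |
    {{ username }}: {{ user_query }}
"""

template_data = {
    "character_name": "Character Assistant",
    "username": "Jeff",
    "user_query": "Can you help me with my homework?",
}

# Stage 1: Jinja2 binds the data and produces plain YAML text.
rendered = jinja2.Template(raw_template).render(**template_data)

# Stage 2: the YAML text is loaded into a flat list of part dictionaries.
parts = yaml.safe_load(rendered)

for part in parts:
    print(part["name"], "->", part["content"].strip())
```

Because rendering happens first, any Jinja2 control flow has already run by the time the YAML is parsed, so the YAML layer only ever sees a flat list of parts.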

Example: Basic Q&A Bot

- name: system instructions
  role: system
  content: |
    Your name is {{ character_name }} and you are meant to be helpful and never harmful to humans.

- name: user query
  role: user
  content: |
   {{ username}}: {{ user_query }}

- name: reply_prompt
  role: user
  content: |
    {{ character_name }}:

Interpolating Lists

If you have elements (e.g. messages) in a list, you can interpolate them into your template with a Jinja2 for loop.

{% for message in current_chat_messages %}
- name: chat_message
  role: user
  content: |
    {{ message.author }}: {{ message.content }}
{% endfor %}

Truncating Old Messages

Context length is limited and can’t always fit the entire chat history, so we can set a truncation priority on the message parts. Prompt Poet will then truncate these parts in the order in which they appear (oldest to newest).

{% for message in current_chat_messages %}
- name: chat_message
  role: user
  truncation_priority: 1
  content: |
    {{ message.author }}: {{ message.content }}
{% endfor %}
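As a rough illustration of the idea (not Prompt Poet's actual implementation), the sketch below drops parts that carry a truncation priority, oldest first, until the prompt fits a token budget. Parts without a priority are never dropped. `count_tokens` is a hypothetical whitespace tokenizer and `truncate_parts` is a made-up helper name.

```python
def count_tokens(text):
    # Hypothetical stand-in tokenizer: one token per whitespace-separated word.
    return len(text.split())


def truncate_parts(parts, token_limit):
    """Drop truncatable parts in order of appearance until the prompt fits."""
    total = sum(count_tokens(part["content"]) for part in parts)
    dropped = set()
    for index, part in enumerate(parts):
        if total <= token_limit:
            break
        if part.get("truncation_priority"):
            dropped.add(index)
            total -= count_tokens(part["content"])
    return [part for index, part in enumerate(parts) if index not in dropped]


parts = [
    {"name": "system instructions", "content": "Be helpful and never harmful."},
    {"name": "chat_message", "truncation_priority": 1, "content": "Jeff: hello there"},
    {"name": "chat_message", "truncation_priority": 1, "content": "Assistant: hi Jeff"},
    {"name": "chat_message", "truncation_priority": 1, "content": "Jeff: help me please"},
    {"name": "user query", "content": "Jeff: what is 2+2?"},
]

kept = truncate_parts(parts, token_limit=14)
print([part["name"] for part in kept])
```

With a budget of 14 tokens, the two oldest chat messages are removed while the system instructions and the user query survive.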

Adapting to User Modality

To tailor instructions to the user's current modality (audio or text), wrap the modality-specific part in a conditional.

{% if modality == "audio" %}
- name: special audio instruction
  role: system
  content: |
    {{ username }} is currently using audio. Keep your answers succinct.
{% endif %}

Targeting Specific Queries

To include context-specific examples, such as homework help, only when they are needed, gate them on the topic of the user's query.

{% if extract_user_query_topic(user_query) == "homework_help" %}
{% for homework_example in fetch_few_shot_homework_examples(username, character_name) %}
- name: homework_example_{{ loop.index }}
  role: user
  content: |
    {{ homework_example }}
{% endfor %}
{% endif %}

Handling Whitespace

Prompt Poet strips whitespace by default to avoid unwanted newlines in your final prompt. If you want to include an explicit space, use the special built-in space marker “<|space|>” to ensure proper formatting.
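To make the behavior concrete, here is a small sketch of how a space marker can survive whitespace stripping. The helper name `finalize_content` and the order of operations (strip first, then substitute) are assumptions for illustration, not Prompt Poet's actual code.

```python
SPACE_MARKER = "<|space|>"


def finalize_content(content):
    """Strip surrounding whitespace, then turn each marker into a literal space."""
    return content.strip().replace(SPACE_MARKER, " ")


raw = "\n   <|space|>Jeff: Can you help me?\n\n"
print(repr(finalize_content(raw)))
```

The leading space is kept only because it was written as a marker; ordinary leading and trailing whitespace is removed. In a template, the marker is written inline in a part's content.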

- name: system instructions
  role: system
  content: |
    Your name is {{ character_name }} and you are meant to be helpful and never harmful to humans.

- name: user query
  role: user
  content: |
   <|space|>{{ username}}: {{ user_query }}

Putting It All Together

Compositionality is a core strength of Prompt Poet templates, enabling the creation of complex, dynamic prompts.

- name: system instructions
  role: system
  content: |
    Your name is {{ character_name }} and you are meant to be helpful and never harmful to humans.

{% if modality == "audio" %}
- name: special audio instruction
  role: system
  content: |
    {{ username }} is currently using audio modality. Keep your answers succinct and to the point.
{% endif %}

{% if extract_user_query_topic(user_query) == "homework_help" %}
{% for homework_example in fetch_few_shot_homework_examples(username, character_name) %}
- name: homework_example_{{ loop.index }}
  role: user
  content: |
    {{ homework_example }}
{% endfor %}
{% endif %}

{% for message in current_chat_messages %}
- name: chat_message
  role: user
  truncation_priority: 1
  content: |
    {{ message.author }}: {{ message.content }}
{% endfor %}

- name: user query
  role: user
  content: |
   {{ username}}: {{ user_query }}

- name: reply_prompt
  role: user
  content: |
    {{ character_name }}:

Decomposing Into Sections

To maintain DRY principles in your templates, break them down into reusable sections that can be applied across different templates, such as when A/B testing a new prompt.

{% include 'sections/system_instruction.yml.j2' %}

{% include 'sections/audio_instruction.yml.j2' %}

{% if extract_user_query_topic(user_query) == "homework_help" %}
{% include 'sections/homework_examples.yml.j2' %}
{% endif %}

{% include 'sections/chat_messages.yml.j2' %}

{% include 'sections/user_query.yml.j2' %}

{% include 'sections/reply_prompt.yml.j2' %}
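A sketch of how includes let two template variants share sections, using plain Jinja2 with an in-memory `DictLoader` standing in for files on disk. The section contents, the `system_instruction_terse` section, and the `variant_a`/`variant_b` names are hypothetical; only the include mechanism itself is standard Jinja2.

```python
import jinja2
import yaml

templates = {
    "sections/system_instruction.yml.j2": (
        "- name: system instructions\n"
        "  role: system\n"
        "  content: |\n"
        "    Your name is {{ character_name }} and you are meant to be helpful.\n"
    ),
    "sections/system_instruction_terse.yml.j2": (
        "- name: system instructions\n"
        "  role: system\n"
        "  content: |\n"
        "    You are {{ character_name }}. Be brief.\n"
    ),
    "sections/user_query.yml.j2": (
        "- name: user query\n"
        "  role: user\n"
        "  content: |\n"
        "    {{ username }}: {{ user_query }}\n"
    ),
    # Two variants for an A/B test: they differ in one section and share the other.
    "variant_a.yml.j2": (
        "{% include 'sections/system_instruction.yml.j2' %}\n"
        "{% include 'sections/user_query.yml.j2' %}\n"
    ),
    "variant_b.yml.j2": (
        "{% include 'sections/system_instruction_terse.yml.j2' %}\n"
        "{% include 'sections/user_query.yml.j2' %}\n"
    ),
}

env = jinja2.Environment(loader=jinja2.DictLoader(templates))
data = {"character_name": "Character Assistant", "username": "Jeff", "user_query": "Hi!"}

parts_a = yaml.safe_load(env.get_template("variant_a.yml.j2").render(**data))
parts_b = yaml.safe_load(env.get_template("variant_b.yml.j2").render(**data))

print(parts_a[0]["content"].strip())
print(parts_b[0]["content"].strip())
```

Only the system instruction differs between the two variants; the user query section is defined once and reused.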

Design Choices

Prompt Poet Library

The Prompt Poet Library provides various features and settings, including prompt properties. Key features like tokenization and truncation help with efficient caching and low-latency responses.

prompt.truncate(token_limit=TOKEN_LIMIT, truncation_step=TRUNCATION_STEP)

# Inspect prompt as a raw string.
prompt.string: str
>>> "..."

# Inspect the prompt as raw tokens.
prompt.tokens: list[int]
>>> [...]

# Inspect the prompt as LLM API message dicts.
prompt.messages: list[dict]
>>> [...]

# Inspect the prompt as first class parts.
prompt.parts: list[PromptPart]
>>> [...]

Templating Language

Jinja2 and YAML combine to offer an extensible and expressive templating language. Jinja2 facilitates direct data bindings, arbitrary function calls, and basic control flow within templates. YAML provides structure to our templates (with depth=1), allowing us to perform sophisticated truncation when the token limit is reached. This pairing of Jinja2 and YAML is not unique; most notably, it is used by Ansible.

Template-native Function Calling

One standout feature of Jinja2 is the ability to invoke arbitrary Python functions directly within templates at runtime. This feature is crucial for on-the-fly data retrieval, manipulation, and validation, streamlining how prompts are constructed. Here, extract_user_query_topic can perform arbitrary processing of the user's query for use in the template's control flow, perhaps by making a round trip to a topic classifier.

{% if extract_user_query_topic(user_query) == "homework_help" %}
{% for homework_example in fetch_few_shot_homework_examples(username, character_name) %}
- name: homework_example_{{ loop.index }}
  role: user
  content: |
    {{ homework_example }}
{% endfor %}
{% endif %}
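The sketch below shows the underlying Jinja2 mechanism: Python callables passed in as template data can be called from the template. Both function bodies are hypothetical stand-ins (a keyword check instead of a real classifier, canned strings instead of a real lookup), and the code uses jinja2 and PyYAML directly rather than Prompt Poet.

```python
import jinja2
import yaml


def extract_user_query_topic(user_query):
    # Hypothetical classifier: a real one might call a topic model.
    return "homework_help" if "homework" in user_query.lower() else "other"


def fetch_few_shot_homework_examples(username, character_name):
    # Hypothetical lookup returning canned single-line examples.
    return [
        f"{username} asked for 7 x 8 and {character_name} answered 56.",
        f"{username} asked for the capital of France and {character_name} answered Paris.",
    ]


template = """
{% if extract_user_query_topic(user_query) == "homework_help" %}
{% for homework_example in fetch_few_shot_homework_examples(username, character_name) %}
- name: homework_example_{{ loop.index }}
  role: user
  content: |
    {{ homework_example }}
{% endfor %}
{% endif %}
"""


def render_parts(user_query):
    data = {
        "username": "Jeff",
        "character_name": "Character Assistant",
        "user_query": user_query,
        # Functions are passed alongside ordinary values.
        "extract_user_query_topic": extract_user_query_topic,
        "fetch_few_shot_homework_examples": fetch_few_shot_homework_examples,
    }
    rendered = jinja2.Template(template).render(**data)
    # An all-whitespace render loads as None, so fall back to an empty list.
    return yaml.safe_load(rendered) or []


print([part["name"] for part in render_parts("Can you help me with my homework?")])
print(render_parts("Tell me a joke"))
```

The examples are kept to a single line each because a multi-line value would break the indentation of the YAML block it is interpolated into.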

Custom Encoding Function

By default, Prompt Poet uses the TikToken “o200k_base” tokenizer, although an alternate encoding name may be provided via the top-level tiktoken_encoding_name. Alternatively, users can provide their own encode function via the top-level encode_func: Callable[[str], list[int]].

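from tiktoken import get_encoding
encode_func = get_encoding("o200k_base").encode  # Callable[[str], list[int]]

prompt = Prompt(
    raw_template=raw_template,
    template_data=template_data,
    encode_func=encode_func
)
prompt.tokens
>>> [...]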


If your LLM provider supports GPU affinity and prefix cache, utilize Character.AI’s truncation algorithm to maximize the prefix-cache rate. The prefix cache rate is defined as the number of prompt tokens retrieved from cache over the total number of prompt tokens. Find the optimal values for truncation step and token limit for your use case. As the truncation step increases, the prefix cache rate also rises, but more tokens are truncated from the prompt.

TOKEN_LIMIT = 128000
TRUNCATION_STEP = 4000  # Example value; tune for your use case.

# Tokenize and truncate the prompt.
prompt.truncate(token_limit=TOKEN_LIMIT, truncation_step=TRUNCATION_STEP)

response = model.invoke(prompt.messages)
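The prefix cache rate defined above can be computed in a few lines. This sketch makes a simplifying assumption: the cache holds only the tokens of the previous prompt, and a token counts as cached only if it lies in the shared prefix. The function name `prefix_cache_rate` is illustrative.

```python
def prefix_cache_rate(cached_tokens, prompt_tokens):
    """Fraction of prompt_tokens that match cached_tokens as a shared prefix."""
    if not prompt_tokens:
        return 0.0
    shared = 0
    for cached, current in zip(cached_tokens, prompt_tokens):
        if cached != current:
            break
        shared += 1
    return shared / len(prompt_tokens)


previous_prompt = [1, 2, 3, 4, 5, 6]

# Appending to the prompt keeps the whole previous prompt as a cached prefix.
print(prefix_cache_rate(previous_prompt, [1, 2, 3, 4, 5, 6, 7, 8]))

# Dropping even one token from the front shifts everything and misses the cache.
print(prefix_cache_rate(previous_prompt, [2, 3, 4, 5, 6, 7, 8]))
```

This is why the position of the truncation point matters: moving it changes the start of the prompt and invalidates the cached prefix.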

Cache-aware Truncation Explained

In short, cache-aware truncation truncates up to a fixed truncation point every time it is invoked, moving this truncation point only every k turns on average. This allows your LLM provider to maximally exploit the GPU prefix cache described in Optimizing Inference. If we instead truncated only until reaching the token limit (L), the truncation point would move every turn, significantly reducing the prefix cache rate. The tradeoff is that we often truncate more than strictly necessary.
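The difference between the two strategies can be simulated with token counts alone. This is one way to get the described behavior, rounding the surplus up to a multiple of the truncation step. It is a sketch under that assumption and not necessarily the library's exact algorithm. The function names are illustrative.

```python
def naive_cut(token_counts, token_limit):
    """Index of the first kept message when dropping only what is strictly needed."""
    total = sum(token_counts)
    cut = 0
    while total > token_limit and cut < len(token_counts):
        total -= token_counts[cut]
        cut += 1
    return cut


def cache_aware_cut(token_counts, token_limit, truncation_step):
    """Index of the first kept message when dropping in multiples of truncation_step."""
    total = sum(token_counts)
    if total <= token_limit:
        return 0
    surplus = total - token_limit
    # Integer ceiling: round the surplus up to the next multiple of the step.
    target = -(-surplus // truncation_step) * truncation_step
    dropped = 0
    cut = 0
    while dropped < target and cut < len(token_counts):
        dropped += token_counts[cut]
        cut += 1
    return cut


# Simulate a chat in which every turn appends one 10-token message.
history = []
naive_cuts, aware_cuts = [], []
for turn in range(30):
    history.append(10)
    naive_cuts.append(naive_cut(history, token_limit=100))
    aware_cuts.append(cache_aware_cut(history, token_limit=100, truncation_step=50))

# Count how often the truncation point moved from one turn to the next.
naive_moves = sum(a != b for a, b in zip(naive_cuts, naive_cuts[1:]))
aware_moves = sum(a != b for a, b in zip(aware_cuts, aware_cuts[1:]))
print(naive_moves, aware_moves)  # prints: 20 4
```

In this toy run the naive truncation point moves on every turn once the limit is reached, while the cache-aware one moves once every five turns, at the cost of sometimes keeping as few as 60 of the 100 allowed tokens.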

[Figure: Cache-aware Truncation]

Template Registry

A Template Registry is simply the concept of storing templates as files on disk. With a Template Registry, you can isolate template files from your Python code and load them directly from disk. In production systems, these template files can optionally be loaded from an in-memory cache on successive uses, saving on disk I/O. In the future, a Template Registry may become a first-class citizen of Prompt Poet.
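The in-memory caching idea can be sketched with the standard library alone. The helper `load_template` is hypothetical and is not part of Prompt Poet; it only illustrates reading a template file once and serving later requests from memory.

```python
import functools
import pathlib
import tempfile


@functools.lru_cache(maxsize=None)
def load_template(path):
    """Read a template file once; later calls for the same path hit the cache."""
    return pathlib.Path(path).read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as registry_dir:
    template_file = pathlib.Path(registry_dir) / "chat_template.yml.j2"
    template_file.write_text("- name: system instructions\n  content: hi\n", encoding="utf-8")

    first = load_template(str(template_file))   # Reads from disk.
    second = load_template(str(template_file))  # Served from memory.

print(load_template.cache_info())
```

A cache like this never notices edits to the file, so a real system would need some way to invalidate entries, for example by calling `load_template.cache_clear()` on deploy.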

Filename: chat_template.yml.j2

- name: system instructions
  role: system
  content: |
    Your name is {{ character_name }} and you are meant to be helpful and never harmful to humans.

- name: user query
  role: user
  content: |
   {{ username}}: {{ user_query }}

- name: response
  role: user
  content: |
    {{ character_name }}:

Run this Python code from the directory in which you saved chat_template.yml.j2.

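from prompt_poet import Prompt

prompt = Prompt(
    template_path="chat_template.yml.j2",
    template_data=template_data  # The dict defined in Basic Usage.
)
prompt.string
>>> 'Your name is Character Assistant and you are meant to be helpful and never harmful to humans.Jeff: Can you help me with my homework?Character Assistant:'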

Related Work