dm4ml / motion

Framework for building and maintaining self-updating prompts for LLMs
https://dm4ml.github.io/motion/

Implement Dynamic Prompt Resizing for LLM Calls #286

Open shreyashankar opened 7 months ago

shreyashankar commented 7 months ago

Description: A user may have an LLM pipeline with a prompt template that gets very large when the placeholders are filled in. We want to support dynamic resizing of prompts based on a token budget, by prioritizing certain elements within the prompt template. The question is: what do we select in the prompt template?

Objective: To ensure that the generated prompt does not exceed a pre-determined or runtime-specified token budget by selectively including elements based on their importance and constraints.

Functional Requirements:

  1. State Management:

    • Motion component state is a dictionary, where each entry (key-value pair) corresponds to a placeholder in a prompt template. For instance, in a recommender-system Motion component, state might include user information (e.g., name, age, location), recent activities, interests, and friends. Serve operations might consist of a call to an LLM with some prompt template, and update (background) operations may add information to the state (e.g., new activities).
    • We should implement a special type of component, say an LLM component, that has utilities for the requirements below.
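As a concrete illustration of this state model (a hypothetical sketch, not motion's actual API), consider a state dict whose keys match the template placeholders, with list-type entries joined before formatting:

```python
# Hypothetical component state: keys correspond to prompt-template placeholders.
state = {
    "name": "Alice",
    "location": "Berkeley",
    "recent_activities": ["ran 5k", "watched a movie", "read a paper"],
}

# Illustrative template; the placeholder names mirror the state keys above.
PROMPT_TEMPLATE = (
    "User {name} from {location} recently did: {recent_activities}.\n"
    "Recommend an activity."
)

def render_prompt(template: str, state: dict) -> str:
    """Fill placeholders; list-type entries are joined into a single string."""
    filled = {
        k: "; ".join(v) if isinstance(v, list) else v for k, v in state.items()
    }
    return template.format(**filled)

prompt = render_prompt(PROMPT_TEMPLATE, state)
```

When a list-type entry like `recent_activities` grows without bound, the rendered prompt grows with it, which is what motivates the token budgeting below.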
  2. Token Budgeting and Optimization:

    • If the token count of a formatted prompt template exceeds the budget, apply a strategy to reduce tokens while maximizing content importance.
    • Importance is defined by user-defined constraints (see the "constraints management" section below) and log-probs of an LLM.
    • We break the prompt into chunks: each chunk is either an entire entry or, for a list-type entry, a single item in the list. Each chunk has a cost (i.e., its token count) and an importance (derived from the chunk's log-probs and user-defined importances).
    • Our optimizer should select chunks such that the sum of the importances of chunks selected is maximized, while the cost of the selected chunks falls under the token budget. More formally:

Let $S = \{s_1, s_2, \ldots, s_n\}$ represent the set of all chunks in the prompt template. Define $T(s_i)$ as the token count for the full inclusion of chunk $s_i$. Let $B$ denote the total token budget. Each chunk $s_i$ is assigned an importance $I(s_i)$, a combination of the log probs and user-defined constraints.

The optimization problem is to determine a set of inclusion indicators $\{x_1, x_2, \ldots, x_n\}$, with $x_i \in \{0, 1\}$ for each chunk $s_i$, such that:

$$\max \sum_{i=1}^{n} x_i \, I(s_i) \quad \text{subject to} \quad \sum_{i=1}^{n} x_i \, T(s_i) \le B$$

This is at least NP-hard, via a simple reduction from maximum coverage.
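Although the general problem is hard, with integer token counts this formulation is a 0/1 knapsack, which admits an exact pseudo-polynomial dynamic program. A minimal sketch, assuming chunks are given as hypothetical `(token_cost, importance)` tuples:

```python
def select_chunks(chunks, budget):
    """0/1 knapsack over chunks: maximize total importance subject to
    total token cost <= budget.

    chunks: list of (token_cost, importance) tuples.
    Returns (best total importance, frozenset of chosen chunk indices).
    """
    # best[b] = (total importance, chosen index set) achievable within budget b
    best = [(0.0, frozenset())] * (budget + 1)
    for i, (cost, imp) in enumerate(chunks):
        # Iterate budgets downward so each chunk is used at most once.
        for b in range(budget, cost - 1, -1):
            cand = best[b - cost][0] + imp
            if cand > best[b][0]:
                best[b] = (cand, best[b - cost][1] | {i})
    return best[budget]

# Example: three chunks, token budget of 5.
total, chosen = select_chunks([(3, 4.0), (2, 3.0), (2, 2.5)], 5)
```

This runs in O(n · B) time, which is practical when the token budget B is a few thousand; for very large budgets an approximation (e.g., greedy by importance-per-token) may be preferable.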

  3. Constraints Management:

    • Users should be able to set constraints on entries, including:
      • "Must-Include": Certain entries are mandatory in the prompt (e.g., user name, location).
      • "List Direction": For list-type entries, provide options to prioritize newer or older items. Users can set a "direction" constraint: right means the newer items are prioritized, left means the older items are prioritized.
      • "Priority Assignment": Enable users to assign (positive) priorities to each state entry; e.g., a user's recent activities are more important than friends' activities. A topological sort order is derived from the priorities, and entries without priorities are assumed to have priority 1.
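The three constraint types above could be captured in a small per-entry spec. This is an illustrative sketch; the class and field names are assumptions, not motion's API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EntryConstraints:
    """Hypothetical per-entry constraint spec."""
    must_include: bool = False          # entry is mandatory in the prompt
    list_direction: Optional[str] = None  # "right" = newer items prioritized,
                                          # "left" = older items prioritized
    priority: float = 1.0               # positive; defaults to 1 when unset

# Example constraints for the recommender-system state.
constraints = {
    "name": EntryConstraints(must_include=True),
    "recent_activities": EntryConstraints(list_direction="right", priority=3.0),
    "friends_activities": EntryConstraints(list_direction="right", priority=1.5),
}
```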
  4. Importance Scores

Now, we should determine how to compute $I(s_i)$ given both log-probs ($p_i$) and user-defined constraints. A low $p_i$ means $s_i$ is unexpected, so we may want to prioritize $s_i$ (the formula below upweights unexpected chunks via $\exp(-\log p_i) = 1/p_i$). For list direction constraints, we can apply an exponential weighting scheme to derive weights for each list item, normalizing the weights within the list. More concretely, the $j$th most important element in a list gets weight $\exp(j\lambda)$, where $\lambda$ is a constant that determines the rate of growth (or decay, for $\lambda < 0$).
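The weighting scheme can be sketched as follows, taking $\lambda < 0$ so that weights decay with rank (rank 0 is the most important item); the helper name and `direction` convention are assumptions, not motion's API:

```python
import math

def list_weights(n, lam, direction="right"):
    """Normalized exponential weights for an n-item list, per the
    exp(j * lambda) scheme, where rank j = 0 is the most important item.
    With lam < 0, weights decay with rank. direction="right" means the
    last (newest) item is rank 0; "left" means the first (oldest) item is.
    """
    raw = [math.exp(j * lam) for j in range(n)]
    total = sum(raw)
    ranked = [w / total for w in raw]   # ranked[0] = most important item
    if direction == "right":
        return list(reversed(ranked))   # newest (last) item gets the top weight
    return ranked                       # oldest (first) item gets the top weight

# Newer-items-first weighting over a 3-item list.
w = list_weights(3, lam=-1.0, direction="right")
```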

Let $E$ be the set of all entries (i.e., variables in the state/prompt template). Let $\phi: S \rightarrow E$ be the function that maps a chunk to its entry. The importance score $I(s_i)$ for each chunk $s_i$ is defined as:

$$I(s_i) = \begin{cases} \lambda_j(\phi(s_i)) \cdot \exp(-\log p_i) \cdot P(\phi(s_i)), & \text{if } s_i \text{ is part of a list-type entry} \\ \exp(-\log p_i) \cdot P(\phi(s_i)), & \text{otherwise} \end{cases}$$

Where:

  • $\lambda_j(\phi(s_i))$ is the normalized exponential list-direction weight of $s_i$ within its entry,
  • $p_i$ is the LLM-assigned probability of chunk $s_i$ (so $\exp(-\log p_i) = 1/p_i$ upweights unexpected chunks), and
  • $P(\phi(s_i))$ is the user-assigned priority of the entry $\phi(s_i)$.
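A minimal sketch of this scoring formula (the helper name and signature are assumptions):

```python
import math

def importance(log_p, priority, list_weight=None):
    """I(s_i): exp(-log p_i) * P(entry), scaled by the list-direction
    weight when the chunk is an item in a list-type entry.
    log_p is the (natural) log-prob of the chunk under the LLM.
    """
    score = math.exp(-log_p) * priority   # exp(-log p) = 1 / p
    if list_weight is not None:
        score *= list_weight
    return score

# A surprising chunk (low probability) scores higher than an expected one.
surprising = importance(log_p=math.log(0.1), priority=1.0)  # 1 / 0.1 = 10
expected = importance(log_p=math.log(0.9), priority=1.0)    # 1 / 0.9 ≈ 1.11
```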

Acceptance Criteria:

  1. Write a general-purpose LLM component, which is a subclass of Component with utilities to set constraints and budgets.
  2. During ops that involve calls to LLMs, motion computes importances for each chunk and dynamically adjusts the prompt content to respect the token budget, which can be set at runtime.
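Acceptance criterion 2 could be exercised end-to-end with a greedy approximation like the following (an illustrative sketch, not an exact solver or motion's actual API; the chunk dict shape is an assumption):

```python
def resize_prompt(chunks, budget):
    """Greedy sketch of dynamic prompt resizing. Each chunk is a dict with
    "text", "cost" (tokens), "importance", and "must_include". Mandatory
    chunks go in first; remaining chunks are added in order of
    importance-per-token until the budget is exhausted.
    """
    chosen = [c for c in chunks if c["must_include"]]
    remaining = budget - sum(c["cost"] for c in chosen)
    if remaining < 0:
        raise ValueError("must-include chunks alone exceed the token budget")
    optional = sorted(
        (c for c in chunks if not c["must_include"]),
        key=lambda c: c["importance"] / c["cost"],
        reverse=True,
    )
    for c in optional:
        if c["cost"] <= remaining:
            chosen.append(c)
            remaining -= c["cost"]
    return [c["text"] for c in chosen]

# Example: the must-include chunk plus the best-value optional chunk fit.
chunks = [
    {"text": "user: Alice", "cost": 2, "importance": 5.0, "must_include": True},
    {"text": "ran a 5k", "cost": 3, "importance": 6.0, "must_include": False},
    {"text": "likes jazz", "cost": 3, "importance": 2.0, "must_include": False},
]
kept = resize_prompt(chunks, budget=5)
```

Greedy by importance-per-token is not optimal in general, but it is fast and respects the must-include constraint; the exact knapsack formulation above can be substituted when the budget is small enough.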
shreyashankar commented 7 months ago

We can test this with some tasks:

If we really want to use this to implement a relational operator: