character-ai / prompt-poet

Streamlines and simplifies prompt design for both developers and non-technical users with a low-code approach.
https://pypi.org/project/prompt-poet/
MIT License

Custom function hooks instead of truncation? #15

Open twardoch opened 2 months ago

twardoch commented 2 months ago

Truncation is one way to solve context overflow. Another is summarization.

(Btw, the magic keyword to trigger good summarization from any model is not asking it for a summary but asking it for a TLDR. It works much better, and you don't need very verbose prompts.)

It would be great if Prompt Poet allowed a custom hook function (or does it already?) that gets triggered on context overflow.

In certain situations I prefer to call a small local model or a cheap remote model with a TLDR prompt to compress older content (the chat history, for example), rather than truncating.
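
For illustration, a minimal sketch of what that could look like on the user side, assuming a hypothetical call_cheap_model() helper that wraps whatever small local or inexpensive remote model you use:

```python
def tldr_compress(input_content: str) -> str:
    """Compress older prompt content via a TLDR request to a cheap model.

    call_cheap_model() is a hypothetical helper that sends a single prompt
    to a small local model or an inexpensive remote endpoint and returns
    the completion text.
    """
    prompt = f"TLDR of the following conversation:\n\n{input_content}"
    return call_cheap_model(prompt)
```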

groeney commented 2 months ago

Agreed, this is something we should have. I can take this on next week if there hasn't been an attempt by then. There are a few things here:

  1. Conceptually, I'm thinking about this as content compression rather than the explicit content truncation we have today.
  2. Therefore, I'm thinking through something like `Prompt(..., content_compression_func, ...)`.
  3. For the function signature:
     `def content_compression_func(input_content: str, max_output_tokens: int, encode_func: Callable[[str], list[int]]) -> str`

The user would need to ensure that, once encoded, the output string is below max_output_tokens. I believe we need to quantify max_output_tokens to ensure we end up below the token limit in a context-overflow scenario; a sketch follows below.
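
For example, a minimal user-side implementation of that signature might look like this. The tldr_compress() call is a hypothetical summarization step (e.g. a cheap model asked for a TLDR), and the character-level trim at the end is just one way to guarantee the token budget even if the model over-produces:

```python
from typing import Callable


def content_compression_func(
    input_content: str,
    max_output_tokens: int,
    encode_func: Callable[[str], list[int]],
) -> str:
    """Compress input_content so its encoded length stays within budget.

    tldr_compress() is assumed to exist (e.g. a cheap model asked for a
    TLDR); the trimming loop is a simple fallback that enforces the
    contract when the summary is still too long.
    """
    compressed = tldr_compress(input_content)

    # Guarantee the budget: trim characters until the encoding fits.
    while compressed and len(encode_func(compressed)) > max_output_tokens:
        compressed = compressed[: int(len(compressed) * 0.9)]
    return compressed
```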


To get this running at scale, which is crucial for us, we then need to do the following.

First, we should ensure that the input_content does not change every time we hit context overflow; this can be achieved with the truncation step buffer we have today.

Then, we need to ensure that once a specific input_content has been mapped to a compressed state, that state does not change. If it changes on successive generation attempts, it will break our model server caching, which in turn breaks our serving economics.
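
As a rough illustration of that stability requirement, the compression result could be memoized on the exact input_content. The cache below is a process-local assumption for the sketch; in practice it would need to be a shared store, and the compression call itself would need to be deterministic (e.g. temperature 0):

```python
import hashlib
from typing import Callable

# Hypothetical process-local cache keyed on the exact input_content.
_compression_cache: dict[str, str] = {}


def stable_compress(input_content: str, compress: Callable[[str], str]) -> str:
    """Return the same compressed string for the same input_content every time."""
    key = hashlib.sha256(input_content.encode("utf-8")).hexdigest()
    if key not in _compression_cache:
        _compression_cache[key] = compress(input_content)
    return _compression_cache[key]
```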


Another name for this function might be context_overflow_func, though I think content_compression_func is more explicit about the behavior of the function. Not very opinionated here, though.

0xmihutao commented 3 weeks ago

any update here?