ggerganov / llama.cpp

LLM inference in C/C++
MIT License

[IDEA] Global token enhancement/depression #1865

Open · elephantpanda opened 1 year ago

elephantpanda commented 1 year ago

This idea is inspired by Stable Diffusion prompts and anti-prompts (negative prompts). It could be useful for keeping text generation on topic even with small window sizes: e.g. if a poem about cheese wanders off on a tangent, the word "cheese" still keeps a high probability.

The idea is simple: during generation you may want to globally increase the probabilities of some words while decreasing (or zeroing) the probabilities of others.

Examples of words you may want to depress are swear words etc.; examples of words you may want to enhance are words relevant to your topic or words in your style.

These global enhancements/depressions of the probabilities would stay constant throughout the text-generation even if the window-size is small.

There are two ways this could work:

  1. The user includes a list of words and anti-words.
  2. A model could automatically be trained to create a global-enhancement matrix from the original prompt which stays constant even when the window moves.

There is a slight problem in that words are broken up into tokens, so there might have to be some backtracking to avoid/enhance certain words.

The extra calculation and memory is minimal, as it is simply a list of numbers, one per token, that stays constant. If it worked purely on tokens, the probabilities would be calculated like this:

p'(n) = p(n) e(n) / Σ_i p(i) e(i)

where p(n) are the original probabilities at a particular step, and e(n) are the enhancement values. I'm not sure how to make the calculation work on words made of two or more tokens.
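The token-level version of this formula can be sketched in a few lines. This is only an illustration of the rescaling idea, not llama.cpp code; the function name and inputs are made up for the example:

```python
def enhance(probs, enhancement):
    """Rescale a probability distribution by constant per-token factors.

    probs       -- original probabilities p(i) at one generation step
    enhancement -- constant factors e(i): >1 boosts a token, <1 depresses it,
                   0 bans it outright

    Implements p'(n) = p(n) e(n) / sum_i p(i) e(i).
    """
    weighted = [p * e for p, e in zip(probs, enhancement)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Example: boost token 1, ban token 2 entirely.
p = enhance([0.5, 0.3, 0.2], [1.0, 2.0, 0.0])
```

Note that this multiplicative rescaling is mathematically equivalent to adding a constant bias of log e(n) to each logit before the softmax, which is roughly what the existing `--logit-bias` option does at the single-token level.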

Thoughts?

ggerganov commented 1 year ago

Interesting question. I am also curious if one can devise a strategy that works for words or multi-token strings, ideally without having to backtrack. I guess some form of rejection sampling can be used, but it is not obvious how to adjust the probs to avoid bias.
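A naive rejection-style strategy might look like the sketch below (hypothetical, not llama.cpp code): sample a token, and reject it if appending it to the text so far would complete a banned word. As noted above, the bias problem is visible here: simple resampling skews mass toward tokens that merely *start* a banned word without completing it, and banned words spanning more than one future token are not caught.

```python
import random

def sample_avoiding(probs, token_text, history, banned_words, rng=random):
    """Sample a token id, rejecting any candidate whose decoded text,
    appended to the history, would complete a banned word.

    probs        -- per-token probabilities at this step
    token_text   -- decoded text piece for each token id
    history      -- text generated so far
    banned_words -- strings that must not appear in the output
    """
    candidates = list(range(len(probs)))
    weights = list(probs)
    while candidates:
        tok = rng.choices(candidates, weights=weights)[0]
        if not any(w in history + token_text[tok] for w in banned_words):
            return tok
        # Reject and resample from the survivors. This renormalizes over
        # the remaining tokens, which is exactly the bias issue: the
        # distribution is no longer the model's conditional distribution.
        i = candidates.index(tok)
        candidates.pop(i)
        weights.pop(i)
    raise RuntimeError("every candidate completes a banned word")
```

For example, with history `"chee"` and banned word `"cheese"`, a candidate token decoding to `"se"` is always rejected, while unrelated tokens remain sampleable.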