ggerganov / llama.cpp

LLM inference in C/C++
MIT License

[IDEA] Global token enhancement/depression #1865

Open · elephantpanda opened 1 year ago

elephantpanda commented 1 year ago

This idea is inspired by Stable Diffusion prompts and anti-prompts (negative prompts). It could be useful for keeping text generation on topic even with small window sizes: e.g. if a poem about cheese wanders off on a tangent, the word "cheese" still keeps a high probability.

The idea is simple: during generation you may want to globally increase the probabilities of some words while decreasing (or zeroing) the probabilities of others.

Examples of words you may want to depress are swear words etc.; examples of words you may want to enhance are words relevant to your topic or words in your style.

These global enhancements/depressions of the probabilities would stay constant throughout the text-generation even if the window-size is small.

There are two ways this could work:

  1. The user includes a list of words and anti-words.
  2. A model could automatically be trained to create a global-enhancement matrix from the original prompt which stays constant even when the window moves.

There is a slight problem in that words are broken up into tokens, so there might have to be some backtracking to avoid/enhance certain words.

The extra calculation and memory is minimal, as it is simply a list of numbers, one per token, that stays constant. If it worked purely on tokens, the probabilities would be calculated like this:

p'(n) = p(n) e(n) / Σ_i p(i) e(i)

where p(n) are the original probabilities at a particular step, and e(n) are the enhancement values. I'm not sure how to make the calculation work on words made of two or more tokens.
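The token-level version of this formula can be sketched in a few lines. This is only an illustration of the rescaling idea, not llama.cpp code; the function name and inputs are made up for the example:

```python
def enhance(probs, enhancement):
    """Rescale a probability distribution by constant per-token factors.

    probs       -- original probabilities p(i) at one generation step
    enhancement -- constant factors e(i): >1 boosts a token, <1 depresses it,
                   0 bans it outright

    Implements p'(n) = p(n) e(n) / sum_i p(i) e(i).
    """
    weighted = [p * e for p, e in zip(probs, enhancement)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Example: boost token 1, ban token 2 entirely.
p = enhance([0.5, 0.3, 0.2], [1.0, 2.0, 0.0])
```

Note that this multiplicative rescaling is mathematically equivalent to adding a constant bias of log e(n) to each logit before the softmax, which is roughly what the existing `--logit-bias` option does at the single-token level.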

Thoughts?

ggerganov commented 1 year ago

Interesting question. I am also curious if one can devise a strategy that works for words or multi-token strings, ideally without having to backtrack. I guess some form of rejection sampling can be used, but it is not obvious how to adjust the probs to avoid bias.
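A naive rejection-style strategy might look like the sketch below (hypothetical, not llama.cpp code): sample a token, and reject it if appending it to the text so far would complete a banned word. As noted above, the bias problem is visible here: simple resampling skews mass toward tokens that merely *start* a banned word without completing it, and banned words spanning more than one future token are not caught.

```python
import random

def sample_avoiding(probs, token_text, history, banned_words, rng=random):
    """Sample a token id, rejecting any candidate whose decoded text,
    appended to the history, would complete a banned word.

    probs        -- per-token probabilities at this step
    token_text   -- decoded text piece for each token id
    history      -- text generated so far
    banned_words -- strings that must not appear in the output
    """
    candidates = list(range(len(probs)))
    weights = list(probs)
    while candidates:
        tok = rng.choices(candidates, weights=weights)[0]
        if not any(w in history + token_text[tok] for w in banned_words):
            return tok
        # Reject and resample from the survivors. This renormalizes over
        # the remaining tokens, which is exactly the bias issue: the
        # distribution is no longer the model's conditional distribution.
        i = candidates.index(tok)
        candidates.pop(i)
        weights.pop(i)
    raise RuntimeError("every candidate completes a banned word")
```

For example, with history `"chee"` and banned word `"cheese"`, a candidate token decoding to `"se"` is always rejected, while unrelated tokens remain sampleable.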