abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License
7.59k stars 909 forks

Does this lib support contrastive search decoding ? #1253

Open congson1293 opened 5 months ago

congson1293 commented 5 months ago

Hi @abetlen, I checked the parameters of both the `__call__` and `create_completion` methods but did not see a `penalty_alpha` param, which would correspond to contrastive search decoding. Can you add this decoding strategy soon?

ddh0 commented 5 months ago

@abetlen @congson1293

> I checked the parameters in both call and create_completion method but not see penalty_alpha param which represent for contrastive search decoding. Can you update the decoding strategy soon?

As I understand it, `frequency_penalty` and `presence_penalty` are what the llama.cpp README refers to as the repeat alpha values. See these lines from the llama.cpp README:

    `presence_penalty`: Repeat alpha presence penalty (default: 0.0, 0.0 = disabled).

    `frequency_penalty`: Repeat alpha frequency penalty (default: 0.0, 0.0 = disabled).

If I'm not mistaken, presence_penalty is what you're looking for, but I may be misunderstanding something...
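For reference, these penalties just dampen the logits of tokens that have already appeared; they don't compare hidden states at all. A minimal sketch of the OpenAI-style formula (the function name and plain-list logits are my own; llama.cpp additionally restricts this to a `repeat_last_n` window):

```python
from collections import Counter

def apply_repeat_penalties(logits, generated_tokens,
                           presence_penalty=0.0, frequency_penalty=0.0):
    """Subtract presence/frequency penalties from the logits of tokens
    that already occur in the generated text (OpenAI-style formula)."""
    counts = Counter(generated_tokens)
    penalized = list(logits)
    for token_id, count in counts.items():
        penalized[token_id] -= presence_penalty           # flat, once per seen token
        penalized[token_id] -= frequency_penalty * count  # scales with occurrences
    return penalized

logits = [2.0, 1.0, 0.5]
out = apply_repeat_penalties(logits, [0, 0, 1],
                             presence_penalty=0.5, frequency_penalty=0.1)
# token 0 was seen twice, token 1 once, token 2 never:
# out ≈ [1.3, 0.4, 0.5]
```

This is purely a repetition dampener, which is why it is not the same thing as the contrastive search `penalty_alpha`.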

ckoshka commented 3 months ago

@ddh0

My understanding is that contrastive search decoding does this at each step:

  1. Take the top-k candidate next tokens by model probability
  2. For each candidate, compute the maximum cosine similarity between its hidden representation and the hidden representations of all previously generated tokens (the "degeneration penalty")
  3. Choose the candidate that maximizes `(1 - penalty_alpha) * probability - penalty_alpha * degeneration_penalty`

In HF's implementation, `penalty_alpha` controls how much weight the degeneration penalty gets: 0.0 is effectively greedy search over the top-k candidates, 1.0 means only the penalty matters, and 0.6 (the value the paper recommends) is a mixture of the two.
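The scoring rule above can be sketched in a few lines of plain Python. This is a toy illustration, not the HF or llama.cpp implementation: the token names, probabilities, and 2-d "hidden states" are made up, whereas in the real algorithm the probabilities come from the model's softmax and the states from the LM's last hidden layer.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_step(candidates, prev_states, penalty_alpha=0.6):
    """Pick the next token among the top-k candidates.

    candidates:  list of (token, probability, hidden_state) tuples
    prev_states: hidden states of the tokens generated so far
    Score = (1 - alpha) * p(token) - alpha * max cosine similarity
    to any previous state (the degeneration penalty).
    """
    best_token, best_score = None, -math.inf
    for token, prob, state in candidates:
        degeneration = max(cosine(state, s) for s in prev_states)
        score = (1 - penalty_alpha) * prob - penalty_alpha * degeneration
        if score > best_score:
            best_token, best_score = token, score
    return best_token

# A candidate whose state nearly repeats a previous one gets penalized,
# so a slightly less probable but more novel token can win:
prev = [[1.0, 0.0], [0.0, 1.0]]
cands = [("the", 0.50, [1.0, 0.05]),   # higher prob, but close to a prev state
         ("owl", 0.40, [-0.7, 0.7])]   # lower prob, dissimilar
print(contrastive_step(cands, prev, penalty_alpha=0.6))  # "owl"
print(contrastive_step(cands, prev, penalty_alpha=0.0))  # "the" (greedy)
```

Note how setting `penalty_alpha=0.0` collapses the rule to picking the most probable candidate, matching the greedy-search behaviour described above.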

What's crazy is that this is one of the very few lightweight decoding techniques that can make a 7B-param model behave more like a 30B-param one. There's a blog summary here, but personally I didn't find it very helpful. It's definitely worth implementing.