congson1293 opened 5 months ago
@abetlen @congson1293
I checked the parameters in both the `__call__` and `create_completion` methods but did not see a `penalty_alpha` param, which represents contrastive search decoding. Can you update the decoding strategy soon?
As I understand it, `frequency_penalty` and `presence_penalty` are what the Contrastive Search paper refers to as the alpha value. See these lines from the llama.cpp README:
`presence_penalty`: Repeat alpha presence penalty (default: 0.0, 0.0 = disabled).
`frequency_penalty`: Repeat alpha frequency penalty (default: 0.0, 0.0 = disabled).
If I'm not mistaken, `presence_penalty` is what you're looking for, but I may be misunderstanding something...
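For comparison, here is a minimal sketch of how these two penalties are commonly applied to the logits (the OpenAI-style formulation, which as far as I know llama.cpp also follows). The function name and the dict-based interface are hypothetical. Note that it only uses token counts from the history, not hidden-state similarity:

```python
from collections import Counter
from typing import Dict, List


def apply_repeat_penalties(
    logits: Dict[int, float],
    previous_tokens: List[int],
    frequency_penalty: float = 0.0,
    presence_penalty: float = 0.0,
) -> Dict[int, float]:
    """Return a copy of `logits` with frequency/presence penalties applied.

    Hypothetical helper for illustration. Every token already seen loses
    `count * frequency_penalty` (grows with how often it appeared) plus a flat
    `presence_penalty` (applied once if it appeared at all).
    With both penalties at 0.0 the logits are returned unchanged.
    """
    counts = Counter(previous_tokens)  # only tokens with count > 0 appear here
    penalized = dict(logits)
    for token, count in counts.items():
        if token in penalized:
            penalized[token] -= count * frequency_penalty + presence_penalty
    return penalized


logits = {0: 2.0, 1: 1.5, 2: 1.0}
history = [0, 0, 1]
print(apply_repeat_penalties(logits, history, frequency_penalty=0.5, presence_penalty=0.25))
# → {0: 0.75, 1: 0.75, 2: 1.0}
```

Token 0 appeared twice, so it loses 2 × 0.5 + 0.25; token 1 appeared once and loses 0.5 + 0.25; token 2 is untouched.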
@ddh0
My understanding is that contrastive search decoding just does this at each step:
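Assuming this refers to the scoring rule from the Contrastive Search paper, each step picks, among the top-$k$ candidates $V^{(k)}$, the token that balances model confidence against a degeneration penalty, where $s$ is the cosine similarity between the candidate's hidden state $h_v$ and the hidden states of the previous tokens:

$$
x_t = \arg\max_{v \in V^{(k)}} \Big\{ (1-\alpha)\, p_\theta(v \mid x_{<t}) \;-\; \alpha \max_{1 \le j \le t-1} s\big(h_v, h_{x_j}\big) \Big\}
$$

Here $\alpha$ corresponds to HF's `penalty_alpha` and $k$ to `top_k`.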
In HF's implementation, `penalty_alpha` just controls how much weight it has: `0.0` would just be normal greedy search, `1.0` would rank candidates purely by the contrastive (degeneration) penalty, and `0.6`, the value used in HF's examples, is a mixture of the two.
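To make the weighting concrete, here is a minimal pure-Python sketch of a single contrastive search step. It assumes the model has already produced next-token probabilities and the hidden state each candidate would yield; the function and argument names are hypothetical (not llama-cpp-python or HF API), and only `penalty_alpha` and `top_k` mirror HF's parameter names:

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def contrastive_step(
    probs: Sequence[float],
    candidate_hidden: Sequence[Sequence[float]],
    context_hidden: List[Sequence[float]],
    top_k: int = 4,
    penalty_alpha: float = 0.6,
) -> int:
    """Pick the next token id with one step of contrastive search (sketch).

    probs[v]            -- model probability of token v
    candidate_hidden[v] -- hidden state the model would produce if v were appended
    context_hidden      -- hidden states of the tokens generated so far
    """
    # Only the top-k most probable tokens are considered as candidates.
    candidates = sorted(range(len(probs)), key=lambda v: probs[v], reverse=True)[:top_k]

    def score(v: int) -> float:
        # Degeneration penalty: max similarity of v to anything already in context.
        penalty = max((cosine(candidate_hidden[v], h) for h in context_hidden), default=0.0)
        return (1 - penalty_alpha) * probs[v] - penalty_alpha * penalty

    return max(candidates, key=score)


probs = [0.5, 0.3, 0.2]
candidate_hidden = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
context_hidden = [[1.0, 0.0]]
print(contrastive_step(probs, candidate_hidden, context_hidden, top_k=3, penalty_alpha=0.0))  # → 0
print(contrastive_step(probs, candidate_hidden, context_hidden, top_k=3, penalty_alpha=0.6))  # → 1
```

With `penalty_alpha=0.0` the score reduces to the probability, so the step returns the greedy choice. At `0.6`, token 0 is demoted because its hidden state is identical to the existing context.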
What's remarkable is that this is one of the very few lightweight decoding techniques that can reportedly make a 7B-parameter model behave like a 30B-parameter one. There's a blog summary here, but personally I didn't find it very helpful. It's definitely worth implementing.
Hi @abetlen, I checked the parameters in both the `__call__` and `create_completion` methods but did not see a `penalty_alpha` param, which represents contrastive search decoding. Can you update the decoding strategy soon, @abetlen?