youkaichao opened 3 months ago
@youkaichao I think it might not be very relevant to popular sampling strategies. Suppose the logits are $x$. The new probability of token $i$ is $\exp(x_i) / \big(\sum_j \exp(x_j) - \sum_k \exp(x_k)\big)$, where the $x_j$'s range over all tokens in the vocabulary and the $x_k$'s are the filtered tokens. While the new probabilities of improper tokens increase, the new probabilities of proper tokens increase as well, and by the same multiplicative factor, so the relative order of the surviving tokens is preserved. This means that with top-$p$ sampling, improper tokens will still be filtered out eventually, and the ultimate result is not affected.
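A minimal sketch of this argument (not vLLM's actual implementation; the logits and mask below are hypothetical): setting the filtered tokens' logits to $-\infty$ before the softmax removes the $\exp(x_k)$ terms from the denominator, so every surviving token's probability is rescaled by the same constant and the ranking that top-$p$ relies on is unchanged.

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.5, -1.0])          # hypothetical logits x
filtered = torch.tensor([False, False, True, False])  # hypothetical mask: token 2 is filtered

probs_before = torch.softmax(logits, dim=-1)

# Masking to -inf zeroes out exp(x_k) for filtered tokens in the softmax denominator.
masked_logits = logits.masked_fill(filtered, float("-inf"))
probs_after = torch.softmax(masked_logits, dim=-1)

keep = ~filtered
# All surviving tokens are rescaled by the same factor 1 / (1 - p_filtered),
# so the ratios below are identical and the relative order is preserved.
print(probs_after[keep] / probs_before[keep])
```

Since a top-$p$ cutoff only depends on the sorted order and cumulative mass of the candidate probabilities, applying it after the mask still drops the same low-probability tokens.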
They are mentioned in this blog post: https://vivien000.github.io/blog/journal/llm-decoding-with-regex-constraints.html, and they look very helpful.