Open linboyang opened 1 year ago
I looked it up: topp_sampling is a decoding strategy, and it calls a CUDA kernel. For reference:
Top-p, also known as nucleus sampling, allows for a more dynamic selection of tokens to sample from. In top-p sampling, the model sorts the candidate next tokens by probability in descending order, sums their probabilities, and stops once the running sum reaches p. Only the tokens inside this cumulative probability mass are considered for sampling. Common values for top-p in language models typically range from 0.9 to 0.95. A top-p value of 0.9, for example, means that the model will consider the smallest set of tokens whose cumulative probability reaches 90%.
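To make the description above concrete, here is a minimal pure-Python sketch of nucleus filtering and sampling. It is not the PaddleNLP `TopPProcess` code or the `topp_sampling` kernel; the function names `top_p_filter` and `top_p_sample` are hypothetical, and the choice to stop as soon as the running sum is greater than or equal to p is an assumption.

```python
import random


def top_p_filter(probs, p):
    """Keep the smallest set of most-likely tokens whose cumulative
    probability reaches p, and renormalize them to sum to 1.

    probs: list of probabilities, one per token index (assumed to sum to ~1).
    Returns a dict {token_index: renormalized_probability}.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")

    # Token indices ordered from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    kept = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        # Assumption: stop once the running sum reaches p (>=).
        if cumulative >= p:
            break

    # Renormalize so the kept probabilities form a valid distribution.
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


def top_p_sample(probs, p, rng=random):
    """Draw one token index from the nucleus defined by p."""
    nucleus = top_p_filter(probs, p)
    indices = list(nucleus)
    weights = [nucleus[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.15, 0.05]
    # With p=0.9 the running sums are 0.5, 0.8, 0.95, so three tokens are kept.
    print(sorted(top_p_filter(probs, 0.9)))
    print(top_p_sample(probs, 0.9, rng=random.Random(0)))
```

A real implementation would normally do the same thing on batched logits tensors (sort, cumulative sum, mask, renormalize); the list-based version here only illustrates the logic.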
TopPProcess does not call a CUDA kernel; it is a Python implementation.
Please state your question
The code is as follows. Path: PaddleNLP/model_zoo/gpt-3/ppfleetx/models/language_model/gpt/auto/auto_model.py:1003