PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

[Question]: What is the difference between the topp_sampling operator in ppfleetx_ops under the model_zoo directory and the TopPProcess function? #6324

Open linboyang opened 1 year ago

linboyang commented 1 year ago

Please ask your question

The code is as follows. Path: PaddleNLP/model_zoo/gpt-3/ppfleetx/models/language_model/gpt/auto/auto_model.py:1003

if top_p is not None and top_p < 1.0:
    if self.use_topp_sampling:
        try:
            from ppfleetx_ops import topp_sampling
        except ImportError:
            raise ImportError(
                "please install ppfleetx_ops by 'cd ppfleetx/ops && python setup_cuda.py install'!"
            )
        top_ps_tensor = paddle.full(shape=[paddle.shape(probs)[0]], fill_value=top_p, dtype=probs.dtype)
        # TODO fake random seed here
        # Users should set the random seed dynamically when inference
        _, next_tokens = topp_sampling(probs, top_ps_tensor, random_seed=100)
    else:
        probs = TopPProcess(probs, top_p, min_tokens_to_keep)

if not self.use_topp_sampling:
    next_tokens = paddle.multinomial(probs)

w5688414 commented 6 months ago

I looked into it: topp_sampling is a decoding strategy, and it calls a CUDA kernel. Please see:

https://huyenchip.com/2024/01/16/sampling.html#:~:text=In%20top%2Dp%20sampling%2C%20the,range%20from%200.9%20to%200.95.

Top-p, also known as nucleus sampling, allows for a more dynamic selection of values to be sampled from. In top-p sampling, the model sums the probabilities of the most likely next values in descending order and stops when the sum reaches p. Only the values within this cumulative probability are considered. Common values for top-p (nucleus) sampling in language models typically range from 0.9 to 0.95. A top-p value of 0.9, for example, means that the model will consider the smallest set of values whose cumulative probability exceeds 90%.
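The filtering step described above can be sketched in plain Python. This is a minimal illustration, not the PaddleNLP implementation: `top_p_filter` is a hypothetical helper, it works on a single list of probabilities rather than a batched tensor, and it treats the boundary as inclusive (`>=`), which is a choice that real implementations may make differently. The `min_tokens_to_keep` parameter mirrors the argument name in the snippet from the question.

```python
def top_p_filter(probs, top_p, min_tokens_to_keep=1):
    """Zero out tokens outside the top-p nucleus and renormalize the rest.

    Hypothetical helper for illustration; `probs` is a list of
    probabilities that sums to 1.
    """
    # Visit token ids from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    kept, cumulative = [], 0.0
    for idx in order:
        kept.append(idx)
        cumulative += probs[idx]
        # Stop once the kept set covers at least top_p of the mass
        # and contains at least min_tokens_to_keep tokens.
        if cumulative >= top_p and len(kept) >= min_tokens_to_keep:
            break

    # Renormalize so the surviving probabilities sum to 1 again.
    total = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / total
    return filtered


if __name__ == "__main__":
    # With top_p=0.7 only the two most likely tokens survive.
    print(top_p_filter([0.5, 0.3, 0.15, 0.05], top_p=0.7))
```

The result is still a probability distribution, so a separate sampling call is needed afterwards to pick the next token.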

https://github.com/PaddlePaddle/PaddleNLP/blob/9f3cf822c669c0d97476e6ed96e2afcd6f8d57b5/model_zoo/gpt-3/ppfleetx/ops/topp_sampling.cu#L485

TopPProcess does not call a CUDA kernel; it is a Python implementation.

https://github.com/PaddlePaddle/PaddleNLP/blob/9f3cf822c669c0d97476e6ed96e2afcd6f8d57b5/paddlenlp/generation/logits_process.py#L308
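In the snippet from the question, the two paths also differ in what they return: `topp_sampling` hands back `next_tokens` directly, while `TopPProcess` returns filtered `probs` that are then passed to `paddle.multinomial`. As a rough sketch of how filtering and sampling can be combined into one call, the pure-Python function below draws a token id from the nucleus in a single pass. `sample_top_p` and its `seed` parameter are hypothetical names, and this is only an assumption about what a fused operator might look like; it does not describe the algorithm in `topp_sampling.cu`.

```python
import random


def sample_top_p(probs, top_p, seed=None):
    """Draw one token id from the top-p nucleus in a single call.

    Hypothetical single-sequence sketch; not the PaddleNLP CUDA kernel.
    """
    rng = random.Random(seed)

    # Collect the most probable token ids until their mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, mass = [], 0.0
    for idx in order:
        nucleus.append(idx)
        mass += probs[idx]
        if mass >= top_p:
            break

    # Inverse-CDF draw restricted to the nucleus: scaling the uniform
    # sample by `mass` is equivalent to renormalizing the nucleus.
    threshold = rng.random() * mass
    running = 0.0
    for idx in nucleus:
        running += probs[idx]
        if running > threshold:
            return idx
    return nucleus[-1]  # Guard against floating-point rounding.


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.15, 0.05]
    # Only token ids 0 and 1 can ever be returned with top_p=0.7.
    print([sample_top_p(probs, 0.7, seed=s) for s in range(10)])
```

Passing the seed explicitly keeps the draw reproducible, which is the concern behind the `random_seed=100` TODO in the quoted code.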