Closed by zhyncs 3 days ago
In our use case, algorithm colleagues often run models with temperature 0. Silently remapping it to 1, as PyTorch's implementation does, is inconsistent with the expected behavior (temperature 0 should mean near-greedy decoding, not default sampling). TurboMind instead adds a very small value to the temperature to avoid division by zero, which I think is a good solution, and we want the PyTorch Engine and TurboMind in LMDeploy to behave basically consistently.
Motivation
as titled
Hi @grimoire @lzhangzz @lvhan028, could you help review this? Thanks.
ref TurboMind https://github.com/InternLM/lmdeploy/blob/c59a70413c3600fb22e683c46e085758272e4178/src/turbomind/kernels/sampling_penalty_kernels.cu#L149-L161
https://github.com/InternLM/lmdeploy/blob/c59a70413c3600fb22e683c46e085758272e4178/src/turbomind/kernels/sampling_penalty_kernels.cu#L75-L86
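For illustration, a minimal Python sketch of the epsilon-guarded temperature scaling described above, assuming a hypothetical `EPS` constant and `scale_logits` helper (the actual epsilon and kernel logic live in the linked `sampling_penalty_kernels.cu`):

```python
# Assumed small constant for illustration; TurboMind uses a similarly tiny
# value added to the temperature before taking its reciprocal.
EPS = 1e-6

def scale_logits(logits, temperature):
    """Scale logits by 1 / (temperature + EPS).

    Adding EPS avoids a ZeroDivisionError at temperature == 0, and the
    resulting very large inverse temperature pushes sampling toward
    greedy (argmax) decoding instead of silently falling back to
    temperature 1.
    """
    inv_temp = 1.0 / (temperature + EPS)
    return [x * inv_temp for x in logits]
```

With this scheme, temperature 0 preserves the ranking of logits while sharpening the distribution, so the highest-logit token dominates after softmax, which matches how users expect temperature 0 to behave.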
Modification
as titled
Checklist