InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

misc: align PyTorch Engine temperature with TurboMind #1850

Closed. zhyncs closed this 3 days ago

zhyncs commented 4 days ago

Motivation

Align the PyTorch Engine's temperature handling with TurboMind's, as titled.

Hi @grimoire @lzhangzz @lvhan028, could you help review this? Thanks.

ref TurboMind https://github.com/InternLM/lmdeploy/blob/c59a70413c3600fb22e683c46e085758272e4178/src/turbomind/kernels/sampling_penalty_kernels.cu#L149-L161

https://github.com/InternLM/lmdeploy/blob/c59a70413c3600fb22e683c46e085758272e4178/src/turbomind/kernels/sampling_penalty_kernels.cu#L75-L86
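
For readers who do not have the CUDA source open: the linked lines apply a guard that keeps the temperature scaling finite. Below is a minimal Python sketch of that pattern, not the actual kernel; the helper name and the epsilon value are assumptions for illustration.

```python
import torch

# Hypothetical epsilon; the real constant lives in the linked
# sampling_penalty_kernels.cu lines.
_TEMPERATURE_EPS = 1e-6


def apply_temperature(logits: torch.Tensor, temperature: float) -> torch.Tensor:
    """Scale logits by 1 / (temperature + eps) so temperature == 0 stays finite."""
    inv_temp = 1.0 / (temperature + _TEMPERATURE_EPS)
    return logits * inv_temp
```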

Modification

Add a small epsilon to the temperature in the PyTorch Engine sampling path so that temperature 0 no longer causes division by zero, matching TurboMind's behavior.

Checklist

  1. Pre-commit or other linting tools are used to fix the potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  3. If the modification has a dependency on downstream projects of a newer version, this PR should be tested with all supported versions of downstream projects.
  4. The documentation has been modified accordingly, like docstring or example tutorials.
zhyncs commented 4 days ago

In our use case, algorithm colleagues typically run models with temperature 0. If we simply changed that to 1, following PyTorch's implementation, it would be inconsistent with how we expect to use it. TurboMind instead adds a very small value to the temperature to avoid division by zero, which I think is a good solution, and we want the PyTorch Engine and TurboMind in LMDeploy to behave essentially consistently.
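
Continuing the sketch above (illustrative only, not the PyTorch Engine's actual sampling code), with the epsilon guard a request with temperature 0 collapses the distribution onto the arg-max token instead of raising a division-by-zero, which is the behavior expected when temperature 0 is requested:

```python
logits = torch.tensor([1.0, 3.0, 2.0])

# temperature == 0: logits are scaled by ~1e6, so softmax collapses
# onto the highest-logit token, i.e. near-greedy decoding.
probs = torch.softmax(apply_temperature(logits, 0.0), dim=-1)
print(torch.argmax(probs).item())  # 1

# temperature == 1: the distribution is left essentially unchanged.
print(torch.softmax(apply_temperature(logits, 1.0), dim=-1))
```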