InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Feature] How to support a do_sample config just like AutoModel (can deterministic generation be used instead of random sampling, like the do_sample parameter in AutoModel inference?) #1833

Open Leo-yang-1020 opened 6 days ago

Leo-yang-1020 commented 6 days ago

Motivation

When using lmdeploy for inference, we sometimes want to set do_sample=False, but according to the official documentation there is no do_sample config. Could this be added, just like AutoModel? e.g.: generation_config = dict(num_beams=1, max_new_tokens=512, do_sample=False)
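
For reference, a minimal sketch of the Transformers-style usage being asked for (the model name here is only a placeholder, not taken from the issue):

```python
# Transformers AutoModel generation with sampling disabled (greedy decoding).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "internlm/internlm2-chat-7b"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Hello", return_tensors="pt")
# do_sample=False makes generate() use deterministic (greedy/beam) decoding
# instead of random sampling.
outputs = model.generate(**inputs, num_beams=1, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```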

Related resources

No response

Additional context

No response

lvhan028 commented 5 days ago

We'll support do_sample in July. Currently, the team is rushing to deliver the June release.

lvhan028 commented 5 days ago

top_k=1 is equivalent to do_sample=False
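
In other words, deterministic generation is already achievable through the existing GenerationConfig. A minimal sketch with the current pipeline API (the model path is just an example):

```python
# Greedy-equivalent decoding with lmdeploy: top_k=1 always keeps only the
# highest-probability token, so the output is deterministic.
from lmdeploy import pipeline, GenerationConfig

pipe = pipeline("internlm/internlm2-chat-7b")  # placeholder model
gen_config = GenerationConfig(top_k=1, max_new_tokens=512)
response = pipe(["Hi, please introduce yourself."], gen_config=gen_config)
print(response)
```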

Leo-yang-1020 commented 5 days ago

> We'll support do_sample in July. Currently, the team is rushing to deliver the June release.

Thanks a lot!

zhyncs commented 4 days ago

Should our API design align with Transformers or with the OpenAI API?

ref: https://platform.openai.com/docs/api-reference/chat/create#chat-create-top_p and https://platform.openai.com/docs/api-reference/chat/create#chat-create-temperature

The OpenAI API does not have the top_k and do_sample parameters.
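
As a point of comparison, with an OpenAI-style endpoint (e.g. an lmdeploy api_server deployment) near-deterministic output is usually requested via temperature=0 rather than do_sample/top_k. A hedged sketch, with base_url and model name as placeholders:

```python
# Requesting (near-)greedy output through an OpenAI-compatible chat API:
# there is no do_sample or top_k field, so temperature=0 is the usual substitute.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:23333/v1", api_key="none")  # placeholder server
completion = client.chat.completions.create(
    model="internlm2-chat-7b",  # placeholder model
    messages=[{"role": "user", "content": "Hi"}],
    temperature=0,
    top_p=1,
    max_tokens=512,
)
print(completion.choices[0].message.content)
```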

lvhan028 commented 4 days ago

For the pipeline API, we would like to align with transformers.
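
A hypothetical sketch of what a transformers-aligned pipeline call might look like once the feature lands; note that the do_sample field does not exist in lmdeploy's GenerationConfig yet:

```python
# Proposed (not yet released) usage: do_sample=False would force deterministic
# decoding, mirroring the Transformers generation_config semantics.
from lmdeploy import pipeline, GenerationConfig

pipe = pipeline("internlm/internlm2-chat-7b")  # placeholder model
gen_config = GenerationConfig(do_sample=False, max_new_tokens=512)  # hypothetical field
response = pipe(["Hi"], gen_config=gen_config)
print(response)
```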