InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Feature] Add `logits_processor` to `GenerationConfig` #2305

Open Dan-wanna-M opened 4 weeks ago

Dan-wanna-M commented 4 weeks ago

Motivation

Many lmdeploy counterparts (vLLM, transformers, exllamav2, ...) provide `logits_processors` that allow users to modify the logits before the softmax. This enables many useful features such as constrained decoding, classifier-free guidance, custom repetition penalties, etc.
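As a rough illustration of the requested hook (this is not lmdeploy's API — the signature below follows the HuggingFace/vLLM convention of a callable taking the generated token ids and the raw logits), a custom repetition penalty could look like:

```python
from typing import List, Sequence


def repetition_penalty_processor(
    generated_ids: Sequence[int],
    logits: List[float],
    penalty: float = 1.2,
) -> List[float]:
    """Dampen the logits of tokens that were already generated.

    Positive logits are divided by `penalty`, negative ones multiplied,
    so repeated tokens always become less likely before the softmax.
    """
    out = list(logits)
    for token_id in set(generated_ids):
        if out[token_id] > 0:
            out[token_id] /= penalty
        else:
            out[token_id] *= penalty
    return out


# Token 2 was already generated, so its logit 3.0 is halved to 1.5.
logits = [1.0, 2.0, 3.0, -1.0]
new_logits = repetition_penalty_processor([2], logits, penalty=2.0)
```

The engine would invoke such a callable on every decoding step, right before sampling.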

Related resources

Huggingface API: https://huggingface.co/docs/transformers/en/internal/generation_utils#logitsprocessor
vLLM API: https://github.com/vllm-project/vllm/issues/1728

Additional context

I wrote a constrained decoding library and am planning to integrate lmdeploy into it.

lvhan028 commented 3 weeks ago

@AllentDan We may consider this feature in PyTorchEngine. cc @grimoire

lvhan028 commented 3 weeks ago

@Dan-wanna-M I am afraid we cannot support this feature in TurboMindEngine, since it is developed in C++ and CUDA; an external Python logits_processors function can't be passed to it.

Dan-wanna-M commented 3 weeks ago

> @Dan-wanna-M I am afraid we cannot support this feature in TurboMindEngine, since it is developed in C++ and CUDA; an external Python logits_processors function can't be passed to it.

Got it. It would also be great if we could have some kind of interface that allows a native function callback, since libraries implemented in C/C++/Rust could leverage that feature.
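To sketch what such a native callback boundary could look like (this is purely illustrative — the function-pointer signature below is a hypothetical one, not anything TurboMindEngine exposes), `ctypes` can wrap a processor as a C-callable that a C++/CUDA engine could invoke directly on its logits buffer, with no Python object crossing the boundary at call time:

```python
import ctypes

# Hypothetical C signature the engine would call each decoding step:
#   void processor(const int* ids, size_t n_ids, float* logits, size_t vocab)
LOGITS_PROCESSOR = ctypes.CFUNCTYPE(
    None,
    ctypes.POINTER(ctypes.c_int), ctypes.c_size_t,
    ctypes.POINTER(ctypes.c_float), ctypes.c_size_t,
)


def mask_even_tokens(ids, n_ids, logits, vocab):
    """Toy constrained-decoding rule: ban all even token ids in place."""
    for i in range(0, vocab, 2):
        logits[i] = float("-inf")


callback = LOGITS_PROCESSOR(mask_even_tokens)

# Simulate the engine side: call the C function pointer on a raw buffer.
vocab = 4
buf = (ctypes.c_float * vocab)(0.1, 0.2, 0.3, 0.4)
ids = (ctypes.c_int * 1)(0)
callback(ids, 1, buf, vocab)
# buf now has -inf at the even positions, odd positions untouched.
```

A C/C++/Rust library would instead pass its own exported symbol, so the engine never has to re-enter the Python interpreter in the hot loop.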