InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[RFC] Refactor chat template and remove model name from engine config #1065

Open AllentDan opened 7 months ago

AllentDan commented 7 months ago

Motivation

Major features

How to use

For api_server, to use an extra template, the command could be:

lmdeploy serve api_server $MODEL_PATH --chat-template $JINJA

For APIs like pipeline, we are going to provide documentation showing how to add a chat template in Python or Jinja. The code will look like:

chat_template = PythonTemplate()  # or a function, a Jinja string, or a file path
input_ids = tokenizer.apply_chat_template(messages, chat_template=chat_template)
pipeline(input_ids)
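
For reference, Hugging Face tokenizers already accept a Jinja string through apply_chat_template, so the Jinja path of the proposal can be sketched as follows (the model name and template text are only illustrative, not part of this RFC):

from transformers import AutoTokenizer

# Illustrative ChatML-style template passed as a plain Jinja string.
chatml_template = (
    "{% for message in messages %}"
    "<|im_start|>{{ message['role'] }}\n{{ message['content'] }}<|im_end|>\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}")

tokenizer = AutoTokenizer.from_pretrained('internlm/internlm2-chat-7b',
                                          trust_remote_code=True)
messages = [{'role': 'user', 'content': 'Hello!'}]

# chat_template= overrides whatever tokenizer_config.json ships with.
input_ids = tokenizer.apply_chat_template(messages,
                                          chat_template=chatml_template,
                                          add_generation_prompt=True)
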
lvhan028 commented 7 months ago
  1. It's better to deprecate model_name instead of removing it directly. PR #1022 is working on it for TurbomindEngineConfig
  2. For the pipeline case, are you suggesting changing prompts to input_ids?
AllentDan commented 7 months ago
> 1. It's better to deprecate model_name instead of removing it directly. PR Remove model name when loading hf model #1022 is working on it for TurbomindEngineConfig
> 2. For the pipeline case, are you suggesting changing prompts to input_ids?
  1. Removing model_name refers to PytorchEngineConfig then. model_name still exists in ChatTemplateConfig, which is not going to be deprecated.
  2. Not implemented yet. Maybe I will provide a function for users to set chat_template on the tokenizer in pipeline.
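
Since model_name stays in ChatTemplateConfig, a minimal sketch of selecting a built-in template explicitly through the pipeline API might look like this (assuming a chat_template_config argument along the lines of today's lmdeploy; the model path and template name are illustrative):

from lmdeploy import pipeline, ChatTemplateConfig

# Assumed usage: pick a built-in LMDeploy chat template by name instead of
# relying on fuzzy matching against the model path.
pipe = pipeline('internlm/internlm2-chat-7b',
                chat_template_config=ChatTemplateConfig(model_name='internlm2'))
print(pipe(['Hello, who are you?']))
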
irexyc commented 7 months ago

Should we consider making the chat template or model_name a required argument? The current matching strategy is quite prone to failing or picking the wrong template.

AllentDan commented 7 months ago

I prefer exact matching, like FastChat does.
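
Purely for illustration, exact matching amounts to a plain dictionary lookup that fails loudly instead of guessing; the registry and class below are hypothetical, not lmdeploy internals:

from dataclasses import dataclass

@dataclass
class ChatTemplate:
    # Hypothetical minimal template description: text placed around the user turn.
    prefix: str
    suffix: str

# Hypothetical registry keyed by the exact model_name.
CHAT_TEMPLATES = {
    'internlm2': ChatTemplate('<|im_start|>user\n', '<|im_end|>\n<|im_start|>assistant\n'),
    'llama2': ChatTemplate('[INST] ', ' [/INST]'),
}

def get_chat_template(model_name: str) -> ChatTemplate:
    # Exact lookup: an unknown name is a hard error rather than a fuzzy guess.
    try:
        return CHAT_TEMPLATES[model_name]
    except KeyError:
        raise ValueError(f'unknown model_name {model_name!r}; '
                         f'expected one of {sorted(CHAT_TEMPLATES)}') from None
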

AllentDan commented 7 months ago

The /v1/completions interface does not involve chat templates at all: it only accepts an input string, which is tokenized directly and then inferred. Therefore, only /v1/chat/completions and /v1/chat/interactive are discussed (a resolution sketch follows the list):

  1. No model_name, no Jinja template:
    • /v1/chat/completions first reads the Jinja template from tokenizer_config.json; if none is available, it uses the one defined by LMDeploy.
    • /v1/chat/interactive uses the one defined by LMDeploy.
  2. model_name provided, no Jinja template:
    • The chat template registered in LMDeploy under the specified model_name is used.
    • Both interfaces use the chat template defined by LMDeploy.
  3. No model_name provided, Jinja template provided:
    • Only /v1/chat/completions is guaranteed to use the Jinja template.
    • The behavior of /v1/chat/interactive is not guaranteed; it usually treats the request as plain text completion.
  4. model_name provided, Jinja template provided:
    • If model_name matches a registered chat template, all interfaces use the LMDeploy chat template.
    • If model_name does not match any chat template, an error is reported directly and the server fails to start.
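
A condensed sketch of the resolution order described above (function and variable names are illustrative, not lmdeploy's):

from typing import Optional

LMDEPLOY_BUILTIN = '<chat template LMDeploy defines for this model>'
REGISTERED = {'internlm2': '<internlm2 template>', 'llama2': '<llama2 template>'}

def resolve_chat_template(model_name: Optional[str],
                          jinja_template: Optional[str],
                          hf_template: Optional[str],
                          endpoint: str) -> Optional[str]:
    """Pick a template for /v1/chat/completions or /v1/chat/interactive."""
    if model_name is not None:
        # Cases 2 and 4: an explicit model_name wins; an unknown name aborts startup.
        if model_name not in REGISTERED:
            raise RuntimeError(f'no chat template registered for {model_name!r}')
        return REGISTERED[model_name]
    if jinja_template is not None:
        # Case 3: only /v1/chat/completions is guaranteed to honor the Jinja template;
        # /v1/chat/interactive degrades to plain text completion.
        return jinja_template if endpoint == '/v1/chat/completions' else None
    # Case 1: /v1/chat/completions prefers the template shipped in
    # tokenizer_config.json; everything else falls back to LMDeploy's own.
    if endpoint == '/v1/chat/completions' and hf_template is not None:
        return hf_template
    return LMDEPLOY_BUILTIN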