InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

Support guided decoding for pytorch backend #1856

Closed · AllentDan closed this 3 weeks ago

AllentDan commented 3 months ago

#1614 #1664

grimoire commented 2 months ago

Since the sampling is not batched, I am worried about the performance.
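
For context on where the cost comes from: guided decoding constrains sampling by masking illegal tokens per sequence, and each sequence carries its own FSM state, so the mask differs row by row. A minimal sketch of such a per-sequence loop (the fsm interface and all names here are illustrative, not this PR's actual code):

import torch

def guided_sample(logits, fsms):
    # logits: [batch, vocab]; fsms: one FSM per sequence, each exposing
    # a hypothetical allowed_token_ids() for its current state.
    samples = []
    for row, fsm in zip(logits, fsms):
        mask = torch.full_like(row, float('-inf'))
        allowed = torch.tensor(fsm.allowed_token_ids(), device=row.device)
        mask[allowed] = 0.0  # keep only tokens the FSM permits
        probs = torch.softmax(row + mask, dim=-1)
        samples.append(torch.multinomial(probs, num_samples=1))
    # The Python-level loop is what makes this effectively unbatched.
    return torch.cat(samples)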

lvhan028 commented 1 month ago

Please resolve the conflict.

lvhan028 commented 1 month ago

Can we use the openai package to access this feature?

https://platform.openai.com/docs/guides/structured-outputs/introduction

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    response_format=CalendarEvent,
)

event = completion.choices[0].message.parsed

AllentDan commented 1 month ago

> Can we use the openai package to access this feature?
> https://platform.openai.com/docs/guides/structured-outputs/introduction

That script cannot be executed on the client side as-is. We provided an example using the openai package in this PR's documentation.
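
For completeness, a hedged sketch of what client-side access could look like once a server built from this PR is running: the openai package can target an lmdeploy api_server through base_url. The port and the exact response_format payload the server accepts are assumptions here; the PR's documentation is authoritative.

from openai import OpenAI

# Assumes `lmdeploy serve api_server internlm2-chat-1_8b` is running
# locally on the default port (assumed 23333).
client = OpenAI(base_url='http://localhost:23333/v1', api_key='none')

completion = client.chat.completions.create(
    model='internlm2-chat-1_8b',
    messages=[{'role': 'user', 'content': 'Make a self introduction please.'}],
    # Payload shape mirrors the pipeline example below; see the PR docs
    # for the server's actual request format.
    response_format={'type': 'json_object'},
)
print(completion.choices[0].message.content)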

grimoire commented 1 month ago

from lmdeploy import pipeline
from lmdeploy.messages import GenerationConfig, PytorchEngineConfig

model = 'internlm2-chat-7b'
guide = {
    'type': 'object',
    'properties': {
        'name': {
            'type': 'string'
        },
        'skills': {
            'type': 'array',
            'items': {
                'type': 'string',
                'maxLength': 10
            },
            'minItems': 3
        },
        'work history': {
            'type': 'array',
            'items': {
                'type': 'object',
                'properties': {
                    'company': {
                        'type': 'string'
                    },
                    'duration': {
                        'type': 'string'
                    }
                },
                'required': ['company']
            }
        }
    },
    'required': ['name', 'skills', 'work history']
}
pipe = pipeline(model, backend_config=PytorchEngineConfig(), log_level='INFO')
gen_config = GenerationConfig(
    response_format=dict(type='json_object', guide=guide))
response = pipe(['Make a self introduction please.'], gen_config=gen_config)
print(response)

Error:

Compiling FSM index for all state transitions:  74%|██████████████████████▊        | 222/301 [00:06<00:02, 31.89it/s]
2024-08-15 11:23:57,196 - lmdeploy - ERROR - The vocabulary does not allow us to build a sequence that matches the input regex
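
As an aside on authoring the guide: the same kind of schema can be generated from a pydantic model instead of being written by hand. A sketch using pydantic v2's model_json_schema(); field names approximate the example above, with 'work history' becoming work_history, and the per-item maxLength constraint omitted for brevity:

from pydantic import BaseModel, Field

class Job(BaseModel):
    company: str
    duration: str | None = None  # optional, as in the guide above

class Resume(BaseModel):
    name: str
    skills: list[str] = Field(min_length=3)  # -> minItems: 3
    work_history: list[Job]

# A JSON-schema dict comparable to the hand-written guide. Note that
# pydantic emits $defs/$ref for nested models; whether the FSM builder
# resolves those is worth verifying.
guide = Resume.model_json_schema()
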
AllentDan commented 1 month ago

It looks like the model's tokenizer cannot handle it. Switching to internlm2-chat-1_8b or another llama model works fine. Alternatively, drop the 'work history' field from the input.
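
In script form, the workaround amounts to swapping the model and/or trimming the schema in the example above; a minimal sketch combining both suggestions:

from lmdeploy import pipeline
from lmdeploy.messages import GenerationConfig, PytorchEngineConfig

# Trimmed guide: the 'work history' entry that triggered the FSM error
# is dropped, per the suggested workaround.
guide = {
    'type': 'object',
    'properties': {
        'name': {'type': 'string'},
        'skills': {'type': 'array',
                   'items': {'type': 'string', 'maxLength': 10},
                   'minItems': 3},
    },
    'required': ['name', 'skills'],
}
pipe = pipeline('internlm2-chat-1_8b',
                backend_config=PytorchEngineConfig(),
                log_level='INFO')
gen_config = GenerationConfig(
    response_format=dict(type='json_object', guide=guide))
print(pipe(['Make a self introduction please.'], gen_config=gen_config))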

lvhan028 commented 1 month ago

Please resolve the conflicts.

lvhan028 commented 3 weeks ago

We may put outlines in runtime.txt.
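
Concretely, that would be a one-line addition to runtime.txt (left unpinned here; any version constraint would be an assumption):

outlines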