InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Feature] Support `response_format` for `TurboMind` #2753

Open h4n0 opened 1 week ago

h4n0 commented 1 week ago

Motivation

I'm using the TurboMind engine and got an error when requesting `response_format` with a `json_schema`. The backend check that rejects it is here: https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/serve/openai/api_server.py#L367
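
For reference, a minimal sketch of the kind of request that fails, assuming the server's default port 23333; the model name and the `json_schema` payload are placeholders in the OpenAI style and may differ from the exact request I sent:

import requests

# Hypothetical repro: model name, port, and schema are placeholders.
payload = {
    'model': 'internlm2',
    'messages': [{
        'role': 'user',
        'content': 'Return a JSON object with a name field.'
    }],
    'response_format': {
        'type': 'json_schema',
        'json_schema': {
            'name': 'person',
            'schema': {
                'type': 'object',
                'properties': {'name': {'type': 'string'}},
                'required': ['name'],
            },
        },
    },
}
resp = requests.post('http://0.0.0.0:23333/v1/chat/completions', json=payload)
# With the TurboMind backend this comes back as HTTP 400, carrying the
# message from the check quoted below under "Related resources".
print(resp.status_code, resp.json())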

Is there any plan to support this for TurboMind?

Related resources

# From lmdeploy/serve/openai/api_server.py: response_format is rejected
# unless the PyTorch engine is in use.
if request.response_format and request.response_format.type != 'text':
    if VariableInterface.async_engine.backend != 'pytorch':
        return create_error_response(
            HTTPStatus.BAD_REQUEST,
            'only pytorch backend can use response_format now')
    response_format = request.response_format.model_dump()
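
As the check shows, the PyTorch engine already accepts `response_format`, so launching the server with `lmdeploy serve api_server <model> --backend pytorch` should get past this branch in the meantime; I haven't verified every schema type against that backend, though.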

Additional context

No response

lvhan028 commented 3 days ago

Yes. We'll support it in December. Stay tuned.