deepjavalibrary / djl-serving

A universal scalable machine learning model deployment solution
Apache License 2.0

DJL-TensorRT-LLM : inference no longer working, tokenizer error. #2070

Closed eduardzl closed 1 week ago

eduardzl commented 3 weeks ago

Description

Inference no longer working for TensorRT-LLM, tried docker tags "tensorrt-llm-nightly" and "0.28.0-tensorrt-llm".

Expected Behavior

The serving engine should return a generated response from TensorRT-LLM.

Error Message

{"error":"Cannot provide chat completion for tokenizer: \u003cclass \u0027NoneType\u0027\u003e, please ensure that your tokenizer supports chat templates.","code":424}

How to Reproduce?

  1. Create model directory, for example my-model.
  2. Download Mistral-v0.2-7b-Instruct model files with tokenizer into my-model directory.
  3. Create a serving.properties file, for example:

         engine=MPI
         option.dtype=fp16
         option.rolling_batch=trtllm
         option.trust_remote_code=true
         option.max_input_len=8192
         option.max_output_len=8192
         option.tensor_parallel_degree=1

  4. Start the DJL docker container. The container starts, performs conversion of the HF model to a TRT-LLM model, and reaches the ready state: "BOTH API bind to: http://0.0.0.0:8080"

When trying to execute a request:

    curl -H "Content-Type:application/json" -d "@request-djl.txt" http://localhost:8080/v1/chat/completions

with the body:

    { "messages": [ {"role": "user", "content": "Explain the process of osmosis."} ], "n": 1, "do_sample": true, "temperature": 0.3, "top_p": 0.9, "top_k": 50, "repetition_penalty": 1.1, "max_tokens": 600 }

the following error is returned:

    {"error":"Cannot provide chat completion for tokenizer: \u003cclass \u0027NoneType\u0027\u003e, please ensure that your tokenizer supports chat templates.","code":424}
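For scripted testing, the curl call above can be reproduced with the Python standard library. The endpoint and request body are taken verbatim from the report; nothing else is assumed (this is a convenience sketch, not part of DJL):

```python
import json
import urllib.request

# Same chat-completions body as in the curl reproduction above.
payload = {
    "messages": [
        {"role": "user", "content": "Explain the process of osmosis."}
    ],
    "n": 1,
    "do_sample": True,
    "temperature": 0.3,
    "top_p": 0.9,
    "top_k": 50,
    "repetition_penalty": 1.1,
    "max_tokens": 600,
}
body = json.dumps(payload).encode("utf-8")

def send_chat_request(url="http://localhost:8080/v1/chat/completions"):
    # Requires a running DJL serving container; returns the parsed JSON reply
    # (or raises urllib.error.HTTPError 424 when the bug described here hits).
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```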

The tokenizer can be loaded from the model directory using Transformers code and works without issues.
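To illustrate that the chat template itself is not the problem, the prompt that a Mistral-Instruct chat template produces can be sketched by hand. The `apply_mistral_chat_template` helper below is hypothetical, written only for this example; it is not part of Transformers or DJL:

```python
def apply_mistral_chat_template(messages):
    """Hand-rolled stand-in for tokenizer.apply_chat_template using the
    Mistral-Instruct format: user turns wrapped in [INST] ... [/INST]."""
    parts = ["<s>"]
    for msg in messages:
        if msg["role"] == "user":
            parts.append(f"[INST] {msg['content']} [/INST]")
        elif msg["role"] == "assistant":
            parts.append(f"{msg['content']}</s>")
    return "".join(parts)

prompt = apply_mistral_chat_template(
    [{"role": "user", "content": "Explain the process of osmosis."}]
)
```

With the real tokenizer, `AutoTokenizer.from_pretrained(model_dir).apply_chat_template(messages, tokenize=False)` should produce an equivalent prompt, which is why the 424 points at the handler never receiving a tokenizer rather than at a missing template.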

frankfliu commented 2 weeks ago

@maaquib Can you take a look?

maaquib commented 1 week ago

Was able to reproduce this. We're setting the tokenizer on InputFormatConfigs in the huggingface and tnx handlers, but not in the TRT-LLM handler, which is what causes this error.

maaquib commented 1 week ago

Fix #2100 merged. Will backport this to 0.28.0.