Closed: eduardzl closed this issue 1 week ago
@maaquib Can you take a look?
Was able to reproduce this. We're setting the tokenizer on InputFormatConfigs in the huggingface and tnx handlers, but not in the TRT-LLM handler, which is what causes this error.
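A hedged sketch of the kind of fix described above: the TRT-LLM handler needs to populate the tokenizer on its input-format configs the way the huggingface and tnx handlers already do. The class and attribute names below are illustrative stand-ins, not the actual djl-serving API.

```python
class InputFormatConfigs:
    """Minimal stand-in for the configs passed to input parsing."""

    def __init__(self, is_rolling_batch=False, tokenizer=None):
        self.is_rolling_batch = is_rolling_batch
        # Leaving this as None is what triggers the 424 chat-template error.
        self.tokenizer = tokenizer


class TRTLLMHandler:
    """Illustrative handler showing where the tokenizer must be wired in."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def initialize(self):
        # The missing piece: propagate the handler's tokenizer so chat
        # templates can be applied when parsing /v1/chat/completions input.
        self.input_format_configs = InputFormatConfigs(
            is_rolling_batch=True,
            tokenizer=self.tokenizer,
        )
```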
Fix #2100 merged. Will backport this to 0.28.0
Description
Inference is no longer working for TensorRT-LLM; tried with the docker tags "tensorrt-llm-nightly" and "0.28.0-tensorrt-llm".
Expected Behavior
The serving engine should return a generated response from TensorRT-LLM.
Error Message
{"error":"Cannot provide chat completion for tokenizer: <class 'NoneType'>, please ensure that your tokenizer supports chat templates.","code":424}
How to Reproduce?
engine=MPI
option.dtype=fp16
option.rolling_batch=trtllm
option.trust_remote_code=true
option.max_input_len=8192
option.max_output_len=8192
option.tensor_parallel_degree=1
Start the DJL docker container. The container starts, performs the conversion of the HF model to a TRT model, and reaches the ready state: "BOTH API bind to: http://0.0.0.0:8080"
Then execute a request: curl -H "Content-Type:application/json" -d "@request-djl.txt" http://localhost:8080/v1/chat/completions with the body
{
  "messages": [
    {"role": "user", "content": "Explain the process of osmosis."}
  ],
  "n": 1,
  "do_sample": true,
  "temperature": 0.3,
  "top_p": 0.9,
  "top_k": 50,
  "repetition_penalty": 1.1,
  "max_tokens": 600
}
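For reference, the same request can be issued from Python. This is a minimal sketch assuming the server is reachable at localhost:8080 as in the curl command above; only the standard library is used.

```python
import json
from urllib import request

# Same body as the request-djl.txt file used with curl above.
payload = {
    "messages": [
        {"role": "user", "content": "Explain the process of osmosis."}
    ],
    "n": 1,
    "do_sample": True,
    "temperature": 0.3,
    "top_p": 0.9,
    "top_k": 50,
    "repetition_penalty": 1.1,
    "max_tokens": 600,
}


def send_chat_request(url="http://localhost:8080/v1/chat/completions"):
    """POST the chat-completions payload and return the decoded response."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())
```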
The error returned is: {"error":"Cannot provide chat completion for tokenizer: <class 'NoneType'>, please ensure that your tokenizer supports chat templates.","code":424}

The tokenizer itself can be loaded from the model directory using Transformers code and works without issues.
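The failure mode can be sketched as follows: chat-completions parsing needs a tokenizer that supports chat templates, and when the handler never set one, the check fires and produces the 424 seen above. The function below is an illustrative reconstruction, not djl-serving's actual code.

```python
def parse_chat_completions(messages, tokenizer):
    """Illustrative check: reject chat requests when no usable tokenizer is set."""
    if tokenizer is None or not hasattr(tokenizer, "apply_chat_template"):
        # Mirrors the 424 error seen in the report: type(None) renders as
        # "<class 'NoneType'>" in the message.
        return {
            "error": (
                f"Cannot provide chat completion for tokenizer: "
                f"{type(tokenizer)}, please ensure that your tokenizer "
                f"supports chat templates."
            ),
            "code": 424,
        }
    # Happy path: render the messages into a prompt via the chat template.
    return {"prompt": tokenizer.apply_chat_template(messages, tokenize=False)}
```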