Open tedqu opened 2 days ago
Hello. I saw the same error when I converted llama3.0 using trt-llm-0.14. So I switched to trt-llm-0.13, and then I could convert llama3.0 to a TRT engine.
Thanks, cool! I used a similar method to solve this problem: I ran the run.py file from a previous version of the code (I'm not sure of the exact version number) and the error went away, so it is probably a small bug in the latest version.
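For anyone who wants to try the same workaround, a minimal sketch is below. The 0.13.0 version number and the v0.13.0 tag are assumptions based on the comments above ("trt-llm-0.13"), not an officially verified fix.

# Untested sketch: pin the tensorrt_llm wheel to the 0.13 release mentioned above
pip3 install tensorrt_llm==0.13.0 --extra-index-url https://pypi.nvidia.com
# Check out the matching examples so run.py and the installed wheel agree (tag name assumed)
git clone -b v0.13.0 https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM/examples/qwen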
Environment
• Example Code: examples/qwen/run.py (from the README)
• Command:

python3 ../run.py \
    --input_text "你好,请问你叫什么?" \
    --max_output_len=50 \
    --tokenizer_dir /data/models/Qwen1.5-7B-Chat/ \
    --engine_dir ./tmp/qwen/7B/trt_engines/fp16/1-gpu/
Description
While running the run.py script as described in the README of the examples/qwen/ directory, the following error occurs when invoking runner.generate:
Error Traceback
Traceback (most recent call last):
  File "/triton/TensorRT-LLM-release-0.14/examples/qwen/../run.py", line 887, in <module>
    main(args)
  File "/triton/TensorRT-LLM-release-0.14/examples/qwen/../run.py", line 711, in main
    outputs = runner.generate(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/model_runner_cpp.py", line 624, in generate
    requests = [
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/model_runner_cpp.py", line 625, in <listcomp>
    trtllm.Request(
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:

Invoked with: kwargs: input_token_ids=[151644, 8948, 198, 2610, 525, 264, 10950, 17847, 13, 151645, 198, 151644, 872, 198, 108386, 37945, 56007, 56568, 99882, 99245, 11319, 151645, 198, 151644, 77091, 198], encoder_input_token_ids=None, encoder_output_length=None, encoder_input_features=None, position_ids=None, max_tokens=50, num_return_sequences=None, pad_id=151643, end_id=151645, stop_words=None, bad_words=None, sampling_config=<tensorrt_llm.bindings.executor.SamplingConfig object at 0x7f000502f830>, lookahead_config=None, streaming=False, output_config=<tensorrt_llm.bindings.executor.OutputConfig object at 0x7f0001cca270>, prompt_tuning_config=None, lora_config=None, return_all_generated_tokens=False, logits_post_processor_name=None, external_draft_tokens_config=None
Additional Context
The engine and tokenizer paths are configured as follows:
• --tokenizer_dir: /data/models/Qwen1.5-7B-Chat/
• --engine_dir: ./tmp/qwen/7B/trt_engines/fp16/1-gpu/
The engine appears to load successfully, as indicated by the log output:
[TensorRT-LLM][INFO] Engine version 0.14.0 found in the config file, assuming engine(s) built by new builder API.
...
[11/18/2024-02:33:18] [TRT-LLM] [I] Load engine takes: 12.188158512115479 sec
However, the error seems to indicate a problem with the argument types for the tensorrt_llm.bindings.executor.Request class, particularly with sampling_config and output_config.
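A quick check that may help narrow this down (a minimal sketch, assuming the engine directory's config.json records the builder version, as the "Engine version 0.14.0 found in the config file" log line above suggests): compare the version of the installed tensorrt_llm wheel, the release of the run.py being executed, and the engine version. The comments above suggest the error disappears when all of these come from the same release.

# Untested sketch: check which release the installed wheel reports
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
# Check the version recorded in the engine's config.json (field name assumed from the log above)
grep -m1 '"version"' ./tmp/qwen/7B/trt_engines/fp16/1-gpu/config.json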
If more logs or information are needed, please let me know! Thank you!