System Info
CPU architecture: x86_64
GPU name: NVIDIA V100 32GB
Who can help?
No response
Information
[X] The official example scripts
[ ] My own modified scripts
Tasks
[X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[ ] My own task or dataset (give details below)
Reproduction
branch: main
commit: b57221b764bc579cbb2490154916a871f620e2c4
1. Build TensorRT-LLM on the main branch.
2. pip install ./build/tensorrt_llm-0.8.0.dev20240123-cp310-cp310-linux_x86_64.whl
3. Run the QwenVL example:
3.1 Download Qwen-VL
3.2 ViT
3.3 Qwen: quantize the weights to INT4 with GPTQ
Expected behavior
I expected gptq_convert.py to convert successfully, but it does not.
actual behavior
The result is as follows:
root@bbc1235:~/TensorRT-LLM/examples/qwenvl# python3 gptq_convert.py --hf_model_dir ./Qwen-VL-Chat --tokenizer_dir ./Qwen-VL-Chat --quant_ckpt_path ./Qwen-VL-Chat-4bit
CUDA extension not installed.
CUDA extension not installed.
[TensorRT-LLM] TensorRT-LLM version: 0.8.0.dev20240123, commit: b57221b764bc579cbb2490154916a871f620e2c4
Traceback (most recent call last):
File "/root/TensorRT-LLM/examples/qwenvl/gptq_convert.py", line 13, in <module>
from utils.utils import make_context
ModuleNotFoundError: No module named 'utils.utils'
additional notes
It seems there is no module called utils.utils. I then ran pip install utils, but that does not work either, because the pip utils package has no function called make_context.
The result is as follows:
root@bbc1235:~/TensorRT-LLM/examples/qwenvl# python3 gptq_convert.py --hf_model_dir ./Qwen-VL-Chat --tokenizer_dir ./Qwen-VL-Chat --quant_ckpt_path ./Qwen-VL-Chat-4bit
CUDA extension not installed.
CUDA extension not installed.
[TensorRT-LLM] TensorRT-LLM version: 0.8.0.dev20240123, commit: b57221b764bc579cbb2490154916a871f620e2c4
Traceback (most recent call last):
File "/root/TensorRT-LLM/examples/qwenvl/gptq_convert.py", line 13, in <module>
from utils import make_context
ImportError: cannot import name 'make_context' from 'utils' (/usr/local/lib/python3.10/dist-packages/utils/__init__.py)
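For diagnosing this kind of shadowing, a quick check of which on-disk file the name utils resolves to can help (a generic sketch, not specific to this repo):

```python
import importlib.util

# Ask the import machinery which file the name "utils" would resolve to.
# A path under site-packages means the pip "utils" package is winning over
# any local utils/ directory shipped with the example.
spec = importlib.util.find_spec("utils")
if spec is None:
    print("no importable module named 'utils' was found")
else:
    print(f"'utils' resolves to: {spec.origin}")
```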
Do you have a local module called utils that shares its name with the pip package, and perhaps forgot to upload it?
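As a possible stopgap until the missing file is published, one could create a local utils package whose utils.py re-exports make_context from the downloaded checkpoint. This is only a sketch: it assumes the Qwen-VL-Chat checkpoint directory ships a qwen_generation_utils.py defining make_context, which is a guess and not confirmed by the repo.

```python
import os

# Hypothetical stopgap: create the utils/utils.py layout that
# gptq_convert.py expects, re-exporting make_context from the
# downloaded checkpoint directory (path and module name assumed).
os.makedirs("utils", exist_ok=True)

# Plain package marker so "utils.utils" is importable.
with open(os.path.join("utils", "__init__.py"), "w") as f:
    f.write("")

shim = (
    "import sys\n"
    "sys.path.insert(0, './Qwen-VL-Chat')  # checkpoint dir, assumed to hold qwen_generation_utils.py\n"
    "from qwen_generation_utils import make_context  # re-export for the example\n"
)
with open(os.path.join("utils", "utils.py"), "w") as f:
    f.write(shim)
```

Because the script's own directory sits first on sys.path, a local utils package created this way should take precedence over the pip-installed one.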