NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

No module named 'utils.utils' When try to run QwenVL #960

Open Agent-Chu opened 9 months ago

Agent-Chu commented 9 months ago

System Info

CPU architecture: x86_64
GPU name: NVIDIA V100 32GB

Who can help?

No response

Information

Tasks

Reproduction

branch : main b57221b764bc579cbb2490154916a871f620e2c4

1. Build trt_llm on the main branch.
2. pip install ./build/tensorrt_llm-0.8.0.dev20240123-cp310-cp310-linux_x86_64.whl
3. Start running the QwenVL example.

3.1 Download Qwen-VL

git lfs install
git clone https://huggingface.co/Qwen/Qwen-VL-Chat

3.2 ViT

python3 vit_onnx_trt.py --pretrained_model_path ./Qwen-VL-Chat
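For context, this step exports the visual encoder to ONNX and then builds a TensorRT engine from it. Below is a rough sketch of that general flow, not the actual contents of vit_onnx_trt.py; the input resolution, file names, and trtexec invocation are assumptions for illustration.

# Illustrative sketch of the ONNX -> TensorRT flow that vit_onnx_trt.py automates.
import torch

def export_visual_encoder(vit: torch.nn.Module, onnx_path: str = "visual_encoder.onnx"):
    # Trace the encoder with a dummy image batch (448x448 is an assumed resolution).
    vit.eval()
    dummy = torch.randn(1, 3, 448, 448)
    torch.onnx.export(vit, dummy, onnx_path,
                      input_names=["input"], output_names=["output"],
                      opset_version=17)

# The exported graph can then be compiled into an engine, for example:
#   trtexec --onnx=visual_encoder.onnx --saveEngine=visual_encoder.plan --fp16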

3.3 Qwen: quantize the weights to INT4 with GPTQ

pip install auto-gptq
python3 gptq_convert.py --hf_model_dir ./Qwen-VL-Chat --tokenizer_dir ./Qwen-VL-Chat \
        --quant_ckpt_path ./Qwen-VL-Chat-4bit
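For reference, the script follows the standard auto-gptq quantization flow. A minimal sketch of that flow is below, assuming the usual AutoGPTQ API; the calibration prompt is a placeholder, and the real gptq_convert.py adds Qwen-VL-specific handling.

# Minimal sketch of INT4 GPTQ quantization with auto-gptq (illustrative only).
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./Qwen-VL-Chat", trust_remote_code=True)
quantize_config = BaseQuantizeConfig(bits=4, group_size=128)  # INT4 weights
model = AutoGPTQForCausalLM.from_pretrained("./Qwen-VL-Chat", quantize_config,
                                            trust_remote_code=True)

# Calibration samples: tokenized text the quantizer uses to fit the INT4 weights.
examples = [tokenizer("Describe this picture.", return_tensors="pt")]
model.quantize(examples)
model.save_quantized("./Qwen-VL-Chat-4bit")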

Expected behavior

I expected gptq_convert.py to convert the weights successfully, but it does not.

actual behavior

The output is as follows:

root@bbc1235:~/TensorRT-LLM/examples/qwenvl# python3 gptq_convert.py --hf_model_dir ./Qwen-VL-Chat --tokenizer_dir ./Qwen-VL-Chat  --quant_ckpt_path ./Qwen-VL-Chat-4bit
CUDA extension not installed.
CUDA extension not installed.
[TensorRT-LLM] TensorRT-LLM version: 0.8.0.dev20240123, commit: b57221b764bc579cbb2490154916a871f620e2c4
Traceback (most recent call last):
  File "/root/TensorRT-LLM/examples/qwenvl/gptq_convert.py", line 13, in <module>
    from utils.utils import make_context
ModuleNotFoundError: No module named 'utils.utils'

additional notes

It seems there is no module called utils.utils. I then ran pip install utils, but that does not work either, because the PyPI utils package has no function called make_context.

The output is as follows:

root@bbc1235:~/TensorRT-LLM/examples/qwenvl# python3 gptq_convert.py --hf_model_dir ./Qwen-VL-Chat --tokenizer_dir ./Qwen-VL-Chat         --quant_ckpt_path ./Qwen-VL-Chat-4bit
CUDA extension not installed.
CUDA extension not installed.
[TensorRT-LLM] TensorRT-LLM version: 0.8.0.dev20240123, commit: b57221b764bc579cbb2490154916a871f620e2c4
Traceback (most recent call last):
  File "/root/TensorRT-LLM/examples/qwenvl/gptq_convert.py", line 13, in <module>
    from utils import make_context
ImportError: cannot import name 'make_context' from 'utils' (/usr/local/lib/python3.10/dist-packages/utils/__init__.py)
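One way to confirm the name collision is to check which utils Python actually resolves; this is a quick diagnostic, not a fix:

# Quick diagnostic: show which 'utils' Python imports. After `pip install utils`,
# this points at the unrelated PyPI package rather than a module from this example.
import importlib.util

spec = importlib.util.find_spec("utils")
print(spec.origin if spec else "no 'utils' module found")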

Do you perhaps have a local utils module in this example that shares its name with the utils pip package, and forgot to upload it to the repository?
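Until the missing file lands in the repository, one workaround is to provide a local utils/utils.py yourself (plus an empty utils/__init__.py so the import resolves). In the Qwen examples, make_context builds the ChatML-format prompt; the following is a simplified, approximate stand-in, since the real helper also truncates the history against a maximum token window.

# utils/utils.py -- simplified stand-in for the missing helper (approximation;
# the real make_context also enforces max_window_size when assembling history).

def make_context(tokenizer,
                 query: str,
                 history=None,
                 system: str = "You are a helpful assistant.",
                 max_window_size: int = 6144,
                 chat_format: str = "chatml"):
    """Build a ChatML prompt and its token ids for Qwen-style chat models."""
    if chat_format != "chatml":
        raise NotImplementedError(f"unsupported chat_format: {chat_format}")
    history = history or []
    raw_text = f"<|im_start|>system\n{system}<|im_end|>\n"
    for old_query, old_response in history:
        raw_text += (f"<|im_start|>user\n{old_query}<|im_end|>\n"
                     f"<|im_start|>assistant\n{old_response}<|im_end|>\n")
    raw_text += f"<|im_start|>user\n{query}<|im_end|>\n<|im_start|>assistant\n"
    context_tokens = tokenizer.encode(raw_text)
    return raw_text, context_tokens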

nv-guomingz commented 1 day ago

Hi @Agent-Chu, would you please try our latest code base to see if the issue still exists?

And do you still have any further issues or questions? If not, we'll close this soon.