NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

AttributeError: 'NoneType' object has no attribute 'trt_tensor' #643

Closed wjueyao closed 9 months ago

wjueyao commented 9 months ago

I used the following steps to build a SmoothQuant (SQ) engine.

First, build the Docker image from the main branch:

git clone -b main https://github.com/triton-inference-server/tensorrtllm_backend.git
# Update the submodules
cd tensorrtllm_backend
git lfs install
git submodule update --init --recursive
# Use the Dockerfile to build the backend in a container
# For x86_64
DOCKER_BUILDKIT=1 docker build -t triton_trt_llm -f dockerfile/Dockerfile.trt_llm_backend .

Then build the engines using the image built above:

python3 hf_llama_convert.py -i /weights/huggingface/codellama/CodeLlama-7b-hf   \
                -o /weights/trt_llm/2023.12.12/llama_7B_sq0.5 \
                -sq 0.5 \
                --tensor-parallelism 4 \
                --storage-type fp16

# Build model for SmoothQuant in the per_token + per_channel mode
python3 build.py --ft_model_dir=/weights/trt_llm/2023.12.12/llama_7B_sq0.5/4-gpu/ \
                --dtype float16 \
                --remove_input_padding \
                --use_gpt_attention_plugin float16 \
                --paged_kv_cache \
                --use_inflight_batching \
                --enable_context_fmha \
                --use_gemm_plugin float16 \
                --max_batch_size 64  \
                --vocab_size 32016  \
                --rotary_base 1000000  \
                --use_smooth_quant \
                --per_token \
                --per_channel \
                --world_size 4 \
                --tp_size 4 \
                --output_dir /weights/trt_llm/2023.12.12/sq-convert/4-gpu

I got the following error when running build.py:

Traceback (most recent call last):
  File "/tensorrtllm_backend/tensorrt_llm/examples/llama/build.py", line 839, in <module>
    build(0, args)
  File "/tensorrtllm_backend/tensorrt_llm/examples/llama/build.py", line 783, in build
    engine = build_rank_engine(builder, builder_config, engine_name,
  File "/tensorrtllm_backend/tensorrt_llm/examples/llama/build.py", line 710, in build_rank_engine
    tensorrt_llm_llama(*inputs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/module.py", line 40, in __call__
    return self.forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/model.py", line 379, in forward
    hidden_states = super().forward(input_ids, position_ids, use_cache,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/model.py", line 260, in forward
    hidden_states = layer(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/module.py", line 40, in __call__
    return self.forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/model.py", line 117, in forward
    attention_output = self.attention(hidden_states,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/module.py", line 40, in __call__
    return self.forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/quantization/layers.py", line 1103, in forward
    context, past_key_value = gpt_attention(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/graph_rewriting.py", line 564, in wrapper
    outs = f(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/functional.py", line 3548, in gpt_attention
    plug_inputs = [i.trt_tensor for i in plug_inputs]
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/functional.py", line 3548, in <listcomp>
    plug_inputs = [i.trt_tensor for i in plug_inputs]
AttributeError: 'NoneType' object has no attribute 'trt_tensor'
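For context, the failing line builds the plugin's input list by unwrapping each `Tensor` into its underlying TensorRT tensor, so the error means one of the expected inputs (e.g. a KV-cache pointer or quantization scale that is only created under certain flag combinations) was never constructed and is still `None`. A minimal sketch of the failure mode and a defensive check, using hypothetical stand-in names (`Tensor`, `collect_plugin_inputs` are illustrations, not the actual TensorRT-LLM API):

```python
class Tensor:
    """Hypothetical stand-in for tensorrt_llm's Tensor wrapper."""

    def __init__(self, trt_tensor):
        # In the real library this holds a TensorRT ITensor.
        self.trt_tensor = trt_tensor


def collect_plugin_inputs(plug_inputs):
    """Unwrap plugin inputs, reporting which positions are None
    instead of letting the bare AttributeError escape."""
    missing = [idx for idx, t in enumerate(plug_inputs) if t is None]
    if missing:
        raise ValueError(
            f"plug_inputs at positions {missing} are None; "
            "a tensor required by this build configuration "
            "(e.g. a paged-KV-cache or SmoothQuant scale input) "
            "was never created"
        )
    # This mirrors the failing line in functional.py:
    # plug_inputs = [i.trt_tensor for i in plug_inputs]
    return [t.trt_tensor for t in plug_inputs]
```

With all inputs present this returns the unwrapped tensors; with a `None` entry it pinpoints the missing input rather than raising the opaque `'NoneType' object has no attribute 'trt_tensor'`.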
wjueyao commented 9 months ago

@Tracin please take a look here

Tracin commented 9 months ago

@wjueyao This will be fixed in release v0.7.0 in the coming days.