NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Error while building GPT2 ('GPTLMHeadModel' object has no attribute 'position_embedding') #871

Open SehajDxstiny opened 9 months ago

SehajDxstiny commented 9 months ago

I ran all the steps given to run gpt2-medium here: https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/gpt, specifically:

```shell
rm -rf gpt2 && git clone https://huggingface.co/gpt2-medium gpt2
pushd gpt2 && rm pytorch_model.bin model.safetensors && wget -q https://huggingface.co/gpt2-medium/resolve/main/pytorch_model.bin && popd
```

```shell
python3 hf_gpt_convert.py -i gpt2 -o ./c-model/gpt2 --tensor-parallelism 1 --storage-type float16
```

After that, I ran this command:

```shell
python3 build.py --model_dir=./c-model/gpt2/1-gpu --use_gpt_attention_plugin --remove_input_padding
```

I get an error:

```
[01/12/2024-11:00:55] [TRT-LLM] [I] Setting model configuration from ./c-model/gpt2/1-gpu.
[01/12/2024-11:00:55] [TRT-LLM] [I] use_gpt_attention_plugin set, without specifying a value. Using float16 automatically.
[01/12/2024-11:00:55] [TRT-LLM] [I] Serially build TensorRT engines.
[01/12/2024-11:00:55] [TRT] [I] [MemUsageChange] Init CUDA: CPU +13, GPU +0, now: CPU 114, GPU 263 (MiB)
[01/12/2024-11:00:58] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1799, GPU +312, now: CPU 2049, GPU 575 (MiB)
[01/12/2024-11:00:58] [TRT-LLM] [W] Invalid timing cache, using freshly created one
[01/12/2024-11:00:59] [TRT-LLM] [I] Loading weights from FT...
Traceback (most recent call last):
  File "/root/TensorRT-LLM/examples/gpt/build.py", line 810, in <module>
    run_build()
  File "/root/TensorRT-LLM/examples/gpt/build.py", line 802, in run_build
    build(0, args)
  File "/root/TensorRT-LLM/examples/gpt/build.py", line 747, in build
    engine = build_rank_engine(builder, builder_config, engine_name,
  File "/root/TensorRT-LLM/examples/gpt/build.py", line 590, in build_rank_engine
    load_from_ft(tensorrt_llm_gpt,
  File "/root/TensorRT-LLM/examples/gpt/weight.py", line 239, in load_from_ft
    tensorrt_llm_gpt.position_embedding.weight.value = (pe)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/module.py", line 51, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'GPTLMHeadModel' object has no attribute 'position_embedding'
```
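The shape of this failure can be shown with a minimal toy sketch (this is not the actual TensorRT-LLM code): `weight.py` assigns to `position_embedding` unconditionally, but the model only creates that attribute under certain conditions, and a custom `__getattr__` like the one in `tensorrt_llm/module.py` turns any unregistered name into exactly this `AttributeError`. `ToyModule` below is hypothetical.

```python
# Toy illustration of the error, NOT the real tensorrt_llm code: an attribute
# that is only created under a condition, plus a custom __getattr__ that
# raises for anything not registered on the instance.
class ToyModule:
    def __init__(self, learned_absolute: bool):
        if learned_absolute:
            # Only created for learned-absolute position embeddings.
            self.position_embedding = object()

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails,
        # mimicking tensorrt_llm/module.py's Module.__getattr__.
        raise AttributeError("'{}' object has no attribute '{}'".format(
            type(self).__name__, name))


m = ToyModule(learned_absolute=False)
try:
    m.position_embedding
except AttributeError as e:
    print(e)  # 'ToyModule' object has no attribute 'position_embedding'
```

So the question the thread converges on is why, for this checkpoint and this code revision, the `position_embedding` branch was never taken.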
nv-guomingz commented 9 months ago

I tried the main branch with your steps above and the engine could be built successfully.

Could you please check /root/TensorRT-LLM/tensorrt_llm/models/gpt/model.py, line 221?

I assume you're using the main branch code.

r3sist-uniq commented 9 months ago

I tried this too and got the same error. Where are you running this from? Do you run it from inside the repository? I installed TensorRT-LLM using the Linux installation commands:

```shell
# Install dependencies, TensorRT-LLM requires Python 3.10
apt-get update && apt-get -y install python3.10 python3-pip openmpi-bin libopenmpi-dev
# Install the latest version of TensorRT-LLM
pip3 install tensorrt_llm -U --extra-index-url https://pypi.nvidia.com
# Check installation
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
```
SehajDxstiny commented 9 months ago

I tried again; it doesn't work. Line 221 is:

```python
if position_embedding_type == PositionEmbeddingType.learned_absolute:
    self.position_embedding = Embedding(max_position_embeddings,
                                        hidden_size,
                                        dtype=dtype)
```

Edit: any thoughts? @nv-guomingz

nv-guomingz commented 9 months ago

> I tried again, doesn't work. line at 221:
>
> ```python
> if position_embedding_type == PositionEmbeddingType.learned_absolute:
>     self.position_embedding = Embedding(max_position_embeddings,
>                                         hidden_size,
>                                         dtype=dtype)
> ```
>
> edit: any thought? @nv-guomingz

Did you run the GPT example in a Docker container built per the documentation instructions?

Since I can't reproduce your issue locally, I suggest you add debug code to check whether tensorrt_llm_gpt has a position_embedding attribute.
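A hedged sketch of that debug check, which could go just before the failing assignment in `examples/gpt/weight.py`. `FakeGPT` below is a stand-in for the real `GPTLMHeadModel` (not importable here), so only the probe pattern carries over:

```python
# Sketch of the suggested debug probe. FakeGPT stands in for GPTLMHeadModel;
# in weight.py you would run the same hasattr/getattr checks on the real
# tensorrt_llm_gpt object right before load_from_ft assigns the weights.
class FakeGPT:
    pass  # deliberately has no position_embedding, like the failing case


tensorrt_llm_gpt = FakeGPT()

if hasattr(tensorrt_llm_gpt, "position_embedding"):
    print("position_embedding exists; weight assignment should succeed")
else:
    # Report what we can observe instead of crashing on the assignment.
    print("position_embedding missing; embedding type seen by the model:",
          getattr(tensorrt_llm_gpt, "position_embedding_type", None))
```

If the attribute is missing, the next things to compare are the `position_embedding_type` the model was built with and whether the installed wheel matches the example code's revision.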

SehajDxstiny commented 9 months ago

@nv-guomingz this is how I installed TensorRT-LLM (as given in the Linux installation guide in the repo):

```shell
# Install dependencies, TensorRT-LLM requires Python 3.10
apt-get update && apt-get -y install python3.10 python3-pip openmpi-bin libopenmpi-dev
# Install the latest version of TensorRT-LLM
pip3 install tensorrt_llm -U --extra-index-url https://pypi.nvidia.com
# Check installation
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
```
saramal commented 9 months ago

> @nv-guomingz this is how installed TensorRT-LLM (as given in installation guide for linux in the repo):
>
> ```shell
> # Install dependencies, TensorRT-LLM requires Python 3.10
> apt-get update && apt-get -y install python3.10 python3-pip openmpi-bin libopenmpi-dev
> # Install the latest version of TensorRT-LLM
> pip3 install tensorrt_llm -U --extra-index-url https://pypi.nvidia.com
> # Check installation
> python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
> ```

@SehajDxstiny Your installation command will install version 0.7.1, which differs from the main branch. You should try using the examples/gpt files in the 0.7.1 branch.

In my case, the issue was resolved by replacing only four files: weight.py and build.py in ~/examples/gpt, and run.py and utils.py in ~/examples. If the issue persists, I believe replacing all of your example code with the version 0.7.1 code should resolve the problem.
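The underlying mismatch (pip-installed 0.7.1 wheel vs. main-branch example scripts) could be caught up front with a small guard. This is only a sketch: `EXPECTED_VERSION` and `versions_match` are hypothetical helpers, not part of the TensorRT-LLM examples.

```python
# Hypothetical version guard for the top of an example script: refuse to run
# when the version the examples were written for and the installed wheel
# disagree on their release segment.
EXPECTED_VERSION = "0.7.1"  # assumed: the release this example code targets


def versions_match(installed: str, expected: str) -> bool:
    # Compare only the release part, ignoring local (+...) and .dev suffixes,
    # so a "0.7.1" wheel matches but a main-branch "0.8.0.dev..." does not.
    def release(v: str) -> str:
        return v.split("+")[0].split(".dev")[0]

    return release(installed) == release(expected)


print(versions_match("0.7.1", EXPECTED_VERSION))                # True
print(versions_match("0.8.0.dev2024011601", EXPECTED_VERSION))  # False
```

In a real script the installed version would come from `tensorrt_llm.__version__` (as the install-check command in this thread already prints), and a mismatch would point the user at the matching examples, as saramal describes.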