NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

make -C docker run LOCAL_USER=1 FAILED #601

Open LoverLost opened 11 months ago

LoverLost commented 11 months ago

I'm using docker in rootless mode.

When I run make -C docker run LOCAL_USER=1, it fails with the following output:

[+] Building 4.2s (4/4) FINISHED                                                                                                                       docker:default
 => [internal] load build definition from Dockerfile.user                                                                                                        0.0s
 => => transferring dockerfile: 429B                                                                                                                             0.0s
 => [internal] load .dockerignore                                                                                                                                0.0s
 => => transferring context: 233B                                                                                                                                0.0s
 => ERROR [internal] load metadata for docker.io/tensorrt_llm/devel:latest                                                                                       4.1s
 => [auth] tensorrt_llm/devel:pull token for registry-1.docker.io                                                                                                0.0s
------
 > [internal] load metadata for docker.io/tensorrt_llm/devel:latest:
------
Dockerfile.user:3
--------------------
   1 |     ARG BASE_IMAGE_WITH_TAG
   2 |     
   3 | >>> FROM ${BASE_IMAGE_WITH_TAG} as base
   4 |     
   5 |     # Alternative user
--------------------
ERROR: failed to solve: tensorrt_llm/devel:latest: pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
make: *** [Makefile:86: devel_run] Error 1
make: Leaving directory '/ssd/home/mhma/accelerate/TensorRT-LLM/docker'

I have logged into Docker on Linux using docker login, but the error still occurs. Does anyone know the reason?
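
For context, my understanding is that tensorrt_llm/devel:latest is built locally by the Makefile rather than pulled from Docker Hub, so this is roughly the sequence I expected to work (a sketch, nothing more):

# Build the devel image locally; the Makefile tags it tensorrt_llm/devel:latest
make -C docker build

# The tag should then be visible locally before running
docker image ls tensorrt_llm/devel

# Run the devel container with my local UID/GID mapped in
make -C docker run LOCAL_USER=1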

byshiue commented 11 months ago

Could you run

make -C docker build

successfully?

LoverLost commented 11 months ago

Could you run

make -C docker build

successfully?

Yep, I ran make -C docker build successfully. Here is the output of that command:

make: Entering directory '/ssd/home/mhma/accelerate/TensorRT-LLM/docker'
Building docker image: tensorrt_llm/devel:latest
DOCKER_BUILDKIT=1 docker build --pull  \
        --progress auto \
         --build-arg BASE_IMAGE=nvcr.io/nvidia/pytorch \
         --build-arg BASE_TAG=23.10-py3 \
         --build-arg BUILD_WHEEL_ARGS="--clean --trt_root /usr/local/tensorrt" \
         --build-arg TORCH_INSTALL_TYPE="skip" \
         \
         \
         \
         \
         \
         --target devel \
        --file Dockerfile.multi \
        --tag tensorrt_llm/devel:latest \
        ..
[+] Building 7.6s (18/18) FINISHED                                                                                                                     docker:default
 => [internal] load .dockerignore                                                                                                                                0.0s
 => => transferring context: 233B                                                                                                                                0.0s
 => [internal] load build definition from Dockerfile.multi                                                                                                       0.0s
 => => transferring dockerfile: 2.09kB                                                                                                                           0.0s
 => [internal] load metadata for nvcr.io/nvidia/pytorch:23.10-py3                                                                                                7.2s
 => [internal] load build context                                                                                                                                0.3s
 => => transferring context: 371B                                                                                                                                0.2s
 => [base 1/1] FROM nvcr.io/nvidia/pytorch:23.10-py3@sha256:72d016011185c8e8c82442c87135def044f0f9707f9fd4ec1703a9e403ad4c35                                     0.0s
 => CACHED [devel  1/12] COPY docker/common/install_base.sh install_base.sh                                                                                      0.0s
 => CACHED [devel  2/12] RUN bash ./install_base.sh && rm install_base.sh                                                                                        0.0s
 => CACHED [devel  3/12] COPY docker/common/install_cmake.sh install_cmake.sh                                                                                    0.0s
 => CACHED [devel  4/12] RUN bash ./install_cmake.sh && rm install_cmake.sh                                                                                      0.0s
 => CACHED [devel  5/12] COPY docker/common/install_tensorrt.sh install_tensorrt.sh                                                                              0.0s
 => CACHED [devel  6/12] RUN bash ./install_tensorrt.sh     --TRT_VER=${TRT_VER}     --CUDA_VER=${CUDA_VER}     --CUDNN_VER=${CUDNN_VER}     --NCCL_VER=${NCCL_  0.0s
 => CACHED [devel  7/12] COPY docker/common/install_polygraphy.sh install_polygraphy.sh                                                                          0.0s
 => CACHED [devel  8/12] RUN bash ./install_polygraphy.sh && rm install_polygraphy.sh                                                                            0.0s
 => CACHED [devel  9/12] COPY docker/common/install_mpi4py.sh install_mpi4py.sh                                                                                  0.0s
 => CACHED [devel 10/12] RUN bash ./install_mpi4py.sh && rm install_mpi4py.sh                                                                                    0.0s
 => CACHED [devel 11/12] COPY docker/common/install_pytorch.sh install_pytorch.sh                                                                                0.0s
 => CACHED [devel 12/12] RUN bash ./install_pytorch.sh skip && rm install_pytorch.sh                                                                             0.0s
 => exporting to image                                                                                                                                           0.0s
 => => exporting layers                                                                                                                                          0.0s
 => => writing image sha256:309b86b99ee938b7158ac25e46a45f1f9f33311d22e35b1e724062414fde09e8                                                                     0.0s
 => => naming to docker.io/tensorrt_llm/devel:latest                                                                                                             0.0s
make: Leaving directory '/ssd/home/mhma/accelerate/TensorRT-LLM/docker'
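
Since I'm on rootless Docker, I also wonder whether the build and the run could be talking to different daemons; something like the following should show it (just a quick check, adjust to your setup):

# Which Docker context/daemon is the CLI currently using?
docker context show

# The image built above should be listed here; if it is not,
# the build and the run step are probably hitting different daemons
docker image ls tensorrt_llm/devel
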
byshiue commented 11 months ago

Could you launch other Docker containers?
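
For example, something simple like this as a sanity check (using the same base image as in your build log; adjust as needed):

docker run --rm --gpus=all nvcr.io/nvidia/pytorch:23.10-py3 nvidia-smi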

taozhang9527 commented 7 months ago

I have exactly the same issue. Any update on this?

taozhang9527 commented 7 months ago

It seems you have to run make -C docker release_run LOCAL_USER=1 instead of make -C docker run LOCAL_USER=1. Otherwise, it will look for the devel version of the image and cause an error.
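
In other words, each *_run target expects a locally built image with the matching tag, so the build and run targets need to be paired. A rough sketch (I believe the release image is built by make -C docker release_build, but please verify against your docker/Makefile):

# devel workflow: builds and runs tensorrt_llm/devel:latest
make -C docker build
make -C docker run LOCAL_USER=1

# release workflow: builds and runs tensorrt_llm/release:latest
# (release_build is my assumption from the Makefile's naming convention)
make -C docker release_build
make -C docker release_run LOCAL_USER=1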

jayakommuru commented 3 months ago

@taozhang9527 @byshiue

I am running make -C docker release_run LOCAL_USER=1 but still facing this error:

pull access denied for tensorrt_llm/release, repository does not exist or may require 'docker login': denied: requested access to the resource is denied

Full logs:


docker build --progress auto --build-arg BASE_IMAGE_WITH_TAG=tensorrt_llm/release:latest --build-arg USER_ID=0 --build-arg USER_NAME=root --build-arg GROUP_ID=0 --build-arg GROUP_NAME=root --file Dockerfile.user --tag tensorrt_llm/release:latest-root ..
Sending build context to Docker daemon  1.567GB
Step 1/9 : ARG BASE_IMAGE_WITH_TAG
Step 2/9 : FROM ${BASE_IMAGE_WITH_TAG} as base
pull access denied for tensorrt_llm/release, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
make: *** [Makefile:118: release_run] Error 1
make: Leaving directory '/home/jaya_kommuru/TensorRT-LLM/docker'

`make -C docker build` runs fine, though.
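
Looking at the build log earlier in this thread, make -C docker build only produces tensorrt_llm/devel:latest, not the tensorrt_llm/release:latest that release_run is looking for. A quick way to see which tags actually exist locally:

# List locally available TensorRT-LLM images and their tags
docker image ls | grep tensorrt_llm
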
jayakommuru commented 3 months ago

@LoverLost were you able to figure this out?

binghanc commented 3 months ago

I got the same error.

littlefive5 commented 3 months ago

same error

purejomo commented 2 months ago

same error

[+] Building 1.5s (3/3) FINISHED
=> [internal] load build definition from Dockerfile.user
=> => transferring dockerfile: 638B
=> ERROR [internal] load metadata for docker.io/tensorrt_llm/release:latest
=> [auth] tensorrt_llm/release:pull token for registry-1.docker.io

 > [internal] load metadata for docker.io/tensorrt_llm/release:latest:
------
Dockerfile.user:3
--------------------
   1 |     ARG BASE_IMAGE_WITH_TAG
   2 |
   3 | >>> FROM ${BASE_IMAGE_WITH_TAG} as base
   4 |
   5 |     # Alternative user
--------------------
ERROR: failed to solve: tensorrt_llm/release:latest: failed to resolve source metadata for docker.io/tensorrt_llm/release:latest: pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed

EmilioZhao commented 2 months ago

Same error: pull access denied. Anybody here to help? @kaiyux @Shixiaowei02

EmilioZhao commented 2 months ago

Could you run

make -C docker build successfully?

Hi @byshiue! Is your suggestion just to launch the Docker image built by make -C docker build, instead of using make -C docker release_run LOCAL_USER=1, as a workaround?

I built the Docker image successfully and ran the container with the following command (which I worked out from docker/Makefile):

sudo docker run --rm -it --ipc=host --ulimit memlock=-1 --ulimit stack=67108864  \
                    --gpus=all \
                    --volume /home/lixiang/projects/TensorRT-LLM:/workspace/tensorrt_llm \
                    --env "CCACHE_DIR=/workspace/tensorrt_llm/cpp/.ccache" \
                    --env "CCACHE_BASEDIR=/workspace/tensorrt_llm" \
                    --workdir /workspace/tensorrt_llm \
                    --hostname trtllm-devel \
                    --name tensorrt_llm-devel-root \
                    --tmpfs /tmp:exec \
                    docker.io/tensorrt_llm/devel:latest 

However, it reported the following message after starting the container, which may indicate a CUDA incompatibility:

WARNING: CUDA Minor Version Compatibility mode ENABLED.
  Using driver version 535.183.06 which has support for CUDA 12.2.  This container
  was built with CUDA 12.5 and will be run in Minor Version Compatibility mode.
  CUDA Forward Compatibility is preferred over Minor Version Compatibility for use
  with this container but was unavailable:
  [[Forward compatibility was attempted on non supported HW (CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE) cuInit()=804]]
  See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

Does this affect the normal use of tensorrt-llm?
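
For reference, this is roughly how I compare the driver-side and container-side CUDA versions (nothing TensorRT-LLM specific):

# CUDA version supported by the driver (12.2 for driver 535.x here)
nvidia-smi

# CUDA toolkit version inside the container (12.5 in this image)
nvcc --version

# CUDA version PyTorch was built against
python3 -c "import torch; print(torch.version.cuda)"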

EmilioZhao commented 2 months ago

After executing these commands:

cd examples/llama
pip install -r requirements.txt

pip complained:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
cudf 24.4.0 requires pandas<2.2.2dev0,>=2.0, but you have pandas 2.2.2 which is incompatible.
cudf 24.4.0 requires protobuf<5,>=3.20, but you have protobuf 5.28.0 which is incompatible.
cudf 24.4.0 requires pyarrow<15.0.0a0,>=14.0.1, but you have pyarrow 17.0.0 which is incompatible.
dask-cuda 24.4.0 requires pynvml<11.5,>=11.0.0, but you have pynvml 11.5.3 which is incompatible.
dask-cudf 24.4.0 requires pandas<2.2.2dev0,>=2.0, but you have pandas 2.2.2 which is incompatible.
torchvision 0.19.0a0 requires torch==2.4.0a0+3bcc3cddb5.nv24.07, but you have torch 2.4.0 which is incompatible.

After upgrading transformers with pip install --upgrade transformers (Llama 3.1 requires transformers 4.43.0+), pip complained:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
optimum 1.21.4 requires transformers[sentencepiece]<4.44.0,>=4.29.0, but you have transformers 4.44.2 which is incompatible.
tensorrt-llm 0.13.0.dev2024090300 requires nvidia-modelopt~=0.15.0, but you have nvidia-modelopt 0.13.0 which is incompatible.
tensorrt-llm 0.13.0.dev2024090300 requires pynvml>=11.5.0, but you have pynvml 11.4.1 which is incompatible.
tensorrt-llm 0.13.0.dev2024090300 requires transformers<=4.42.4,>=4.38.2, but you have transformers 4.44.2 which is incompatible.

How can I solve these dependency conflicts?

I also ran the tensorrt_llm import in Python, and it threw an exception:

root@trtllm-devel:/workspace/tensorrt_llm/examples/llama# python -c 'import tensorrt_llm'
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1603, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 32, in <module>
    from ...modeling_flash_attention_utils import _flash_attention_forward
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_flash_attention_utils.py", line 27, in <module>
    from flash_attn.bert_padding import index_first_axis, pad_input, unpad_input  # noqa
  File "/usr/local/lib/python3.10/dist-packages/flash_attn/__init__.py", line 3, in <module>
    from flash_attn.flash_attn_interface import (
  File "/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py", line 10, in <module>
    import flash_attn_2_cuda as flash_attn_cuda
ImportError: /usr/local/lib/python3.10/dist-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/__init__.py", line 33, in <module>
    import tensorrt_llm.models as models
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/__init__.py", line 31, in <module>
    from .gemma.model import GemmaForCausalLM
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/gemma/model.py", line 18, in <module>
    from tensorrt_llm.models.gemma.convert import (QuantizeModifiers, Weights,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/gemma/convert.py", line 39, in <module>
    from tensorrt_llm.models.gemma.smoothquant import (capture_activation_range,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/gemma/smoothquant.py", line 27, in <module>
    from transformers import LlamaConfig, LlamaForCausalLM
  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1594, in __getattr__
    value = getattr(module, name)
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1593, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1605, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
/usr/local/lib/python3.10/dist-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

I think it was caused by a CUDA incompatibility between CUDA driver 535.183.06 and the CUDA toolkit 12.5 installed natively in your Docker image. @byshiue @kaiyux
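
If it is only the resolver conflicts above that matter, one option would be to pin the versions that tensorrt-llm 0.13.0.dev2024090300 itself declares (a sketch based purely on the resolver output above; note that it conflicts with the transformers>=4.43 that Llama 3.1 needs):

# Align with the constraints reported in pip's resolver output above
pip install "transformers>=4.38.2,<=4.42.4" "pynvml>=11.5.0" "nvidia-modelopt~=0.15.0"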

EmilioZhao commented 2 months ago

Installing a newer flash_attn wheel with pip (flash_attn-2.6.3+cu123torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl) solved the undefined flash-attention symbol problem from my previous comment.
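
Roughly what I did (the wheel name is the one above; adjust the path to wherever you downloaded it):

# Replace the preinstalled flash-attn with a build matching torch 2.4 / CUDA 12.3 (per the wheel name)
pip install flash_attn-2.6.3+cu123torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl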

A new error appeared:

root@trtllm-devel:/workspace# python -c 'import tensorrt_llm'
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1603, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/processing_auto.py", line 28, in <module>
    from ...image_processing_utils import ImageProcessingMixin
  File "/usr/local/lib/python3.10/dist-packages/transformers/image_processing_utils.py", line 21, in <module>
    from .image_transforms import center_crop, normalize, rescale
  File "/usr/local/lib/python3.10/dist-packages/transformers/image_transforms.py", line 22, in <module>
    from .image_utils import (
  File "/usr/local/lib/python3.10/dist-packages/transformers/image_utils.py", line 58, in <module>
    from torchvision.transforms import InterpolationMode
  File "/usr/local/lib/python3.10/dist-packages/torchvision/__init__.py", line 6, in <module>
    from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils
  File "/usr/local/lib/python3.10/dist-packages/torchvision/_meta_registrations.py", line 164, in <module>
    def meta_nms(dets, scores, iou_threshold):
  File "/usr/local/lib/python3.10/dist-packages/torch/library.py", line 654, in register
    use_lib._register_fake(op_name, func, _stacklevel=stacklevel + 1)
  File "/usr/local/lib/python3.10/dist-packages/torch/library.py", line 154, in _register_fake
    handle = entry.abstract_impl.register(func_to_register, source)
  File "/usr/local/lib/python3.10/dist-packages/torch/_library/abstract_impl.py", line 31, in register
    if torch._C._dispatch_has_kernel_for_dispatch_key(self.qualname, "Meta"):
RuntimeError: operator torchvision::nms does not exist

After upgrading torchvision with pip install --upgrade torchvision, I finally got it working. :)
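
After that, a quick way to confirm everything imports cleanly (just the checks I used, more or less):

python -c "import torch, torchvision; print(torch.__version__, torchvision.__version__)"
python -c "import tensorrt_llm"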

EmilioZhao commented 2 months ago

Maybe because of a NumPy incompatibility, I encountered an error in the build stage:

trtllm-build --checkpoint_dir ./bloom/560M/trt_ckpt/fp16/1-gpu/ \
                --gemm_plugin float16 \
                --output_dir ./bloom/560M/trt_engines/fp16/1-gpu/
[TensorRT-LLM] TensorRT-LLM version: 0.13.0.dev2024090300
[09/10/2024-07:47:07] [TRT-LLM] [I] Set bert_attention_plugin to auto.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set gpt_attention_plugin to auto.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set gemm_plugin to float16.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set gemm_swiglu_plugin to None.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set fp8_rowwise_gemm_plugin to None.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set nccl_plugin to auto.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set lookup_plugin to None.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set lora_plugin to None.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set moe_plugin to auto.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set mamba_conv1d_plugin to auto.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set low_latency_gemm_plugin to None.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set context_fmha to True.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set bert_context_fmha_fp32_acc to False.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set remove_input_padding to True.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set reduce_fusion to False.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set enable_xqa to True.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set tokens_per_block to 64.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set use_paged_context_fmha to False.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set use_fp8_context_fmha to False.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set multiple_profiles to False.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set paged_state to True.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set streamingllm to False.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set use_fused_mlp to True.
[09/10/2024-07:47:07] [TRT-LLM] [I] Compute capability: (8, 6)
[09/10/2024-07:47:07] [TRT-LLM] [I] SM count: 68
[09/10/2024-07:47:07] [TRT-LLM] [I] SM clock: 2100 MHz
[09/10/2024-07:47:07] [TRT-LLM] [I] int4 TFLOPS: 584
[09/10/2024-07:47:07] [TRT-LLM] [I] int8 TFLOPS: 292
[09/10/2024-07:47:07] [TRT-LLM] [I] fp8 TFLOPS: 0
[09/10/2024-07:47:07] [TRT-LLM] [I] float16 TFLOPS: 146
[09/10/2024-07:47:07] [TRT-LLM] [I] bfloat16 TFLOPS: 146
[09/10/2024-07:47:07] [TRT-LLM] [I] float32 TFLOPS: 73
[09/10/2024-07:47:07] [TRT-LLM] [I] Total Memory: 10 GiB
[09/10/2024-07:47:07] [TRT-LLM] [I] Memory clock: 9501 MHz
[09/10/2024-07:47:07] [TRT-LLM] [I] Memory bus width: 320
[09/10/2024-07:47:07] [TRT-LLM] [I] Memory bandwidth: 760 GB/s
[09/10/2024-07:47:07] [TRT-LLM] [I] PCIe speed: 2500 Mbps
[09/10/2024-07:47:07] [TRT-LLM] [I] PCIe link width: 16
[09/10/2024-07:47:07] [TRT-LLM] [I] PCIe bandwidth: 5 GB/s
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 510, in load
    param.value = weights[name]
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/parameter.py", line 202, in value
    dtype = np_dtype_to_trt(v.dtype)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_utils.py", line 218, in np_dtype_to_trt
    assert ret is not None, f'Unsupported dtype: {dtype}'
  File "/usr/local/lib/python3.10/dist-packages/numpy/core/_dtype.py", line 42, in __str__
    return dtype.name
  File "/usr/local/lib/python3.10/dist-packages/numpy/core/_dtype.py", line 362, in _name_get
    if _name_includes_bit_suffix(dtype):
  File "/usr/local/lib/python3.10/dist-packages/numpy/core/_dtype.py", line 339, in _name_includes_bit_suffix
    elif np.issubdtype(dtype, np.flexible) and _isunsized(dtype):
  File "/usr/local/lib/python3.10/dist-packages/numpy/core/numerictypes.py", line 417, in issubdtype
    arg1 = dtype(arg1).type
TypeError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/trtllm-build", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 520, in main
    parallel_build(model_config, ckpt_dir, build_config, args.output_dir,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 384, in parallel_build
    passed = build_and_save(rank, rank % workers, ckpt_dir,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 351, in build_and_save
    engine = build_model(build_config,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 320, in build_model
    model = model_cls.from_checkpoint(ckpt_dir, config=rank_config)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 485, in from_checkpoint
    model.load(weights, from_pruned=is_checkpoint_pruned)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 512, in load
    raise RuntimeError(
RuntimeError: Encounter error '' for parameter 'transformer.vocab_embedding.weight'
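
I have not confirmed the root cause yet, but since the failure is inside NumPy's dtype handling, my first guess would be to check which NumPy got pulled in by the upgrades above and pin it back if it is a 2.x release (just a guess based on the traceback, not a verified fix):

# Which NumPy is installed after the upgrades above?
python -c "import numpy; print(numpy.__version__)"

# If it is 2.x, pin back to the 1.x series and rerun trtllm-build
pip install "numpy<2"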