LoverLost opened this issue 11 months ago
Could you run `make -C docker build` successfully?

> Could you run `make -C docker build` successfully?

Yep, I ran `make -C docker build` successfully. Moreover, here are the outputs of this command:
make: Entering directory '/ssd/home/mhma/accelerate/TensorRT-LLM/docker'
Building docker image: tensorrt_llm/devel:latest
DOCKER_BUILDKIT=1 docker build --pull \
--progress auto \
--build-arg BASE_IMAGE=nvcr.io/nvidia/pytorch \
--build-arg BASE_TAG=23.10-py3 \
--build-arg BUILD_WHEEL_ARGS="--clean --trt_root /usr/local/tensorrt" \
--build-arg TORCH_INSTALL_TYPE="skip" \
\
\
\
\
\
--target devel \
--file Dockerfile.multi \
--tag tensorrt_llm/devel:latest \
..
[+] Building 7.6s (18/18) FINISHED docker:default
=> [internal] load .dockerignore 0.0s
=> => transferring context: 233B 0.0s
=> [internal] load build definition from Dockerfile.multi 0.0s
=> => transferring dockerfile: 2.09kB 0.0s
=> [internal] load metadata for nvcr.io/nvidia/pytorch:23.10-py3 7.2s
=> [internal] load build context 0.3s
=> => transferring context: 371B 0.2s
=> [base 1/1] FROM nvcr.io/nvidia/pytorch:23.10-py3@sha256:72d016011185c8e8c82442c87135def044f0f9707f9fd4ec1703a9e403ad4c35 0.0s
=> CACHED [devel 1/12] COPY docker/common/install_base.sh install_base.sh 0.0s
=> CACHED [devel 2/12] RUN bash ./install_base.sh && rm install_base.sh 0.0s
=> CACHED [devel 3/12] COPY docker/common/install_cmake.sh install_cmake.sh 0.0s
=> CACHED [devel 4/12] RUN bash ./install_cmake.sh && rm install_cmake.sh 0.0s
=> CACHED [devel 5/12] COPY docker/common/install_tensorrt.sh install_tensorrt.sh 0.0s
=> CACHED [devel 6/12] RUN bash ./install_tensorrt.sh --TRT_VER=${TRT_VER} --CUDA_VER=${CUDA_VER} --CUDNN_VER=${CUDNN_VER} --NCCL_VER=${NCCL_ 0.0s
=> CACHED [devel 7/12] COPY docker/common/install_polygraphy.sh install_polygraphy.sh 0.0s
=> CACHED [devel 8/12] RUN bash ./install_polygraphy.sh && rm install_polygraphy.sh 0.0s
=> CACHED [devel 9/12] COPY docker/common/install_mpi4py.sh install_mpi4py.sh 0.0s
=> CACHED [devel 10/12] RUN bash ./install_mpi4py.sh && rm install_mpi4py.sh 0.0s
=> CACHED [devel 11/12] COPY docker/common/install_pytorch.sh install_pytorch.sh 0.0s
=> CACHED [devel 12/12] RUN bash ./install_pytorch.sh skip && rm install_pytorch.sh 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:309b86b99ee938b7158ac25e46a45f1f9f33311d22e35b1e724062414fde09e8 0.0s
=> => naming to docker.io/tensorrt_llm/devel:latest 0.0s
make: Leaving directory '/ssd/home/mhma/accelerate/TensorRT-LLM/docker'
Could you launch other docker containers?
I have exactly the same issue. Any update on this?
It seems you have to run `make -C docker release_run LOCAL_USER=1` instead of `make -C docker run LOCAL_USER=1`. Otherwise, it will look for the devel version of the image and cause an error.
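A quick way to check which run target matches what you actually built (a minimal sketch; it only assumes the images are tagged tensorrt_llm/devel and tensorrt_llm/release, as the logs in this thread show):

```bash
# List the locally built TensorRT-LLM images.
# If only tensorrt_llm/devel:latest is present, use `make -C docker run`;
# if tensorrt_llm/release:latest is present, `make -C docker release_run` will find it.
docker images | grep tensorrt_llm
```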
@taozhang9527 @byshiue
I am running `make -C docker release_run LOCAL_USER=1` but am still facing this error:
pull access denied for tensorrt_llm/release, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
Full logs:
docker build --progress auto --build-arg BASE_IMAGE_WITH_TAG=tensorrt_llm/release:latest --build-arg USER_ID=0 --build-arg USER_NAME=root --build-arg GROUP_ID=0 --build-arg GROUP_NAME=root --file Dockerfile.user --tag tensorrt_llm/release:latest-root ..
Sending build context to Docker daemon 1.567GB
Step 1/9 : ARG BASE_IMAGE_WITH_TAG
Step 2/9 : FROM ${BASE_IMAGE_WITH_TAG} as base
pull access denied for tensorrt_llm/release, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
make: *** [Makefile:118: release_run] Error 1
make: Leaving directory '/home/jaya_kommuru/TensorRT-LLM/docker'
`make -C docker build` is running fine though
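For what it's worth, a hedged workaround sketch: it assumes the docker/Makefile pairs `release_run` with a `release_build` target (check the Makefile in your checkout before relying on this), so that the release image exists locally and `release_run` no longer tries to pull `tensorrt_llm/release:latest` from Docker Hub:

```bash
# Build the release image locally, then launch it.
make -C docker release_build
make -C docker release_run LOCAL_USER=1
```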
@LoverLost were you able to figure this out?
I got the same error.
same error
same error
[+] Building 1.5s (3/3) FINISHED
=> [internal] load build definition from Dockerfile.user
=> => transferring dockerfile: 638B
=> ERROR [internal] load metadata for docker.io/tensorrt_llm/release:latest
=> [auth] tensorrt_llm/release:pull token for registry-1.docker.io
[internal] load metadata for docker.io/tensorrt_llm/release:latest: Dockerfile.user:3
1 | ARG BASE_IMAGE_WITH_TAG
2 |
3 | >>> FROM ${BASE_IMAGE_WITH_TAG} as base
4 |
5 | # Alternative user
ERROR: failed to solve: tensorrt_llm/release:latest: failed to resolve source metadata for docker.io/tensorrt_llm/release:latest: pull access denied, authorization failed
Same error: pull access denied. Anybody here to help? @kaiyux @Shixiaowei02
Could you run `make -C docker build` successfully?
Hi @byshiue! Is your suggestion to just launch the docker image built by `make -C docker build` instead of using `make -C docker release_run LOCAL_USER=1` as a workaround?
I built the docker image successfully and ran the container with the following command (which I worked out from docker/Makefile):
sudo docker run --rm -it --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
--gpus=all \
--volume /home/lixiang/projects/TensorRT-LLM:/workspace/tensorrt_llm \
--env "CCACHE_DIR=/workspace/tensorrt_llm/cpp/.ccache" \
--env "CCACHE_BASEDIR=/workspace/tensorrt_llm" \
--workdir /workspace/tensorrt_llm \
--hostname trtllm-devel \
--name tensorrt_llm-devel-root \
--tmpfs /tmp:exec \
docker.io/tensorrt_llm/devel:latest
However, it reported the following message after the container started, which may indicate a CUDA incompatibility:
WARNING: CUDA Minor Version Compatibility mode ENABLED.
Using driver version 535.183.06 which has support for CUDA 12.2. This container
was built with CUDA 12.5 and will be run in Minor Version Compatibility mode.
CUDA Forward Compatibility is preferred over Minor Version Compatibility for use
with this container but was unavailable:
[[Forward compatibility was attempted on non supported HW (CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE) cuInit()=804]]
See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.
Does this affect the normal use of tensorrt-llm?
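One way to see exactly what the warning is comparing (a generic check, nothing TensorRT-LLM specific): look at the CUDA version the driver supports versus the toolkit the container ships.

```bash
# Driver side: driver 535.183.06 reports support for CUDA 12.2.
nvidia-smi | grep "CUDA Version"
# Container side: the image was built against CUDA 12.5.
nvcc --version | grep release
```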
After executing commands:
cd examples/llama
pip install -r requirements.txt
pip complained:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
cudf 24.4.0 requires pandas<2.2.2dev0,>=2.0, but you have pandas 2.2.2 which is incompatible.
cudf 24.4.0 requires protobuf<5,>=3.20, but you have protobuf 5.28.0 which is incompatible.
cudf 24.4.0 requires pyarrow<15.0.0a0,>=14.0.1, but you have pyarrow 17.0.0 which is incompatible.
dask-cuda 24.4.0 requires pynvml<11.5,>=11.0.0, but you have pynvml 11.5.3 which is incompatible.
dask-cudf 24.4.0 requires pandas<2.2.2dev0,>=2.0, but you have pandas 2.2.2 which is incompatible.
torchvision 0.19.0a0 requires torch==2.4.0a0+3bcc3cddb5.nv24.07, but you have torch 2.4.0 which is incompatible.
After upgrading transformers with `pip install --upgrade transformers` (Llama 3.1 requires transformers 4.43.0+), pip complained:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
optimum 1.21.4 requires transformers[sentencepiece]<4.44.0,>=4.29.0, but you have transformers 4.44.2 which is incompatible.
tensorrt-llm 0.13.0.dev2024090300 requires nvidia-modelopt~=0.15.0, but you have nvidia-modelopt 0.13.0 which is incompatible.
tensorrt-llm 0.13.0.dev2024090300 requires pynvml>=11.5.0, but you have pynvml 11.4.1 which is incompatible.
tensorrt-llm 0.13.0.dev2024090300 requires transformers<=4.42.4,>=4.38.2, but you have transformers 4.44.2 which is incompatible.
How to solve the dependency problems?
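One sketch for quieting the conflicts that involve tensorrt-llm itself, using the bounds pip printed above (the cudf/dask-cuda warnings come from packages shipped in the NGC base image rather than from tensorrt-llm; treat that, and the exact versions, as assumptions taken from this log):

```bash
# Bring pynvml and nvidia-modelopt up to the bounds the installed tensorrt-llm wheel declares.
pip install "pynvml>=11.5.0" "nvidia-modelopt~=0.15.0"
# transformers is the awkward one: tensorrt-llm pins <=4.42.4 while Llama 3.1 wants 4.43+,
# so whichever you choose, expect a resolver warning for the other.
# pip install "transformers>=4.38.2,<=4.42.4"
```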
I executed the tensorrt_llm import command in Python, and it threw exceptions:
root@trtllm-devel:/workspace/tensorrt_llm/examples/llama# python -c 'import tensorrt_llm'
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1603, in _get_module
return importlib.import_module("." + module_name, self.__name__)
File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 32, in <module>
from ...modeling_flash_attention_utils import _flash_attention_forward
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_flash_attention_utils.py", line 27, in <module>
from flash_attn.bert_padding import index_first_axis, pad_input, unpad_input # noqa
File "/usr/local/lib/python3.10/dist-packages/flash_attn/__init__.py", line 3, in <module>
from flash_attn.flash_attn_interface import (
File "/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py", line 10, in <module>
import flash_attn_2_cuda as flash_attn_cuda
ImportError: /usr/local/lib/python3.10/dist-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/__init__.py", line 33, in <module>
import tensorrt_llm.models as models
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/__init__.py", line 31, in <module>
from .gemma.model import GemmaForCausalLM
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/gemma/model.py", line 18, in <module>
from tensorrt_llm.models.gemma.convert import (QuantizeModifiers, Weights,
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/gemma/convert.py", line 39, in <module>
from tensorrt_llm.models.gemma.smoothquant import (capture_activation_range,
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/gemma/smoothquant.py", line 27, in <module>
from transformers import LlamaConfig, LlamaForCausalLM
File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1594, in __getattr__
value = getattr(module, name)
File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1593, in __getattr__
module = self._get_module(self._class_to_module[name])
File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1605, in _get_module
raise RuntimeError(
RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
/usr/local/lib/python3.10/dist-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
I think it was caused by a CUDA incompatibility between CUDA driver 535.183.06 and the CUDA toolkit 12.5 installed natively in your docker image. @byshiue @kaiyux
Installing a newer flash_attn wheel solved the undefined flash attention symbol problem: flash_attn-2.6.3+cu123torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
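For anyone following along, the fix amounts to installing a flash-attn build that matches the container's torch 2.4 / CUDA 12.x / Python 3.10 / cxx11abi=FALSE ABI; the wheel filename is the one quoted above, and where you obtain it (e.g. the flash-attn GitHub releases) is up to you:

```bash
# Replace the flash-attn build compiled against a different torch ABI
# with one matching torch 2.4 / cu123 / cp310 / cxx11abi=FALSE.
pip install flash_attn-2.6.3+cu123torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
```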
Then a new error came up:
root@trtllm-devel:/workspace# python -c 'import tensorrt_llm'
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1603, in _get_module
return importlib.import_module("." + module_name, self.__name__)
File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/processing_auto.py", line 28, in <module>
from ...image_processing_utils import ImageProcessingMixin
File "/usr/local/lib/python3.10/dist-packages/transformers/image_processing_utils.py", line 21, in <module>
from .image_transforms import center_crop, normalize, rescale
File "/usr/local/lib/python3.10/dist-packages/transformers/image_transforms.py", line 22, in <module>
from .image_utils import (
File "/usr/local/lib/python3.10/dist-packages/transformers/image_utils.py", line 58, in <module>
from torchvision.transforms import InterpolationMode
File "/usr/local/lib/python3.10/dist-packages/torchvision/__init__.py", line 6, in <module>
from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils
File "/usr/local/lib/python3.10/dist-packages/torchvision/_meta_registrations.py", line 164, in <module>
def meta_nms(dets, scores, iou_threshold):
File "/usr/local/lib/python3.10/dist-packages/torch/library.py", line 654, in register
use_lib._register_fake(op_name, func, _stacklevel=stacklevel + 1)
File "/usr/local/lib/python3.10/dist-packages/torch/library.py", line 154, in _register_fake
handle = entry.abstract_impl.register(func_to_register, source)
File "/usr/local/lib/python3.10/dist-packages/torch/_library/abstract_impl.py", line 31, in register
if torch._C._dispatch_has_kernel_for_dispatch_key(self.qualname, "Meta"):
RuntimeError: operator torchvision::nms does not exist
After upgrading torchvision with `pip install --upgrade torchvision`, I finally made it. :)
Maybe because of a NumPy incompatibility, I encountered an error in the build stage:
trtllm-build --checkpoint_dir ./bloom/560M/trt_ckpt/fp16/1-gpu/ \
--gemm_plugin float16 \
--output_dir ./bloom/560M/trt_engines/fp16/1-gpu/
[TensorRT-LLM] TensorRT-LLM version: 0.13.0.dev2024090300
[09/10/2024-07:47:07] [TRT-LLM] [I] Set bert_attention_plugin to auto.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set gpt_attention_plugin to auto.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set gemm_plugin to float16.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set gemm_swiglu_plugin to None.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set fp8_rowwise_gemm_plugin to None.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set nccl_plugin to auto.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set lookup_plugin to None.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set lora_plugin to None.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set moe_plugin to auto.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set mamba_conv1d_plugin to auto.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set low_latency_gemm_plugin to None.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set context_fmha to True.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set bert_context_fmha_fp32_acc to False.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set remove_input_padding to True.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set reduce_fusion to False.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set enable_xqa to True.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set tokens_per_block to 64.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set use_paged_context_fmha to False.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set use_fp8_context_fmha to False.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set multiple_profiles to False.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set paged_state to True.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set streamingllm to False.
[09/10/2024-07:47:07] [TRT-LLM] [I] Set use_fused_mlp to True.
[09/10/2024-07:47:07] [TRT-LLM] [I] Compute capability: (8, 6)
[09/10/2024-07:47:07] [TRT-LLM] [I] SM count: 68
[09/10/2024-07:47:07] [TRT-LLM] [I] SM clock: 2100 MHz
[09/10/2024-07:47:07] [TRT-LLM] [I] int4 TFLOPS: 584
[09/10/2024-07:47:07] [TRT-LLM] [I] int8 TFLOPS: 292
[09/10/2024-07:47:07] [TRT-LLM] [I] fp8 TFLOPS: 0
[09/10/2024-07:47:07] [TRT-LLM] [I] float16 TFLOPS: 146
[09/10/2024-07:47:07] [TRT-LLM] [I] bfloat16 TFLOPS: 146
[09/10/2024-07:47:07] [TRT-LLM] [I] float32 TFLOPS: 73
[09/10/2024-07:47:07] [TRT-LLM] [I] Total Memory: 10 GiB
[09/10/2024-07:47:07] [TRT-LLM] [I] Memory clock: 9501 MHz
[09/10/2024-07:47:07] [TRT-LLM] [I] Memory bus width: 320
[09/10/2024-07:47:07] [TRT-LLM] [I] Memory bandwidth: 760 GB/s
[09/10/2024-07:47:07] [TRT-LLM] [I] PCIe speed: 2500 Mbps
[09/10/2024-07:47:07] [TRT-LLM] [I] PCIe link width: 16
[09/10/2024-07:47:07] [TRT-LLM] [I] PCIe bandwidth: 5 GB/s
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 510, in load
param.value = weights[name]
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/parameter.py", line 202, in value
dtype = np_dtype_to_trt(v.dtype)
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_utils.py", line 218, in np_dtype_to_trt
assert ret is not None, f'Unsupported dtype: {dtype}'
File "/usr/local/lib/python3.10/dist-packages/numpy/core/_dtype.py", line 42, in __str__
return dtype.name
File "/usr/local/lib/python3.10/dist-packages/numpy/core/_dtype.py", line 362, in _name_get
if _name_includes_bit_suffix(dtype):
File "/usr/local/lib/python3.10/dist-packages/numpy/core/_dtype.py", line 339, in _name_includes_bit_suffix
elif np.issubdtype(dtype, np.flexible) and _isunsized(dtype):
File "/usr/local/lib/python3.10/dist-packages/numpy/core/numerictypes.py", line 417, in issubdtype
arg1 = dtype(arg1).type
TypeError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/trtllm-build", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 520, in main
parallel_build(model_config, ckpt_dir, build_config, args.output_dir,
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 384, in parallel_build
passed = build_and_save(rank, rank % workers, ckpt_dir,
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 351, in build_and_save
engine = build_model(build_config,
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 320, in build_model
model = model_cls.from_checkpoint(ckpt_dir, config=rank_config)
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 485, in from_checkpoint
model.load(weights, from_pruned=is_checkpoint_pruned)
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 512, in load
raise RuntimeError(
RuntimeError: Encounter error '' for parameter 'transformer.vocab_embedding.weight'
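If the NumPy guess is right, one hedged thing to try (purely following that hypothesis; the version bound is an assumption, not something confirmed in this thread):

```bash
# Pin NumPy to a 1.x release, then redo the checkpoint conversion and trtllm-build.
pip install "numpy<2"
```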
I'm using Docker in rootless mode. When I run `make -C docker run LOCAL_USER=1`, it fails with the following details:

I have logged into Docker on Linux using `docker login`, but the error still occurs. Does anyone know the reason?
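Two things worth checking in rootless mode (assumptions, since the failing output isn't shown here): `docker login` only matters for pulling from a registry, and a rootless daemon keeps its own image store, so an image built via sudo or the rootful daemon will not be visible to it.

```bash
# Confirm which daemon you are talking to and whether it actually has the devel image;
# images built under the rootful daemon are not shared with a rootless one.
docker context show
docker images tensorrt_llm/devel
```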