NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

unable to build bert model #762

Open riyaj8888 opened 10 months ago

riyaj8888 commented 10 months ago

```
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/nvidia/lib'), PosixPath('/usr/local/nvidia/lib64')}
  warn(msg)
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /usr/local/nvidia/lib:/usr/local/nvidia/lib64 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//10.211.0.1'), PosixPath('tcp'), PosixPath('443')}
  warn(msg)
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('tcp'), PosixPath('8000'), PosixPath('//10.211.133.139')}
  warn(msg)
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('module'), PosixPath('//matplotlib_inline.backend_inline')}
  warn(msg)
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//console.elementai.com'), PosixPath('https')}
  warn(msg)
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib64')}
  warn(msg)
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
  warn(msg)
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
  warn(msg)
[12/28/2023-08:50:07] [TRT-LLM] [W] A required package 'psutil' is not installed. Will not monitor the device memory usages. Please install the package first, e.g, 'pip install pynvml>=11.5.0'.
Traceback (most recent call last):
  File "/app/snow.riyaj_atar.llm_exp_ds/latency_benchmarks/riyaj/TensorRT-LLM/examples/bert/build.py", line 25, in <module>
    import tensorrt_llm
  File "/tmp/.local/lib/python3.10/site-packages/tensorrt_llm/__init__.py", line 61, in <module>
    _init(log_level="error")
  File "/tmp/.local/lib/python3.10/site-packages/tensorrt_llm/_common.py", line 47, in _init
    _load_plugin_lib()
  File "/tmp/.local/lib/python3.10/site-packages/tensorrt_llm/plugin/plugin.py", line 34, in _load_plugin_lib
    handle = ctypes.CDLL(plugin_lib_path(),
  File "/opt/conda/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libmpi.so.40: cannot open shared object file: No such file or directory
```
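The final `OSError` means the dynamic linker cannot find OpenMPI's runtime library (`libmpi.so.40`), which the TensorRT-LLM plugin loader dlopens at import time. A minimal diagnostic sketch follows; the install commands in the comments assume a Debian/Ubuntu base image and are not from this thread:

```shell
# Check whether the loader can see libmpi at all (prints FOUND or MISSING).
python3 -c 'import ctypes.util; print("FOUND" if ctypes.util.find_library("mpi") else "MISSING")'

# If MISSING, installing the OpenMPI runtime and refreshing the linker cache
# is one plausible fix (package names are an assumption for Debian/Ubuntu):
# apt-get update && apt-get install -y libopenmpi-dev openmpi-bin
# ldconfig
```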

jdemouth-nvidia commented 10 months ago

Hi - can you share the command lines for us to reproduce the issue, please?

symphonylyh commented 10 months ago

@riyaj8888 have you tried the docker setup guide: https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/installation.md? It seems you were in a conda environment with some missing CUDA dependencies.

riyaj8888 commented 10 months ago

Here I have tried with a fresh image.

Steps I followed:

  1. Pull the image `23.10-py3`.
  2. Clone and set up the repo:

     ```
     git clone https://github.com/NVIDIA/TensorRT-LLM.git
     cd TensorRT-LLM
     git submodule update --init --recursive
     git lfs install
     git lfs pull
     ```
  3. Build the wheel:

     ```
     python3 ./scripts/build_wheel.py --trt_root /usr/local/tensorrt
     ```

     Here it's throwing an error.
  4. Install the wheel:

     ```
     pip install ./build/tensorrt_llm*.whl
     ```

     Here I'm getting an error.
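For reference, the steps above run inside the NGC container would look roughly like this. The image tag (`23.10-py3`) and the build commands are taken from the thread; the `docker run` flags and mount path are typical choices, not the official procedure:

```shell
# Launch the PyTorch NGC container (tag from the thread), mounting the
# current directory; --gpus all requires the NVIDIA Container Toolkit.
docker run --rm -it --gpus all -v "$PWD":/workspace nvcr.io/nvidia/pytorch:23.10-py3 bash

# Inside the container:
git clone https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
git submodule update --init --recursive
git lfs install && git lfs pull
python3 ./scripts/build_wheel.py --trt_root /usr/local/tensorrt
pip install ./build/tensorrt_llm*.whl
```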

riyaj8888 commented 10 months ago

```
Successfully installed Pillow-10.1.0 absl-py-2.0.0 accelerate-0.20.3 aiohttp-3.9.1 aiosignal-1.3.1 async-timeout-4.0.3 build-1.0.3 cfgv-3.4.0 click-8.1.7 colored-2.2.4 coloredlogs-15.0.1 coverage-7.4.0 cuda-python-12.2.0 cython-3.0.7 datasets-2.16.0 diffusers-0.15.0 dill-0.3.7 distlib-0.3.8 einops-0.7.0 evaluate-0.4.1 execnet-2.0.2 filelock-3.13.1 frozenlist-1.4.1 fsspec-2023.10.0 graphviz-0.20.1 huggingface-hub-0.20.1 humanfriendly-10.0 identify-2.5.33 iniconfig-2.0.0 joblib-1.3.2 lark-1.1.8 mpi4py-3.1.5 mpmath-1.3.0 multidict-6.0.4 multiprocess-0.70.15 mypy-1.8.0 mypy-extensions-1.0.0 networkx-3.2.1 nltk-3.8.1 nodeenv-1.8.0 onnx-1.15.0 optimum-1.16.1 pandas-2.1.4 parameterized-0.9.0 pluggy-1.3.0 polygraphy-0.49.0 pre-commit-3.6.0 protobuf-4.25.1 py-1.11.0 pyarrow-14.0.2 pyarrow-hotfix-0.6 pybind11-stubgen-2.4.2 pynvml-11.5.0 pyproject_hooks-1.0.0 pytest-7.4.3 pytest-cov-4.1.0 pytest-forked-1.6.0 pytest-xdist-3.5.0 pytz-2023.3.post1 regex-2023.12.25 responses-0.18.0 rouge_score-0.1.2 safetensors-0.4.1 sentencepiece-0.1.99 sympy-1.12 tensorrt-9.2.0.post12.dev5 tokenizers-0.13.3 torch-2.1.2+cu121 tqdm-4.66.1 transformers-4.33.1 triton-2.1.0 typing-extensions-4.8.0 tzdata-2023.3 virtualenv-20.25.0 xxhash-3.4.1 yarl-1.9.4

[notice] A new release of pip is available: 23.3 -> 23.3.2
[notice] To update, run: python3 -m pip install --upgrade pip
/bin/bash: line 1: cmake: command not found
Traceback (most recent call last):
  File "/app/trt_engine/TensorRT-LLM/./scripts/build_wheel.py", line 306, in <module>
    main(**vars(args))
  File "/app/trt_engine/TensorRT-LLM/./scripts/build_wheel.py", line 160, in main
    build_run(
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'cmake -DCMAKE_BUILD_TYPE="Release" -DBUILD_PYT="ON" -DBUILD_PYBIND="OFF" -DTRT_LIB_DIR=/usr/local/tensorrt/targets/x86_64-linux-gnu/lib -DTRT_INCLUDE_DIR=/usr/local/tensorrt/include -S "/app/trt_engine/TensorRT-LLM/cpp"' returned non-zero exit status 127.
```
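Exit status 127 is the shell's "command not found", and the line above the traceback confirms `cmake` is missing from the environment. A quick check, with two common install options in the comments (both are general workarounds, not something the maintainers prescribed here):

```shell
# Verify whether cmake is on PATH (prints PRESENT or ABSENT).
if command -v cmake >/dev/null 2>&1; then echo PRESENT; else echo ABSENT; fi

# If ABSENT, a recent CMake can be installed via pip (assumes pip is available):
# python3 -m pip install cmake
# or via apt on Debian/Ubuntu images:
# apt-get update && apt-get install -y cmake
```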

riyaj8888 commented 10 months ago

hey hi,

any update on the above error?

symphonylyh commented 10 months ago

@riyaj8888 might be a naive question, but did you actually launch the pulled docker image first and then do the steps inside the container (for both building the TRT-LLM source code and pip installing TRT-LLM)?

All the paths in your error message indicate that you're not inside a docker container.
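One quick way to sanity-check this: most Docker containers contain a `/.dockerenv` marker file. This is an informal convention rather than a guarantee, but as a rough heuristic:

```shell
# Heuristic container check: /.dockerenv exists in most Docker containers.
if [ -f /.dockerenv ]; then
  echo "inside a Docker container"
else
  echo "not inside a Docker container"
fi
```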

hello-11 commented 1 day ago

@riyaj8888 Do you still have the problem? If not, we will close it soon.