System Info
TensorRT-LLM 0.8, running on an NVIDIA Orin platform
Who can help?
No response
Information
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)

Reproduction
```
docker run --rm -it --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
    --gpus=all \
    --volume /home/wanghaikuan/code/TensorRT-LLM:/code/tensorrt_llm \
    --env "CCACHE_DIR=/code/tensorrt_llm/cpp/.ccache" \
    --env "CCACHE_BASEDIR=/code/tensorrt_llm" \
    --workdir /app/tensorrt_llm \
    --hostname nvidia-desktop-release \
    --name tensorrt_llm-release-wanghaikuan \
    --tmpfs /tmp:exec \
    tensorrt_llm/release:latest
```
Expected behavior
The container should start successfully on the Orin platform.
Actual behavior
```
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'csv'
invoking the NVIDIA Container Runtime Hook directly (e.g. specifying the docker --gpus flag) is not supported. Please use the NVIDIA Container Runtime (e.g. specify the --runtime=nvidia flag) instead.: unknown.
```
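The error message itself points at the fix: on Jetson platforms such as Orin, the `--gpus` flag invokes the NVIDIA Container Runtime Hook directly, which is not supported there; the container must instead be started through the NVIDIA Container Runtime. A possible workaround, assuming `nvidia-container-runtime` is installed on the host, is to replace `--gpus=all` with `--runtime=nvidia`:

```shell
# Sketch of the same docker run command using the NVIDIA runtime
# instead of the unsupported --gpus flag (all other options unchanged).
docker run --rm -it --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
    --runtime=nvidia \
    --volume /home/wanghaikuan/code/TensorRT-LLM:/code/tensorrt_llm \
    --env "CCACHE_DIR=/code/tensorrt_llm/cpp/.ccache" \
    --env "CCACHE_BASEDIR=/code/tensorrt_llm" \
    --workdir /app/tensorrt_llm \
    --hostname nvidia-desktop-release \
    --name tensorrt_llm-release-wanghaikuan \
    --tmpfs /tmp:exec \
    tensorrt_llm/release:latest
```

Alternatively, `nvidia` can be made the default runtime in `/etc/docker/daemon.json` (`"default-runtime": "nvidia"`, followed by a Docker daemon restart), so that no extra flag is needed.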
Additional notes

None