Closed by rbo 2 months ago
First, let's pre-pull the Triton server image:
root@terminator:~# podman pull nvcr.io/nvidia/tritonserver:24.06-py3
Trying to pull nvcr.io/nvidia/tritonserver:24.06-py3...
...
This should be in the new base image: #40
root@terminator:~# oc get svc -n triton
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
server ClusterIP 10.43.222.104 <none> 8000/TCP,8001/TCP,8002/TCP 21m
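Before pointing the SDK client at the service, it can be worth probing Triton's standard HTTP readiness endpoint (`/v2/health/ready`) on the HTTP port. A quick sketch, assuming the cluster IP from the `oc get svc` output above:

```shell
# Probe Triton's readiness endpoint on port 8000.
# The default address is an assumption taken from the service listing above.
TRITON_URL="${TRITON_URL:-10.43.222.104:8000}"
if curl -sf --max-time 5 "http://${TRITON_URL}/v2/health/ready" > /dev/null; then
  echo "triton ready"
else
  echo "triton not reachable"
fi
```

This only succeeds from inside the cluster network; off-cluster it should print "triton not reachable" rather than hang.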
root@terminator:~# podman run -it --rm --net=host nvcr.io/nvidia/tritonserver:24.06-py3-sdk
=================================
== Triton Inference Server SDK ==
=================================
NVIDIA Release 24.06 (build 98458819)
Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see
https://docs.nvidia.com/datacenter/cloud-native/ .
root@terminator:/workspace# /workspace/install/bin/image_client -u 10.43.222.104:8000 -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg
expecting input to have 3 dimensions, model 'densenet_onnx' input has 4
root@terminator:/workspace#
Some config files were missing; added with https://github.com/cloud-native-robotz-hackathon/robot-gitops/commit/b5cd3b773ea461c29d9bd5e849e520c57f164fac
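For reference, the "expecting input to have 3 dimensions, model input has 4" error is what the `reshape` stanza in `config.pbtxt` resolves: the client sends a 3-dimensional CHW tensor and the config reshapes it to the model's 4-dimensional input. A sketch based on the upstream Triton quickstart densenet_onnx example; the tensor names (`data_0`, `fc6_1`) are assumptions from that example and should be checked against the actual commit:

```shell
# Write a sketch of the densenet_onnx model config (quickstart-style).
# Tensor names are assumptions from the upstream Triton example.
cat > config.pbtxt <<'EOF'
name: "densenet_onnx"
platform: "onnxruntime_onnx"
max_batch_size: 0
input [
  {
    name: "data_0"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 224, 224 ]
    reshape { shape: [ 1, 3, 224, 224 ] }
  }
]
output [
  {
    name: "fc6_1"
    data_type: TYPE_FP32
    dims: [ 1000 ]
    reshape { shape: [ 1, 1000, 1, 1 ] }
    label_filename: "densenet_labels.txt"
  }
]
EOF
grep -c reshape config.pbtxt
```

`max_batch_size: 0` is used because the ONNX model carries a fixed batch dimension of its own.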
Works
root@terminator:/workspace# /workspace/install/bin/image_client -u 10.43.222.104:8000 -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg
Request 0, batch size 1
Image '/workspace/images/mug.jpg':
15.349568 (504) = COFFEE MUG
13.227468 (968) = CUP
10.424893 (505) = COFFEEPOT
root@terminator:/workspace#
Switch to fedora model after https://github.com/cloud-native-robotz-hackathon/infrastructure/issues/48
Current log output of the server with the model from https://contentmamluswest001.blob.core.windows.net/content/14b2744cf8d6418c87ffddc3f3127242/9502630827244d60a1214f250e3bbca7/08aed7327d694b8dbaee2c97b8d0fcba/densenet121-1.2.onnx
Defaulted container "triton" out of: triton, model-downloder (init)
W0719 15:28:12.834580 1 pinned_memory_manager.cc:273] "Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version"
I0719 15:28:12.834756 1 cuda_memory_manager.cc:117] "CUDA memory pool disabled"
E0719 15:28:12.835134 1 server.cc:241] "CudaDriverHelper has not been initialized."
I0719 15:28:12.842022 1 model_lifecycle.cc:472] "loading: densenet_onnx:1"
I0719 15:28:12.853726 1 onnxruntime.cc:2899] "TRITONBACKEND_Initialize: onnxruntime"
I0719 15:28:12.853961 1 onnxruntime.cc:2909] "Triton TRITONBACKEND API version: 1.19"
I0719 15:28:12.853993 1 onnxruntime.cc:2915] "'onnxruntime' TRITONBACKEND API version: 1.19"
I0719 15:28:12.854016 1 onnxruntime.cc:2945] "backend configuration:\n{\"cmdline\":{\"auto-complete-config\":\"true\",\"backend-directory\":\"/opt/tritonserver/backends\",\"min-compute-capability\":\"6.000000\",\"default-max-batch-size\":\"4\"}}"
I0719 15:28:12.916288 1 onnxruntime.cc:3010] "TRITONBACKEND_ModelInitialize: densenet_onnx (version 1)"
I0719 15:28:12.918189 1 onnxruntime.cc:983] "skipping model configuration auto-complete for 'densenet_onnx': inputs and outputs already specified"
I0719 15:28:12.921052 1 onnxruntime.cc:3075] "TRITONBACKEND_ModelInstanceInitialize: densenet_onnx_0 (CPU device 0)"
I0719 15:28:12.922446 1 onnxruntime.cc:3075] "TRITONBACKEND_ModelInstanceInitialize: densenet_onnx_1 (CPU device 0)"
I0719 15:28:14.624047 1 model_lifecycle.cc:838] "successfully loaded 'densenet_onnx'"
I0719 15:28:14.624407 1 server.cc:604]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0719 15:28:14.624988 1 server.cc:631]
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0719 15:28:14.625497 1 server.cc:674]
+---------------+---------+--------+
| Model | Version | Status |
+---------------+---------+--------+
| densenet_onnx | 1 | READY |
+---------------+---------+--------+
Error: Failed to initialize NVML
W0719 15:28:14.629104 1 metrics.cc:798] "DCGM unable to start: DCGM initialization error"
I0719 15:28:14.629566 1 metrics.cc:770] "Collecting CPU metrics"
I0719 15:28:14.629862 1 tritonserver.cc:2579]
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.47.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0] | /models |
| model_control_mode | MODE_NONE |
| strict_model_config | 0 |
| model_config_name | |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0719 15:28:14.634715 1 grpc_server.cc:2463] "Started GRPCInferenceService at 0.0.0.0:8001"
I0719 15:28:14.635473 1 http_server.cc:4692] "Started HTTPService at 0.0.0.0:8000"
I0719 15:28:14.681417 1 http_server.cc:362] "Started Metrics Service at 0.0.0.0:8002"
Folder structure:
root@triton-669c9fcdd7-jc58j:/models# find /models/ -ls
439799 4 drwxrwxrwx 3 root root 4096 Jul 19 15:27 /models/
439809 4 drwxr-xr-x 3 root root 4096 Jul 19 15:27 /models/densenet_onnx
426421 4 -rw-r--r-- 1 root root 387 Jul 19 15:27 /models/densenet_onnx/config.pbtxt
426422 12 -rw-r--r-- 1 root root 10311 Jul 19 15:27 /models/densenet_onnx/densenet_labels.txt
439810 4 drwxr-xr-x 2 root root 4096 Jul 19 15:27 /models/densenet_onnx/1
426423 31960 -rw-r--r-- 1 root root 32719461 Jul 25 06:33 /models/densenet_onnx/1/model.onnx
root@triton-669c9fcdd7-jc58j:/models#
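Triton expects one directory per model, with `config.pbtxt` and the labels file at the top level and the model file under a numbered version directory. The layout from the `find` output above can be recreated in a scratch directory like this (placeholder files only; the real repo is populated by the init container):

```shell
# Recreate the expected Triton model-repository layout in a scratch dir.
# File contents are empty placeholders, only the structure matters here.
repo=$(mktemp -d)
mkdir -p "$repo/densenet_onnx/1"
: > "$repo/densenet_onnx/config.pbtxt"
: > "$repo/densenet_onnx/densenet_labels.txt"
: > "$repo/densenet_onnx/1/model.onnx"
find "$repo" -type f
```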
That looks better:
oc logs triton-5c5d6c47fb-c9rmz
Defaulted container "triton" out of: triton, model-container (init)
W0725 07:15:40.561695 1 pinned_memory_manager.cc:273] "Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version"
I0725 07:15:40.562436 1 cuda_memory_manager.cc:117] "CUDA memory pool disabled"
E0725 07:15:40.563120 1 server.cc:241] "CudaDriverHelper has not been initialized."
I0725 07:15:40.565240 1 model_lifecycle.cc:472] "loading: robot_onnx:1"
I0725 07:15:40.572049 1 onnxruntime.cc:2899] "TRITONBACKEND_Initialize: onnxruntime"
I0725 07:15:40.572153 1 onnxruntime.cc:2909] "Triton TRITONBACKEND API version: 1.19"
I0725 07:15:40.572177 1 onnxruntime.cc:2915] "'onnxruntime' TRITONBACKEND API version: 1.19"
I0725 07:15:40.572200 1 onnxruntime.cc:2945] "backend configuration:\n{\"cmdline\":{\"auto-complete-config\":\"true\",\"backend-directory\":\"/opt/tritonserver/backends\",\"min-compute-capability\":\"6.000000\",\"default-max-batch-size\":\"4\"}}"
I0725 07:15:40.629397 1 onnxruntime.cc:3010] "TRITONBACKEND_ModelInitialize: robot_onnx (version 1)"
I0725 07:15:41.358227 1 onnxruntime.cc:3075] "TRITONBACKEND_ModelInstanceInitialize: robot_onnx_0 (CPU device 0)"
I0725 07:15:41.358357 1 onnxruntime.cc:3075] "TRITONBACKEND_ModelInstanceInitialize: robot_onnx_1 (CPU device 0)"
I0725 07:15:42.876214 1 model_lifecycle.cc:838] "successfully loaded 'robot_onnx'"
I0725 07:15:42.877624 1 server.cc:604]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0725 07:15:42.878535 1 server.cc:631]
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0725 07:15:42.879925 1 server.cc:674]
+------------+---------+--------+
| Model | Version | Status |
+------------+---------+--------+
| robot_onnx | 1 | READY |
+------------+---------+--------+
Error: Failed to initialize NVML
W0725 07:15:42.885739 1 metrics.cc:798] "DCGM unable to start: DCGM initialization error"
I0725 07:15:42.886789 1 metrics.cc:770] "Collecting CPU metrics"
I0725 07:15:42.887768 1 tritonserver.cc:2579]
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.47.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0] | /models |
| model_control_mode | MODE_NONE |
| strict_model_config | 0 |
| model_config_name | |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0725 07:15:42.894747 1 grpc_server.cc:2463] "Started GRPCInferenceService at 0.0.0.0:8001"
I0725 07:15:42.896495 1 http_server.cc:4692] "Started HTTPService at 0.0.0.0:8000"
I0725 07:15:42.945008 1 http_server.cc:362] "Started Metrics Service at 0.0.0.0:8002"
New Containerfile for the model container image:
FROM registry.access.redhat.com/ubi9/ubi-micro:latest
RUN mkdir -p /models/robot_onnx/1/
ADD ./model.onnx /models/robot_onnx/1/
ENV TARGET_FILE_LOCATION=/dst/
CMD cp -rvp /models/* $TARGET_FILE_LOCATION
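The `CMD` above is just a recursive copy of `/models/*` into whatever `TARGET_FILE_LOCATION` points at (the `/dst/` volume the init container mounts). The copy step can be simulated locally with scratch directories standing in for the real `/models` and `/dst`:

```shell
# Simulate the init container's copy step with scratch directories.
# "$src" stands in for the image's /models, "$dst" for the mounted /dst.
src=$(mktemp -d); dst=$(mktemp -d)
mkdir -p "$src/robot_onnx/1"
: > "$src/robot_onnx/1/model.onnx"
TARGET_FILE_LOCATION="$dst/"
cp -rvp "$src"/* "$TARGET_FILE_LOCATION"
find "$dst" -type f
```

`-p` preserves timestamps and modes, which keeps Triton's model-repository polling from seeing spurious changes.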
@nexus-Six Do you have an example of how to test the Triton server?
Details are here: https://github.com/cloud-native-robotz-hackathon/devel-bucket/blob/master/docs/triton-setup-robot.md
Tried to test it - FAILED
% ssh -l root terminator.robot.lan
% podman run -ti nvcr.io/nvidia/tritonserver:23.10-py3-sdk bash
% curl -s http://terminator.robot.lan:5000/camera | base64 -d > camera.jpg
% /workspace/install/bin/image_client -u server.triton.svc.cluster.local:8000 -m densenet_onnx -c 3 -s INCEPTION camera.jpg
expecting model output to be a vector
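The "expecting model output to be a vector" error comes from `image_client` checking the declared output shape, so a useful first diagnostic is to fetch the model metadata over Triton's HTTP API and inspect the shapes it reports. A sketch, assuming the same in-cluster hostname as the failed test (it only works where that DNS name resolves):

```shell
# Fetch the model's declared input/output metadata from Triton's HTTP API.
# The hostname is an assumption copied from the failed test above.
TRITON_URL="${TRITON_URL:-server.triton.svc.cluster.local:8000}"
curl -sf --max-time 5 "http://${TRITON_URL}/v2/models/densenet_onnx" \
  || echo "metadata request failed"
```

If the reported output shape is not a plain vector (e.g. `[1, 1000, 1, 1]` instead of `[1000]`), a `reshape` in `config.pbtxt` is the usual fix.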
Model tested with https://github.com/cloud-native-robotz-hackathon/human-driver-webapp