cloud-native-robotz-hackathon / infrastructure


Setup NVIDIA Triton Inference Server on GoPiGo 3 (MicroShift + GitOps) #35

Closed rbo closed 2 months ago

rbo commented 3 months ago

First, let's pre-pull the Triton server image:

root@terminator:~# podman pull nvcr.io/nvidia/tritonserver:24.06-py3
Trying to pull nvcr.io/nvidia/tritonserver:24.06-py3...
...

This should be in the new base image: #40

rbo commented 3 months ago
root@terminator:~# oc get svc -n triton
NAME     TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
server   ClusterIP   10.43.222.104   <none>        8000/TCP,8001/TCP,8002/TCP   21m
root@terminator:~# podman run -it --rm --net=host nvcr.io/nvidia/tritonserver:24.06-py3-sdk

=================================
== Triton Inference Server SDK ==
=================================

NVIDIA Release 24.06 (build 98458819)

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

root@terminator:/workspace# /workspace/install/bin/image_client -u 10.43.222.104:8000 -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg
expecting input to have 3 dimensions, model 'densenet_onnx' input has 4
root@terminator:/workspace# 
rbo commented 3 months ago

Some config files are missing, added with https://github.com/cloud-native-robotz-hackathon/robot-gitops/commit/b5cd3b773ea461c29d9bd5e849e520c57f164fac
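For reference, the `config.pbtxt` that fixes the dimension error most likely follows the standard Triton `densenet_onnx` example: the model's input is 4-dimensional (NCHW), so the config declares a 3-dimensional input for clients and reshapes it for the model. A sketch only; the tensor names `data_0`/`fc6_1` and the reshape values come from the upstream Triton example, not from the actual commit:

```
name: "densenet_onnx"
platform: "onnxruntime_onnx"
max_batch_size: 0
input [
  {
    name: "data_0"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 224, 224 ]
    reshape { shape: [ 1, 3, 224, 224 ] }
  }
]
output [
  {
    name: "fc6_1"
    data_type: TYPE_FP32
    dims: [ 1000 ]
    reshape { shape: [ 1, 1000, 1, 1 ] }
    label_filename: "densenet_labels.txt"
  }
]
```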

rbo commented 3 months ago

Works:

root@terminator:/workspace# /workspace/install/bin/image_client -u 10.43.222.104:8000 -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg
Request 0, batch size 1
Image '/workspace/images/mug.jpg':
    15.349568 (504) = COFFEE MUG
    13.227468 (968) = CUP
    10.424893 (505) = COFFEEPOT
root@terminator:/workspace# 
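The `-s INCEPTION` flag tells `image_client` to rescale pixel values into `[-1, 1]` before sending the input tensor. A minimal numpy sketch of that scaling, assuming the usual `(x / 127.5) - 1` formula from the upstream `image_client` example:

```python
import numpy as np

def inception_scale(pixels: np.ndarray) -> np.ndarray:
    # Map uint8 pixel values [0, 255] into [-1.0, 1.0],
    # as image_client does when invoked with -s INCEPTION.
    return (pixels.astype(np.float32) / 127.5) - 1.0

# Tiny 2x2 example instead of a full 224x224 image.
img = np.array([[0, 255], [127, 128]], dtype=np.uint8)
scaled = inception_scale(img)
print(scaled.min(), scaled.max())  # -1.0 1.0
```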
rbo commented 3 months ago

Switch to the Fedora model after https://github.com/cloud-native-robotz-hackathon/infrastructure/issues/48

rbo commented 2 months ago

Current log output of the server with the model from https://contentmamluswest001.blob.core.windows.net/content/14b2744cf8d6418c87ffddc3f3127242/9502630827244d60a1214f250e3bbca7/08aed7327d694b8dbaee2c97b8d0fcba/densenet121-1.2.onnx

Defaulted container "triton" out of: triton, model-downloder (init)
W0719 15:28:12.834580 1 pinned_memory_manager.cc:273] "Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version"
I0719 15:28:12.834756 1 cuda_memory_manager.cc:117] "CUDA memory pool disabled"
E0719 15:28:12.835134 1 server.cc:241] "CudaDriverHelper has not been initialized."
I0719 15:28:12.842022 1 model_lifecycle.cc:472] "loading: densenet_onnx:1"
I0719 15:28:12.853726 1 onnxruntime.cc:2899] "TRITONBACKEND_Initialize: onnxruntime"
I0719 15:28:12.853961 1 onnxruntime.cc:2909] "Triton TRITONBACKEND API version: 1.19"
I0719 15:28:12.853993 1 onnxruntime.cc:2915] "'onnxruntime' TRITONBACKEND API version: 1.19"
I0719 15:28:12.854016 1 onnxruntime.cc:2945] "backend configuration:\n{\"cmdline\":{\"auto-complete-config\":\"true\",\"backend-directory\":\"/opt/tritonserver/backends\",\"min-compute-capability\":\"6.000000\",\"default-max-batch-size\":\"4\"}}"
I0719 15:28:12.916288 1 onnxruntime.cc:3010] "TRITONBACKEND_ModelInitialize: densenet_onnx (version 1)"
I0719 15:28:12.918189 1 onnxruntime.cc:983] "skipping model configuration auto-complete for 'densenet_onnx': inputs and outputs already specified"
I0719 15:28:12.921052 1 onnxruntime.cc:3075] "TRITONBACKEND_ModelInstanceInitialize: densenet_onnx_0 (CPU device 0)"
I0719 15:28:12.922446 1 onnxruntime.cc:3075] "TRITONBACKEND_ModelInstanceInitialize: densenet_onnx_1 (CPU device 0)"
I0719 15:28:14.624047 1 model_lifecycle.cc:838] "successfully loaded 'densenet_onnx'"
I0719 15:28:14.624407 1 server.cc:604] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0719 15:28:14.624988 1 server.cc:631] 
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend     | Path                                                            | Config                                                                                                                                                        |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0719 15:28:14.625497 1 server.cc:674] 
+---------------+---------+--------+
| Model         | Version | Status |
+---------------+---------+--------+
| densenet_onnx | 1       | READY  |
+---------------+---------+--------+

Error: Failed to initialize NVML
W0719 15:28:14.629104 1 metrics.cc:798] "DCGM unable to start: DCGM initialization error"
I0719 15:28:14.629566 1 metrics.cc:770] "Collecting CPU metrics"
I0719 15:28:14.629862 1 tritonserver.cc:2579] 
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                                           |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                                          |
| server_version                   | 2.47.0                                                                                                                                                                                                          |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0]         | /models                                                                                                                                                                                                         |
| model_control_mode               | MODE_NONE                                                                                                                                                                                                       |
| strict_model_config              | 0                                                                                                                                                                                                               |
| model_config_name                |                                                                                                                                                                                                                 |
| rate_limit                       | OFF                                                                                                                                                                                                             |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                                                       |
| min_supported_compute_capability | 6.0                                                                                                                                                                                                             |
| strict_readiness                 | 1                                                                                                                                                                                                               |
| exit_timeout                     | 30                                                                                                                                                                                                              |
| cache_enabled                    | 0                                                                                                                                                                                                               |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0719 15:28:14.634715 1 grpc_server.cc:2463] "Started GRPCInferenceService at 0.0.0.0:8001"
I0719 15:28:14.635473 1 http_server.cc:4692] "Started HTTPService at 0.0.0.0:8000"
I0719 15:28:14.681417 1 http_server.cc:362] "Started Metrics Service at 0.0.0.0:8002"

Folder structure:

root@triton-669c9fcdd7-jc58j:/models# find  /models/ -ls
   439799      4 drwxrwxrwx   3 root     root         4096 Jul 19 15:27 /models/
   439809      4 drwxr-xr-x   3 root     root         4096 Jul 19 15:27 /models/densenet_onnx
   426421      4 -rw-r--r--   1 root     root          387 Jul 19 15:27 /models/densenet_onnx/config.pbtxt
   426422     12 -rw-r--r--   1 root     root        10311 Jul 19 15:27 /models/densenet_onnx/densenet_labels.txt
   439810      4 drwxr-xr-x   2 root     root         4096 Jul 19 15:27 /models/densenet_onnx/1
   426423  31960 -rw-r--r--   1 root     root     32719461 Jul 25 06:33 /models/densenet_onnx/1/model.onnx
root@triton-669c9fcdd7-jc58j:/models#
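The listing above matches Triton's model-repository convention: `config.pbtxt` and the label file sit next to the numbered version directories, and the model binary lives inside a version directory. A hypothetical sketch that recreates the same layout locally (paths are illustrative, not the robot's actual `/models`):

```python
import tempfile
from pathlib import Path

# Scratch directory standing in for Triton's --model-repository path.
repo = Path(tempfile.mkdtemp()) / "models"
model = repo / "densenet_onnx"
version = model / "1"
version.mkdir(parents=True, exist_ok=True)

(model / "config.pbtxt").touch()          # model configuration
(model / "densenet_labels.txt").touch()   # class labels used by -c N classification
(version / "model.onnx").touch()          # the actual ONNX model file
```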
rbo commented 2 months ago

That looks better:

oc logs triton-5c5d6c47fb-c9rmz
Defaulted container "triton" out of: triton, model-container (init)
W0725 07:15:40.561695 1 pinned_memory_manager.cc:273] "Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version"
I0725 07:15:40.562436 1 cuda_memory_manager.cc:117] "CUDA memory pool disabled"
E0725 07:15:40.563120 1 server.cc:241] "CudaDriverHelper has not been initialized."
I0725 07:15:40.565240 1 model_lifecycle.cc:472] "loading: robot_onnx:1"
I0725 07:15:40.572049 1 onnxruntime.cc:2899] "TRITONBACKEND_Initialize: onnxruntime"
I0725 07:15:40.572153 1 onnxruntime.cc:2909] "Triton TRITONBACKEND API version: 1.19"
I0725 07:15:40.572177 1 onnxruntime.cc:2915] "'onnxruntime' TRITONBACKEND API version: 1.19"
I0725 07:15:40.572200 1 onnxruntime.cc:2945] "backend configuration:\n{\"cmdline\":{\"auto-complete-config\":\"true\",\"backend-directory\":\"/opt/tritonserver/backends\",\"min-compute-capability\":\"6.000000\",\"default-max-batch-size\":\"4\"}}"
I0725 07:15:40.629397 1 onnxruntime.cc:3010] "TRITONBACKEND_ModelInitialize: robot_onnx (version 1)"
I0725 07:15:41.358227 1 onnxruntime.cc:3075] "TRITONBACKEND_ModelInstanceInitialize: robot_onnx_0 (CPU device 0)"
I0725 07:15:41.358357 1 onnxruntime.cc:3075] "TRITONBACKEND_ModelInstanceInitialize: robot_onnx_1 (CPU device 0)"
I0725 07:15:42.876214 1 model_lifecycle.cc:838] "successfully loaded 'robot_onnx'"
I0725 07:15:42.877624 1 server.cc:604]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0725 07:15:42.878535 1 server.cc:631]
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend     | Path                                                            | Config                                                                                                                                                        |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0725 07:15:42.879925 1 server.cc:674]
+------------+---------+--------+
| Model      | Version | Status |
+------------+---------+--------+
| robot_onnx | 1       | READY  |
+------------+---------+--------+

Error: Failed to initialize NVML
W0725 07:15:42.885739 1 metrics.cc:798] "DCGM unable to start: DCGM initialization error"
I0725 07:15:42.886789 1 metrics.cc:770] "Collecting CPU metrics"
I0725 07:15:42.887768 1 tritonserver.cc:2579]
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                                           |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                                          |
| server_version                   | 2.47.0                                                                                                                                                                                                          |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0]         | /models                                                                                                                                                                                                         |
| model_control_mode               | MODE_NONE                                                                                                                                                                                                       |
| strict_model_config              | 0                                                                                                                                                                                                               |
| model_config_name                |                                                                                                                                                                                                                 |
| rate_limit                       | OFF                                                                                                                                                                                                             |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                                                       |
| min_supported_compute_capability | 6.0                                                                                                                                                                                                             |
| strict_readiness                 | 1                                                                                                                                                                                                               |
| exit_timeout                     | 30                                                                                                                                                                                                              |
| cache_enabled                    | 0                                                                                                                                                                                                               |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0725 07:15:42.894747 1 grpc_server.cc:2463] "Started GRPCInferenceService at 0.0.0.0:8001"
I0725 07:15:42.896495 1 http_server.cc:4692] "Started HTTPService at 0.0.0.0:8000"
I0725 07:15:42.945008 1 http_server.cc:362] "Started Metrics Service at 0.0.0.0:8002"

New Containerfile for the model container image:

FROM registry.access.redhat.com/ubi9/ubi-micro:latest
RUN mkdir -p /models/robot_onnx/1/
ADD ./model.onnx /models/robot_onnx/1/
ENV TARGET_FILE_LOCATION=/dst/
CMD cp -rvp /models/* $TARGET_FILE_LOCATION
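The `CMD` copies the baked-in models to wherever `TARGET_FILE_LOCATION` points; in the pod this image runs as an init container (`model-container` in the log above) that fills a volume shared with the Triton container. A sketch of that wiring with an `emptyDir` volume; the image reference and args are assumptions, not copied from the actual GitOps repo:

```yaml
# Sketch only: image and args are illustrative placeholders.
spec:
  initContainers:
    - name: model-container
      image: quay.io/example/robot-model:latest  # placeholder
      env:
        - name: TARGET_FILE_LOCATION
          value: /dst/
      volumeMounts:
        - name: models
          mountPath: /dst
  containers:
    - name: triton
      image: nvcr.io/nvidia/tritonserver:24.06-py3
      args: ["tritonserver", "--model-repository=/models"]
      volumeMounts:
        - name: models
          mountPath: /models
  volumes:
    - name: models
      emptyDir: {}
```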
rbo commented 2 months ago

@nexus-Six Do you have an example of how to test the Triton server?

rbo commented 2 months ago

Details are here: https://github.com/cloud-native-robotz-hackathon/devel-bucket/blob/master/docs/triton-setup-robot.md

Tried to test it - FAILED

% ssh -l root terminator.robot.lan
% podman run -ti nvcr.io/nvidia/tritonserver:23.10-py3-sdk bash
% curl -s  http://terminator.robot.lan:5000/camera | base64 -d > camera.jpg
% /workspace/install/bin/image_client -u server.triton.svc.cluster.local:8000 -m densenet_onnx -c 3 -s INCEPTION camera.jpg
expecting model output to be a vector
rbo commented 2 months ago

Model tested with https://github.com/cloud-native-robotz-hackathon/human-driver-webapp