To simplify things, I kept only one model (DashcamNet, which start_server.sh had previously auto-downloaded) under the model_repository folder, and commented out the other wget calls in start_server.sh.
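Concretely, the edit amounts to something like this (the URLs below are placeholders, not the exact lines from my copy of start_server.sh):

# scripts/start_server.sh -- sketch of my change; only DashcamNet is kept
# wget <peoplenet_url>       # commented out
# wget <lprnet_url>          # commented out
wget <dashcamnet_url>        # left as-is

and on the host I kept only the DashcamNet entry under model_repository:

model_repository/
└── dashcamnet_tao/
    ├── config.pbtxt
    └── 1/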
When running start_server.sh, I noticed the following low-memory warning:
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
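As far as I can tell, this message comes from the tao-converter step that builds the TensorRT engine. I believe tao-converter accepts a -w flag for the maximum workspace size in bytes, but I have not touched whatever default start_server.sh uses, so the following is only a hypothetical tweak:

# hypothetical: raise the TensorRT workspace to ~2 GB (not applied on my side)
tao-converter <existing args unchanged> -w 2000000000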
But when I checked with nvidia-smi, the GPU was using less than 2 GB of memory (about 10 GB were still free). In the end, startup failed. Here is the full console log:
(triton_dev) kevin@kevin-LEGION-REN7000K-26IOB:~/tao-toolkit-triton-apps$ sudo -E scripts/start_server.sh
[sudo] password for kevin:
WARNING! Using --password via the CLI is insecure. Use --password-stdin.
WARNING! Your password will be stored unencrypted in /home/kevin/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store
Login Succeeded
Sending build context to Docker daemon 625.1MB
Step 1/7 : FROM nvcr.io/nvidia/tritonserver:21.10-py3
---> 5c99e9b6586e
Step 2/7 : RUN wget https://nvidia.box.com/shared/static/7u2ocnwenwgrsx1yq8vv4hkfr0dg1rtm -O /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.8.0.3
---> Using cache
---> c2d30fd54a9c
Step 3/7 : ENV TRT_LIB_PATH=/usr/lib/x86_64-linux-gnu
---> Using cache
---> b45dff5cb3c9
Step 4/7 : ENV TRT_INC_PATH=/usr/include/x86_64-linux-gnu
---> Using cache
---> 66edc8a23083
Step 5/7 : RUN wget https://developer.nvidia.com/tao-converter-80 -P /opt/tao-converter && apt-get update && apt-get install unzip libssl-dev -y && unzip /opt/tao-converter/tao-converter-80 -d /opt/tao-converter && chmod +x /opt/tao-converter/tao-converter-x86-tensorrt8.0/tao-converter
---> Using cache
---> ea919493a951
Step 6/7 : ENV PATH=/opt/tao-converter/tao-converter-x86-tensorrt8.0:$PATH
---> Using cache
---> 0b3c58f2bd0b
Step 7/7 : CMD ["/bin/bash"]
---> Using cache
---> f33160171d35
Successfully built f33160171d35
Successfully tagged nvcr.io/nvidia/tao/triton-apps:21.11-py3
mkdir: cannot create directory ‘/home/kevin/tao-toolkit-triton-apps/tao_models’: File exists
Running the server on 0
=============================
== Triton Inference Server ==
=============================
NVIDIA Release 21.10 (build 28453983)
Copyright (c) 2018-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
Converting the DashcamNet model
[INFO] [MemUsageChange] Init CUDA: CPU +534, GPU +0, now: CPU 540, GPU 595 (MiB)
[INFO] [MemUsageSnapshot] Builder begin: CPU 554 MiB, GPU 595 MiB
[INFO] Reading Calibration Cache for calibrator: EntropyCalibration2
[INFO] Generated calibration scales using calibration cache. Make sure that calibration cache has latest scales.
[INFO] To regenerate calibration cache, please delete the existing one. TensorRT will generate a new calibration cache.
[WARNING] Missing scale and zero-point for tensor output_bbox/bias, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor conv1/kernel, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor conv1/bias, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor bn_conv1/moving_variance, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor bn_conv1/Reshape_1/shape, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor bn_conv1/batchnorm/add/y, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor bn_conv1/gam
...
...
...
[WARNING] Missing scale and zero-point for tensor block_4b_bn_2/gamma, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor block_4b_bn_2/Reshape_3/shape, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor block_4b_bn_2/beta, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor block_4b_bn_2/Reshape_2/shape, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor block_4b_bn_2/moving_mean, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor block_4b_bn_2/Reshape/shape, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor output_bbox/kernel, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor output_cov/kernel, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor output_cov/bias, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +791, GPU +340, now: CPU 1351, GPU 935 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +196, GPU +342, now: CPU 1547, GPU 1277 (MiB)
[WARNING] Detected invalid timing cache, setup a local cache instead
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[INFO] Detected 1 inputs and 2 output network tensors.
[INFO] Total Host Persistent Memory: 52032
[INFO] Total Device Persistent Memory: 4253184
[INFO] Total Scratch Memory: 0
[INFO] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 8 MiB, GPU 4 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2527, GPU 1757 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2527, GPU 1765 (MiB)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2527, GPU 1749 (MiB)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2527, GPU 1731 (MiB)
[INFO] [MemUsageSnapshot] Builder end: CPU 2520 MiB, GPU 1731 MiB
I0312 10:55:12.094442 57 metrics.cc:298] Collecting metrics for GPU 0: NVIDIA GeForce RTX 3060
I0312 10:55:12.255036 57 libtorch.cc:1092] TRITONBACKEND_Initialize: pytorch
I0312 10:55:12.255052 57 libtorch.cc:1102] Triton TRITONBACKEND API version: 1.6
I0312 10:55:12.255055 57 libtorch.cc:1108] 'pytorch' TRITONBACKEND API version: 1.6
2022-03-12 10:55:12.440666: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I0312 10:55:12.466952 57 tensorflow.cc:2170] TRITONBACKEND_Initialize: tensorflow
I0312 10:55:12.466977 57 tensorflow.cc:2180] Triton TRITONBACKEND API version: 1.6
I0312 10:55:12.466980 57 tensorflow.cc:2186] 'tensorflow' TRITONBACKEND API version: 1.6
I0312 10:55:12.466983 57 tensorflow.cc:2210] backend configuration:
{}
I0312 10:55:12.484855 57 onnxruntime.cc:1999] TRITONBACKEND_Initialize: onnxruntime
I0312 10:55:12.484881 57 onnxruntime.cc:2009] Triton TRITONBACKEND API version: 1.6
I0312 10:55:12.484884 57 onnxruntime.cc:2015] 'onnxruntime' TRITONBACKEND API version: 1.6
I0312 10:55:12.496852 57 openvino.cc:1193] TRITONBACKEND_Initialize: openvino
I0312 10:55:12.496866 57 openvino.cc:1203] Triton TRITONBACKEND API version: 1.6
I0312 10:55:12.496869 57 openvino.cc:1209] 'openvino' TRITONBACKEND API version: 1.6
I0312 10:55:12.612994 57 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f6252000000' with size 268435456
I0312 10:55:12.613136 57 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
E0312 10:55:12.613955 57 model_repository_manager.cc:1890] Poll failed for model directory 'lprnet_tao': failed to open text file for read /model_repository/lprnet_tao/config.pbtxt: No such file or directory
E0312 10:55:12.613987 57 model_repository_manager.cc:1890] Poll failed for model directory 'multitask_classification_tao': failed to open text file for read /model_repository/multitask_classification_tao/config.pbtxt: No such file or directory
E0312 10:55:12.614011 57 model_repository_manager.cc:1890] Poll failed for model directory 'peoplenet_tao': failed to open text file for read /model_repository/peoplenet_tao/config.pbtxt: No such file or directory
E0312 10:55:12.614032 57 model_repository_manager.cc:1890] Poll failed for model directory 'peoplesegnet_tao': failed to open text file for read /model_repository/peoplesegnet_tao/config.pbtxt: No such file or directory
E0312 10:55:12.614054 57 model_repository_manager.cc:1890] Poll failed for model directory 'retinanet_tao': failed to open text file for read /model_repository/retinanet_tao/config.pbtxt: No such file or directory
E0312 10:55:12.614089 57 model_repository_manager.cc:1890] Poll failed for model directory 'vehicletypenet_tao': failed to open text file for read /model_repository/vehicletypenet_tao/config.pbtxt: No such file or directory
E0312 10:55:12.614111 57 model_repository_manager.cc:1890] Poll failed for model directory 'yolov3_tao': failed to open text file for read /model_repository/yolov3_tao/config.pbtxt: No such file or directory
I0312 10:55:12.614145 57 model_repository_manager.cc:1022] loading: dashcamnet_tao:1
I0312 10:55:12.733795 57 tensorrt.cc:4925] TRITONBACKEND_Initialize: tensorrt
I0312 10:55:12.733819 57 tensorrt.cc:4935] Triton TRITONBACKEND API version: 1.6
I0312 10:55:12.733822 57 tensorrt.cc:4941] 'tensorrt' TRITONBACKEND API version: 1.6
I0312 10:55:12.733893 57 tensorrt.cc:4984] backend configuration:
{}
I0312 10:55:12.734092 57 tensorrt.cc:5036] TRITONBACKEND_ModelInitialize: dashcamnet_tao (version 1)
I0312 10:55:12.735397 57 tensorrt.cc:5085] TRITONBACKEND_ModelInstanceInitialize: dashcamnet_tao (GPU device 0)
I0312 10:55:13.097851 57 logging.cc:49] [MemUsageChange] Init CUDA: CPU +525, GPU +0, now: CPU 648, GPU 653 (MiB)
I0312 10:55:13.101456 57 logging.cc:49] Loaded engine size: 4 MB
I0312 10:55:13.101539 57 logging.cc:49] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 656 MiB, GPU 653 MiB
I0312 10:55:13.591324 57 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +791, GPU +340, now: CPU 1448, GPU 999 (MiB)
I0312 10:55:14.000462 57 logging.cc:49] [MemUsageChange] Init cuDNN: CPU +195, GPU +342, now: CPU 1643, GPU 1341 (MiB)
I0312 10:55:14.001389 57 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1643, GPU 1323 (MiB)
I0312 10:55:14.001430 57 logging.cc:49] [MemUsageSnapshot] deserializeCudaEngine end: CPU 1643 MiB, GPU 1323 MiB
I0312 10:55:14.001566 57 logging.cc:49] [MemUsageSnapshot] ExecutionContext creation begin: CPU 1635 MiB, GPU 1323 MiB
I0312 10:55:14.001878 57 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 1635, GPU 1333 (MiB)
I0312 10:55:14.002592 57 logging.cc:49] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1635, GPU 1341 (MiB)
I0312 10:55:14.003012 57 logging.cc:49] [MemUsageSnapshot] ExecutionContext creation end: CPU 1635 MiB, GPU 1487 MiB
I0312 10:55:14.003266 57 tensorrt.cc:1379] Created instance dashcamnet_tao on GPU 0 with stream priority 0
I0312 10:55:14.003381 57 model_repository_manager.cc:1183] successfully loaded 'dashcamnet_tao' version 1
I0312 10:55:14.003466 57 server.cc:522]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0312 10:55:14.003501 57 server.cc:549]
+-------------+-----------------------------------------------------------------+--------+
| Backend | Path | Config |
+-------------+-----------------------------------------------------------------+--------+
| pytorch | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so | {} |
| tensorflow | /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so | {} |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {} |
| openvino | /opt/tritonserver/backends/openvino/libtriton_openvino.so | {} |
| tensorrt | /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so | {} |
+-------------+-----------------------------------------------------------------+--------+
I0312 10:55:14.003520 57 server.cc:592]
+----------------+---------+--------+
| Model | Version | Status |
+----------------+---------+--------+
| dashcamnet_tao | 1 | READY |
+----------------+---------+--------+
I0312 10:55:14.003586 57 tritonserver.cc:1920]
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.15.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_share |
| | d_memory binary_tensor_data statistics |
| model_repository_path[0] | /model_repository |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+
I0312 10:55:14.003603 57 server.cc:252] Waiting for in-flight requests to complete.
I0312 10:55:14.003606 57 model_repository_manager.cc:1055] unloading: dashcamnet_tao:1
I0312 10:55:14.003635 57 server.cc:267] Timeout 30: Found 1 live models and 0 in-flight non-inference requests
I0312 10:55:14.003687 57 tensorrt.cc:5123] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0312 10:55:14.007162 57 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1635, GPU 1463 (MiB)
I0312 10:55:14.007364 57 tensorrt.cc:5062] TRITONBACKEND_ModelFinalize: delete model state
I0312 10:55:14.008953 57 model_repository_manager.cc:1166] successfully unloaded 'dashcamnet_tao' version 1
I0312 10:55:15.003795 57 server.cc:267] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
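Based on the "Poll failed" lines, it looks like the container's /model_repository still contains the other model directories, just without their config.pbtxt files. My guess at the layout the server is seeing:

/model_repository/
├── dashcamnet_tao/               # config.pbtxt present -- loads fine
├── lprnet_tao/                   # config.pbtxt missing -- poll fails
├── multitask_classification_tao/
├── peoplenet_tao/
├── peoplesegnet_tao/
├── retinanet_tao/
├── vehicletypenet_tao/
└── yolov3_tao/                   # all of these fail the poll the same way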
I'm using Ubuntu 20 (x64) with an RTX 3060 (12 GB).
Could you help me check what's going wrong?