PaddlePaddle / FastDeploy

⚡️An easy-to-use, fast deep learning model deployment toolkit for ☁️Cloud, 📱Mobile, and 📹Edge, covering 20+ mainstream scenarios across image, video, text, and audio, with 150+ SOTA models, end-to-end optimization, and multi-platform, multi-framework support.
https://www.paddlepaddle.org.cn/fastdeploy
Apache License 2.0

FastDeploy TRT inference conflicts with TRT acceleration in the original YOLOv5 6.1 #1546

Closed LiQiang0307 closed 1 year ago

LiQiang0307 commented 1 year ago

Friendly tip: according to informal community statistics, asking questions with the issue template speeds up responses and problem resolution.


Environment

[INFO] fastdeploy/runtime.cc(506)::fastdeploy::Runtime::Init    Runtime initialized with Backend::TRT in Device::GPU.
Loading crowdhuman_yolov5m.engine for TensorRT inference...
[03/07/2023-19:58:45] [TRT] [I] The logger passed into createInferRuntime differs from one already provided for an existing builder, runtime, or refitter. Uses of the global logger, returned by nvinfer1::getLogger(), will return the existing value.

[03/07/2023-19:58:45] [TRT] [I] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 15888, GPU 4476 (MiB)
[03/07/2023-19:58:45] [TRT] [I] Loaded engine size: 109 MiB
[03/07/2023-19:58:45] [TRT] [E] 1: [stdArchiveReader.cpp::nvinfer1::rt::StdArchiveReader::StdArchiveReader::40] Error Code 1: Serialization (Serialization assertion stdVersionRead == serializationVersion failed.Version tag does not match. Note: Current Version: 213, Serialized Engine Version: 232)
[03/07/2023-19:58:45] [TRT] [E] 4: [runtime.cpp::nvinfer1::Runtime::deserializeCudaEngine::50] Error Code 4: Internal Error (Engine deserialization failed.)
None
Traceback (most recent call last):
  File "app/app.py", line 97, in <module>
    yolo_model = DetectMultiBackend('crowdhuman_yolov5m.engine', device=device)  # 👨‍🚀
  File "C:\Users/Administrator/Desktop/airun_project\yolov5\models\common.py", line 389, in __init__
    context = model.create_execution_context()
AttributeError: 'NoneType' object has no attribute 'create_execution_context'
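
The two [TRT] [E] lines above are the key symptom: a serialized .engine file can only be deserialized by the exact TensorRT build that produced it, and here the engine (version tag 232) was written by a different TensorRT build than the runtime already loaded in the process (version tag 213). A minimal check, assuming the standard tensorrt Python package, is to compare versions in the export and inference environments:

import tensorrt as trt

# An .engine file is tied to the exact TensorRT build that serialized it,
# so this must print the same version where the engine was exported and
# where it is loaded for inference.
print(trt.__version__)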
LiQiang0307 commented 1 year ago

If the YOLOv5 TRT model is initialized first, FastDeploy reports the following errors:

[FastDeploy][INFO]:  Successfully found CUDA ToolKit from system PATH env -> D:\Anaconda3\envs\ai_run\Library\bin
[ERROR] fastdeploy/backends/tensorrt/trt_backend.cc(238)::fastdeploy::FDTrtLogger::log  2: [builder.cpp::nvinfer1::builder::createCaskKernelLibraryImpl::157] Error Code 2: Internal Error (Assertion validateCaskKLibSize(buffer.size) failed. )
[ERROR] fastdeploy/backends/tensorrt/trt_backend.cc(615)::fastdeploy::TrtBackend::CreateTrtEngineFromOnnx       Failed to call createInferBuilder().
[ERROR] fastdeploy/backends/tensorrt/trt_backend.cc(263)::fastdeploy::TrtBackend::InitFromOnnx  Failed to create tensorrt engine.
[ERROR] fastdeploy/runtime.cc(768)::fastdeploy::Runtime::CreateTrtBackend       Load model from Paddle failed while initliazing TrtBackend.
jiangjiajun commented 1 year ago

Do you have a TensorRT 8.5 installed in your system environment, with LD_LIBRARY_PATH exported to that path?
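
(On Windows the DLL lookup goes through PATH rather than LD_LIBRARY_PATH; a quick sketch, using only the standard library, to list every TensorRT directory the process can pick up:)

import os

# Print each PATH entry that looks like a TensorRT install, to spot duplicates.
for entry in os.environ.get("PATH", "").split(os.pathsep):
    if "tensorrt" in entry.lower():
        print(entry)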

LiQiang0307 commented 1 year ago

Do you have a TensorRT 8.5 installed in your system environment, with LD_LIBRARY_PATH exported to that path?

Yes, TensorRT 8.5 is installed in the system environment.

jiangjiajun commented 1 year ago

FastDeploy depends on TensorRT 8.5.2.2. Could you try upgrading the TensorRT in your environment to that version?

LiQiang0307 commented 1 year ago

OK, I'll try that version.

LiQiang0307 commented 1 year ago

1. Install TensorRT 8.5.2.2 and add it to the environment variables

(ai_run) PS D:\TensorRT-8.5.2.2\python> ls

Directory: D:\TensorRT-8.5.2.2\python

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a----         2022/12/6      2:59         679604 tensorrt-8.5.2.2-cp310-none-win_amd64.whl
-a----         2022/12/6      2:59         681551 tensorrt-8.5.2.2-cp36-none-win_amd64.whl
-a----         2022/12/6      2:59         681773 tensorrt-8.5.2.2-cp37-none-win_amd64.whl
-a----         2022/12/6      2:59         679845 tensorrt-8.5.2.2-cp38-none-win_amd64.whl
-a----         2022/12/6      2:59         679891 tensorrt-8.5.2.2-cp39-none-win_amd64.whl

(ai_run) PS D:\TensorRT-8.5.2.2\python> pip install .\tensorrt-8.5.2.2-cp38-none-win_amd64.whl
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple, https://pypi.ngc.nvidia.com
Processing d:\tensorrt-8.5.2.2\python\tensorrt-8.5.2.2-cp38-none-win_amd64.whl
Installing collected packages: tensorrt
  Attempting uninstall: tensorrt
    Found existing installation: tensorrt 8.5.1.7
    Uninstalling tensorrt-8.5.1.7:
      Successfully uninstalled tensorrt-8.5.1.7
Successfully installed tensorrt-8.5.2.2
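
After the install, it is worth confirming that Python now resolves the upgraded wheel rather than a stale copy; a minimal check:

import tensorrt as trt

print(trt.__version__)  # expected: 8.5.2.2
print(trt.__file__)     # should point into the active env's site-packages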

2. Re-export the YOLOv5 model

(ai_run) PS C:\Users\Administrator\Desktop\airun_project\export_engine\yolov5-6.1> python export.py --weights crowdhuman_yolov5m.pt --include engine --device 0
export: data=data\coco128.yaml, weights=['crowdhuman_yolov5m.pt'], imgsz=[640, 640], batch_size=1, device=0, half=False, inplace=False, train=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=12, verbose=False, workspace=4, nms=False, agnostic_nms=False, topk_per_class=100, topk_all=100, iou_thres=0.45, conf_thres=0.25, include=['engine']
YOLOv5  v1.0-0-g180de10 torch 1.12.1+cu116 CUDA:0 (NVIDIA TITAN X (Pascal), 12288MiB)

Fusing layers...
Model Summary: 308 layers, 21041679 parameters, 0 gradients

PyTorch: starting from crowdhuman_yolov5m.pt with output shape (1, 25200, 7) (169.0 MB)

ONNX: starting export with onnx 1.13.0...
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
ONNX: export success, saved as crowdhuman_yolov5m.onnx (84.6 MB)

TensorRT: starting export with TensorRT 8.5.2.2...
[03/08/2023-16:23:05] [TRT] [I] [MemUsageChange] Init CUDA: CPU +189, GPU +0, now: CPU 15121, GPU 1990 (MiB)
[03/08/2023-16:23:07] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +124, GPU +22, now: CPU 15711, GPU 2012 (MiB)
[03/08/2023-16:23:07] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
export.py:220: DeprecationWarning: Use set_memory_pool_limit instead.
  config.max_workspace_size = workspace * 1 << 30
[03/08/2023-16:23:07] [TRT] [I] ----------------------------------------------------------------
[03/08/2023-16:23:07] [TRT] [I] Input filename:   crowdhuman_yolov5m.onnx
[03/08/2023-16:23:07] [TRT] [I] ONNX IR version:  0.0.7
[03/08/2023-16:23:07] [TRT] [I] Opset version:    13
[03/08/2023-16:23:07] [TRT] [I] Producer name:    pytorch
[03/08/2023-16:23:07] [TRT] [I] Producer version: 1.12.1
[03/08/2023-16:23:07] [TRT] [I] Domain:
[03/08/2023-16:23:07] [TRT] [I] Model version:    0
[03/08/2023-16:23:07] [TRT] [I] Doc string:
[03/08/2023-16:23:07] [TRT] [I] ----------------------------------------------------------------
[03/08/2023-16:23:07] [TRT] [W] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[03/08/2023-16:23:07] [TRT] [W] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[03/08/2023-16:23:07] [TRT] [W] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[03/08/2023-16:23:07] [TRT] [W] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[03/08/2023-16:23:07] [TRT] [W] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[03/08/2023-16:23:07] [TRT] [W] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[03/08/2023-16:23:07] [TRT] [W] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[03/08/2023-16:23:07] [TRT] [W] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[03/08/2023-16:23:07] [TRT] [W] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[03/08/2023-16:23:07] [TRT] [W] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[03/08/2023-16:23:07] [TRT] [W] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[03/08/2023-16:23:07] [TRT] [W] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
TensorRT: Network Description:
TensorRT:       input "images" with shape (1, 3, 640, 640) and dtype DataType.FLOAT
TensorRT:       output "output" with shape (1, 25200, 7) and dtype DataType.FLOAT
TensorRT:       output "onnx::Sigmoid_531" with shape (1, 3, 80, 80, 7) and dtype DataType.FLOAT
TensorRT:       output "onnx::Sigmoid_597" with shape (1, 3, 40, 40, 7) and dtype DataType.FLOAT
TensorRT:       output "onnx::Sigmoid_663" with shape (1, 3, 20, 20, 7) and dtype DataType.FLOAT
TensorRT: building FP32 engine in crowdhuman_yolov5m.engine
export.py:240: DeprecationWarning: Use build_serialized_network instead.
  with builder.build_engine(network, config) as engine, open(f, 'wb') as t:
[03/08/2023-16:23:07] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 15466, GPU 2020 (MiB)
[03/08/2023-16:23:07] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 15466, GPU 2028 (MiB)
[03/08/2023-16:23:07] [TRT] [W] TensorRT was linked against cuDNN 8.6.0 but loaded cuDNN 8.3.2
[03/08/2023-16:23:07] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[03/08/2023-16:23:33] [TRT] [I] Some tactics do not have sufficient workspace memory to run. Increasing workspace size will enable more tactics, please check verbose output for requested sizes.
[03/08/2023-16:25:17] [TRT] [I] [GraphReduction] The approximate region cut reduction algorithm is called.
[03/08/2023-16:25:17] [TRT] [I] Total Activation Memory: 4729828352
[03/08/2023-16:25:17] [TRT] [I] Detected 1 inputs and 7 output network tensors.
[03/08/2023-16:25:17] [TRT] [I] Total Host Persistent Memory: 177632
[03/08/2023-16:25:17] [TRT] [I] Total Device Persistent Memory: 2148352
[03/08/2023-16:25:17] [TRT] [I] Total Scratch Memory: 0
[03/08/2023-16:25:17] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 21 MiB, GPU 2464 MiB
[03/08/2023-16:25:17] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 212 steps to complete.
[03/08/2023-16:25:17] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 12.7556ms to assign 9 blocks to 212 nodes requiring 52070912 bytes.
[03/08/2023-16:25:17] [TRT] [I] Total Activation Memory: 52070912
[03/08/2023-16:25:18] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +7, GPU +111, now: CPU 7, GPU 111 (MiB)
TensorRT: export success, saved as crowdhuman_yolov5m.engine (114.7 MB)

Export complete (140.09s)
Results saved to C:\Users\Administrator\Desktop\airun_project\export_engine\yolov5-6.1
Detect:          python detect.py --weights crowdhuman_yolov5m.engine
PyTorch Hub:     model = torch.hub.load('ultralytics/yolov5', 'custom', 'crowdhuman_yolov5m.engine')
Validate:        python val.py --weights crowdhuman_yolov5m.engine
Visualize:       https://netron.app
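
The export log above also shows two DeprecationWarnings from the yolov5 6.1 export.py. They are harmless here, but for reference, a sketch of the non-deprecated TensorRT 8.5 API (file names taken from this thread, workspace size from the export settings; the rest is the standard ONNX-to-engine flow):

import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("crowdhuman_yolov5m.onnx", "rb") as f:
    parser.parse(f.read())

config = builder.create_builder_config()
# Replaces the deprecated `config.max_workspace_size = workspace * 1 << 30`:
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 4 << 30)

# Replaces the deprecated `builder.build_engine(network, config)`:
serialized = builder.build_serialized_network(network, config)
with open("crowdhuman_yolov5m.engine", "wb") as f:
    f.write(serialized)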

3. Launch the project

[FastDeploy][INFO]:  Successfully found CUDA ToolKit from system PATH env -> D:\Anaconda3\envs\ai_run\Library\bin
[INFO] fastdeploy/backends/tensorrt/trt_backend.cc(644)::fastdeploy::TrtBackend::CreateTrtEngineFromOnnx        Detect serialized TensorRT Engine file in OCRV2/ch_PP-OCRv3_det_infer/det_trt_cache.trt, will load it directly.
[WARNING] fastdeploy/backends/tensorrt/utils.cc(40)::fastdeploy::ShapeRangeInfo::Update [New Shape Out of Range] input name: x, shape: [32, 3, 960, 960], The shape range before: min_shape=[1, 3, 64, 64], max_shape=[1, 3, 960, 960].
[WARNING] fastdeploy/backends/tensorrt/utils.cc(52)::fastdeploy::ShapeRangeInfo::Update [New Shape Out of Range] The updated shape range now: min_shape=[1, 3, 64, 64], max_shape=[32, 3, 960, 960].
[WARNING] fastdeploy/backends/tensorrt/utils.cc(40)::fastdeploy::ShapeRangeInfo::Update [New Shape Out of Range] input name: x, shape: [1, 3, 32, 64], The shape range before: min_shape=[1, 3, 64, 64], max_shape=[32, 3, 960, 960].
[WARNING] fastdeploy/backends/tensorrt/utils.cc(52)::fastdeploy::ShapeRangeInfo::Update [New Shape Out of Range] The updated shape range now: min_shape=[1, 3, 32, 64], max_shape=[32, 3, 960, 960].
[INFO] fastdeploy/backends/tensorrt/trt_backend.cc(108)::fastdeploy::TrtBackend::LoadTrtCache   Build TensorRT Engine from cache file: OCRV2/ch_PP-OCRv3_det_infer/det_trt_cache.trt with shape range information as below,
[INFO] fastdeploy/backends/tensorrt/trt_backend.cc(111)::fastdeploy::TrtBackend::LoadTrtCache   Input name: x, shape=[-1, 3, -1, -1], min=[1, 3, 32, 64], max=[32, 3, 960, 960]

[INFO] fastdeploy/runtime.cc(506)::fastdeploy::Runtime::Init    Runtime initialized with Backend::TRT in Device::GPU.
[INFO] fastdeploy/backends/tensorrt/trt_backend.cc(644)::fastdeploy::TrtBackend::CreateTrtEngineFromOnnx        Detect serialized TensorRT Engine file in OCRV2/ch_ppocr_mobile_v2.0_cls_infer/cls_trt_cache.trt, will load it directly.
[INFO] fastdeploy/backends/tensorrt/trt_backend.cc(108)::fastdeploy::TrtBackend::LoadTrtCache   Build TensorRT Engine from cache file: OCRV2/ch_ppocr_mobile_v2.0_cls_infer/cls_trt_cache.trt with shape range information as below,
[INFO] fastdeploy/backends/tensorrt/trt_backend.cc(111)::fastdeploy::TrtBackend::LoadTrtCache   Input name: x, shape=[-1, 3, -1, -1], min=[1, 3, 48, 10], max=[32, 3, 48, 1024]

[INFO] fastdeploy/runtime.cc(506)::fastdeploy::Runtime::Init    Runtime initialized with Backend::TRT in Device::GPU.
[INFO] fastdeploy/backends/tensorrt/trt_backend.cc(644)::fastdeploy::TrtBackend::CreateTrtEngineFromOnnx        Detect serialized TensorRT Engine file in OCRV2/ch_PP-OCRv3_rec_infer/rec_trt_cache.trt, will load it directly.
[INFO] fastdeploy/backends/tensorrt/trt_backend.cc(108)::fastdeploy::TrtBackend::LoadTrtCache   Build TensorRT Engine from cache file: OCRV2/ch_PP-OCRv3_rec_infer/rec_trt_cache.trt with shape range information as below,
[INFO] fastdeploy/backends/tensorrt/trt_backend.cc(111)::fastdeploy::TrtBackend::LoadTrtCache   Input name: x, shape=[-1, 3, 48, -1], min=[1, 3, 48, 10], max=[32, 3, 48, 2304]

[INFO] fastdeploy/runtime.cc(506)::fastdeploy::Runtime::Init    Runtime initialized with Backend::TRT in Device::GPU.
[INFO] fastdeploy/vision/common/processors/transform.cc(93)::fastdeploy::vision::FuseNormalizeHWC2CHW   Normalize and HWC2CHW are fused to NormalizeAndPermute  in preprocessing pipeline.
[INFO] fastdeploy/vision/common/processors/transform.cc(159)::fastdeploy::vision::FuseNormalizeColorConvert     BGR2RGB and NormalizeAndPermute are fused to NormalizeAndPermute with swap_rb=1
[INFO] fastdeploy/backends/tensorrt/trt_backend.cc(644)::fastdeploy::TrtBackend::CreateTrtEngineFromOnnx        Detect serialized TensorRT Engine file in COLORV2/model/trt_cache.trt, will load it directly.
[WARNING] fastdeploy/backends/tensorrt/utils.cc(40)::fastdeploy::ShapeRangeInfo::Update [New Shape Out of Range] input name: x, shape: [1, 3, 224, 224], The shape range before: min_shape=[-1, 3, 224, 224], max_shape=[-1, 3, 224, 224].
[WARNING] fastdeploy/backends/tensorrt/utils.cc(52)::fastdeploy::ShapeRangeInfo::Update [New Shape Out of Range] The updated shape range now: min_shape=[1, 3, 224, 224], max_shape=[1, 3, 224, 224].
[INFO] fastdeploy/backends/tensorrt/trt_backend.cc(108)::fastdeploy::TrtBackend::LoadTrtCache   Build TensorRT Engine from cache file: COLORV2/model/trt_cache.trt with shape range information as below,
[INFO] fastdeploy/backends/tensorrt/trt_backend.cc(111)::fastdeploy::TrtBackend::LoadTrtCache   Input name: x, shape=[-1, 3, 224, 224], min=[1, 3, 224, 224], max=[1, 3, 224, 224]

[INFO] fastdeploy/runtime.cc(506)::fastdeploy::Runtime::Init    Runtime initialized with Backend::TRT in Device::GPU.
Loading crowdhuman_yolov5m.engine for TensorRT inference...
[03/08/2023-16:27:54] [TRT] [I] The logger passed into createInferRuntime differs from one already provided for an existing builder, runtime, or refitter. Uses of the global logger, returned by nvinfer1::getLogger(), will return the existing value.

[03/08/2023-16:27:54] [TRT] [I] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 13939, GPU 4476 (MiB)
<tensorrt.tensorrt.Runtime object at 0x000001AA90576930>
[03/08/2023-16:27:54] [TRT] [I] The logger passed into createInferRuntime differs from one already provided for an existing builder, runtime, or refitter. Uses of the global logger, returned by nvinfer1::getLogger(), will return the existing value.

[03/08/2023-16:27:54] [TRT] [I] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 13939, GPU 4476 (MiB)
[03/08/2023-16:27:54] [TRT] [I] Loaded engine size: 109 MiB
[03/08/2023-16:27:54] [TRT] [E] 1: [stdArchiveReader.cpp::nvinfer1::rt::StdArchiveReader::StdArchiveReader::40] Error Code 1: Serialization (Serialization assertion stdVersionRead == serializationVersion failed.Version tag does not match. Note: Current Version: 213, Serialized Engine Version: 232)
[03/08/2023-16:27:54] [TRT] [E] 4: [runtime.cpp::nvinfer1::Runtime::deserializeCudaEngine::50] Error Code 4: Internal Error (Engine deserialization failed.)
None
Traceback (most recent call last):
  File ".\app\app.py", line 97, in <module>
    yolo_model = DetectMultiBackend('crowdhuman_yolov5m.engine', device=device)  # 👨‍🚀
  File "C:\Users/Administrator/Desktop/airun_project\yolov5\models\common.py", line 390, in __init__
    context = model.create_execution_context()
AttributeError: 'NoneType' object has no attribute 'create_execution_context'

4. Using the FastDeploy TRT backend alone, with YOLOv5 on its .pt model, the project starts normally.

5. Using the YOLOv5 TRT model alone, with FastDeploy on a backend other than TRT, the project starts normally (backend switch sketched below).
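
Observations 4 and 5 isolate the problem to loading two different TensorRT builds into one process. The backend switch from observation 5 is a one-line change; a sketch assuming the FastDeploy Python RuntimeOption API:

import fastdeploy as fd

option = fd.RuntimeOption()
option.use_gpu()
# option.use_trt_backend()  # clashes with yolov5's own TRT engine in this process
option.use_ort_backend()    # observation 5's workaround: ONNX Runtime backend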

jiangjiajun commented 1 year ago

That's a bit odd. Try deleting the site-packages/fastdeploy/libs/third_libs/tensorrt directory and running again, to rule out two copies of the TensorRT libraries being present.
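
(A minimal sketch to locate the bundled copy before deleting it, assuming the path layout described above:)

import os
import fastdeploy

# The TensorRT libraries shipped inside the fastdeploy wheel, i.e. the
# directory suggested for deletion above.
bundled = os.path.join(os.path.dirname(fastdeploy.__file__),
                       "libs", "third_libs", "tensorrt")
print(bundled, "exists:", os.path.isdir(bundled))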

LiQiang0307 commented 1 year ago

That's a bit odd. Try deleting the site-packages/fastdeploy/libs/third_libs/tensorrt directory and running again, to rule out two copies of the TensorRT libraries being present.

This method solves it; the two copies of the TensorRT libraries must have been conflicting.

1. Delete the site-packages/fastdeploy/libs/third_libs/tensorrt directory


2. Regenerate the FastDeploy TRT models

[FastDeploy][INFO]:  Successfully found CUDA ToolKit from system PATH env -> D:\Anaconda3\envs\ai_run\Library\bin
[INFO] fastdeploy/backends/tensorrt/trt_backend.cc(500)::fastdeploy::TrtBackend::BuildTrtEngine Start to building TensorRT Engine...
[INFO] fastdeploy/backends/tensorrt/trt_backend.cc(586)::fastdeploy::TrtBackend::BuildTrtEngine TensorRT Engine is built successfully.
[INFO] fastdeploy/backends/tensorrt/trt_backend.cc(588)::fastdeploy::TrtBackend::BuildTrtEngine Serialize TensorRTEngine to local file OCRV2/ch_PP-OCRv3_det_infer/det_trt_cache.trt.
[INFO] fastdeploy/backends/tensorrt/trt_backend.cc(598)::fastdeploy::TrtBackend::BuildTrtEngine TensorRTEngine is serialized to local file OCRV2/ch_PP-OCRv3_det_infer/det_trt_cache.trt, we can load this model from the seralized engine directly next time.
[INFO] fastdeploy/runtime.cc(506)::fastdeploy::Runtime::Init    Runtime initialized with Backend::TRT in Device::GPU.
[INFO] fastdeploy/backends/tensorrt/trt_backend.cc(500)::fastdeploy::TrtBackend::BuildTrtEngine Start to building TensorRT Engine...
[INFO] fastdeploy/backends/tensorrt/trt_backend.cc(586)::fastdeploy::TrtBackend::BuildTrtEngine TensorRT Engine is built successfully.
[INFO] fastdeploy/backends/tensorrt/trt_backend.cc(588)::fastdeploy::TrtBackend::BuildTrtEngine Serialize TensorRTEngine to local file OCRV2/ch_ppocr_mobile_v2.0_cls_infer/cls_trt_cache.trt.
[INFO] fastdeploy/backends/tensorrt/trt_backend.cc(598)::fastdeploy::TrtBackend::BuildTrtEngine TensorRTEngine is serialized to local file OCRV2/ch_ppocr_mobile_v2.0_cls_infer/cls_trt_cache.trt, we can load this model from the seralized engine directly next time.
[INFO] fastdeploy/runtime.cc(506)::fastdeploy::Runtime::Init    Runtime initialized with Backend::TRT in Device::GPU.
[INFO] fastdeploy/backends/tensorrt/trt_backend.cc(500)::fastdeploy::TrtBackend::BuildTrtEngine Start to building TensorRT Engine...
[INFO] fastdeploy/backends/tensorrt/trt_backend.cc(586)::fastdeploy::TrtBackend::BuildTrtEngine TensorRT Engine is built successfully.
[INFO] fastdeploy/backends/tensorrt/trt_backend.cc(588)::fastdeploy::TrtBackend::BuildTrtEngine Serialize TensorRTEngine to local file OCRV2/ch_PP-OCRv3_rec_infer/rec_trt_cache.trt.
[INFO] fastdeploy/backends/tensorrt/trt_backend.cc(598)::fastdeploy::TrtBackend::BuildTrtEngine TensorRTEngine is serialized to local file OCRV2/ch_PP-OCRv3_rec_infer/rec_trt_cache.trt, we can load this model from the seralized engine directly next time.
[INFO] fastdeploy/runtime.cc(506)::fastdeploy::Runtime::Init    Runtime initialized with Backend::TRT in Device::GPU.
[WARNING] fastdeploy/backends/tensorrt/utils.cc(40)::fastdeploy::ShapeRangeInfo::Update [New Shape Out of Range] input name: x, shape: [32, 3, 160, 192], The shape range before: min_shape=[1, 3, 64, 64], max_shape=[1, 3, 960, 960].
[WARNING] fastdeploy/backends/tensorrt/utils.cc(52)::fastdeploy::ShapeRangeInfo::Update [New Shape Out of Range] The updated shape range now: min_shape=[1, 3, 64, 64], max_shape=[32, 3, 960, 960].
[WARNING] fastdeploy/backends/tensorrt/trt_backend.cc(296)::fastdeploy::TrtBackend::Infer       TensorRT engine will be rebuilt once shape range information changed, this may take lots of time, you can set a proper shape range before loading model to avoid rebuilding process. refer https://github.com/PaddlePaddle/FastDeploy/blob/develop/docs/en/faq/tensorrt_tricks.md for more details.
[INFO] fastdeploy/backends/tensorrt/trt_backend.cc(500)::fastdeploy::TrtBackend::BuildTrtEngine Start to building TensorRT Engine...
[INFO] fastdeploy/backends/tensorrt/trt_backend.cc(586)::fastdeploy::TrtBackend::BuildTrtEngine TensorRT Engine is built successfully.
[INFO] fastdeploy/backends/tensorrt/trt_backend.cc(588)::fastdeploy::TrtBackend::BuildTrtEngine Serialize TensorRTEngine to local file OCRV2/ch_PP-OCRv3_det_infer/det_trt_cache.trt.
[INFO] fastdeploy/backends/tensorrt/trt_backend.cc(598)::fastdeploy::TrtBackend::BuildTrtEngine TensorRTEngine is serialized to local file OCRV2/ch_PP-OCRv3_det_infer/det_trt_cache.trt, we can load this model from the seralized engine directly next time.
[INFO] fastdeploy/vision/common/processors/transform.cc(93)::fastdeploy::vision::FuseNormalizeHWC2CHW   Normalize and HWC2CHW are fused to NormalizeAndPermute  in preprocessing pipeline.
[INFO] fastdeploy/vision/common/processors/transform.cc(159)::fastdeploy::vision::FuseNormalizeColorConvert     BGR2RGB and NormalizeAndPermute are fused to NormalizeAndPermute with swap_rb=1
[WARNING] fastdeploy/backends/tensorrt/trt_backend.cc(656)::fastdeploy::TrtBackend::CreateTrtEngineFromOnnx     Cannot build engine right now, because there's dynamic input shape exists, list as below,
[WARNING] fastdeploy/backends/tensorrt/trt_backend.cc(660)::fastdeploy::TrtBackend::CreateTrtEngineFromOnnx     Input 0: TensorInfo(name: x, shape: [-1, 3, 224, 224], dtype: FDDataType::FP32)
[WARNING] fastdeploy/backends/tensorrt/trt_backend.cc(662)::fastdeploy::TrtBackend::CreateTrtEngineFromOnnx     FastDeploy will build the engine while inference with input data, and will also collect the input shape range information. You should be noticed that FastDeploy will rebuild the engine while new input shape is out of the collected shape range, this may bring some time consuming problem, refer https://github.com/PaddlePaddle/FastDeploy/blob/develop/docs/en/faq/tensorrt_tricks.md for more details.
[INFO] fastdeploy/runtime.cc(506)::fastdeploy::Runtime::Init    Runtime initialized with Backend::TRT in Device::GPU.
Loading crowdhuman_yolov5m.engine for TensorRT inference...
[03/08/2023-17:36:00] [TRT] [I] The logger passed into createInferRuntime differs from one already provided for an existing builder, runtime, or refitter. Uses of the global logger, returned by nvinfer1::getLogger(), will return the existing value.

<tensorrt.tensorrt.Runtime object at 0x0000018D58B12A30>
[03/08/2023-17:36:00] [TRT] [I] The logger passed into createInferRuntime differs from one already provided for an existing builder, runtime, or refitter. Uses of the global logger, returned by nvinfer1::getLogger(), will return the existing value.

[03/08/2023-17:36:00] [TRT] [I] Loaded engine size: 109 MiB
[03/08/2023-17:36:00] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +111, now: CPU 0, GPU 3331 (MiB)
<tensorrt.tensorrt.ICudaEngine object at 0x0000018D58B289F0>
[03/08/2023-17:36:00] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +52, now: CPU 1, GPU 3383 (MiB)
[03/08/2023-17:36:00] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
Loading osnet_x0_5_msmt17.engine for TensorRT inference...
[03/08/2023-17:36:00] [TRT] [I] The logger passed into createInferRuntime differs from one already provided for an existing builder, runtime, or refitter. Uses of the global logger, returned by nvinfer1::getLogger(), will return the existing value.

[03/08/2023-17:36:00] [TRT] [I] Loaded engine size: 3 MiB
[03/08/2023-17:36:00] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 16233, GPU 4912 (MiB)
[03/08/2023-17:36:00] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 16233, GPU 4920 (MiB)
[03/08/2023-17:36:00] [TRT] [W] TensorRT was linked against cuDNN 8.6.0 but loaded cuDNN 8.3.2
[03/08/2023-17:36:00] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +2, now: CPU 1, GPU 3385 (MiB)
[03/08/2023-17:36:00] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 16233, GPU 4912 (MiB)
[03/08/2023-17:36:00] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 16233, GPU 4920 (MiB)
[03/08/2023-17:36:00] [TRT] [W] TensorRT was linked against cuDNN 8.6.0 but loaded cuDNN 8.3.2
[03/08/2023-17:36:00] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +326, now: CPU 1, GPU 3711 (MiB)

3. The project starts and runs normally; TRT acceleration works.
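
One loose end: the ShapeRangeInfo warnings in the log above mean the OCR detection model was fed shapes outside the cached range, so FastDeploy rebuilt that engine at inference time. The FAQ linked in the warning suggests declaring the range up front; a sketch assuming the RuntimeOption API, with the shapes and cache path taken from this log:

import fastdeploy as fd

option = fd.RuntimeOption()
option.use_gpu()
option.use_trt_backend()
# Declare min/opt/max shapes for input "x" once, so the engine is built a
# single time instead of being rebuilt when a batch leaves the cached range.
option.set_trt_input_shape("x", [1, 3, 64, 64], [1, 3, 640, 640], [32, 3, 960, 960])
# Reuse the serialized engine on later runs.
option.set_trt_cache_file("OCRV2/ch_PP-OCRv3_det_infer/det_trt_cache.trt")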

LiQiang0307 commented 1 year ago

@jiangjiajun 👍👍👍 Thank you very much for your reply and help! ☕☕☕