NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

TensorRT 7 segmentation fault when deserializing "PriorBox" plugin #562

Closed jkjung-avt closed 4 years ago

jkjung-avt commented 4 years ago

Description

Use "trtexec" to build and save a TensorRT engine from the original Caffe Single-Shot Multibox Detector (SSD_300x300) model, then use "trtexec" to load that engine. "trtexec" crashes with a segmentation fault. Backtrace analysis in gdb shows that the crash is caused by deserialization of the "PriorBox" plugin.

This worked in TensorRT 6. The problem is only reproduced in TensorRT 7.

Environment

TensorRT Version: 7.1.0 [Developer Preview]
GPU Type: Jetson Nano
Nvidia Driver Version: JetPack-4.4 DP (L4T R32.4.2)
CUDA Version: 10.2
CUDNN Version: 8.0.0 [Developer Preview]
Operating System + Version: Ubuntu 18.04, Linux kernel 4.9.140
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): Baremetal

Relevant Files

The original Caffe SSD_300x300 model (models_VGGNet_coco_SSD_300x300.tar.gz) can be downloaded from here. This is the COCO SSD300 model from the original (weiliu89) SSD Caffe repository.

Steps To Reproduce

  1. Download and decompress the SSD_300x300 model from the link above. In the "models/VGGNet/coco/SSD_300x300/" directory, you will find these 2 files: "deploy.prototxt" and "VGG_coco_SSD_300x300_iter_400000.caffemodel".
  2. Replace all "Flatten" layers in "deploy.prototxt" with "Reshape" layers with the following parameters.
      reshape_param {
        shape {
          dim: 0
          dim: -1
          dim: 1
          dim: 1
        }
      }
  3. In the final "DetectionOutput" layer in "deploy.prototxt", add one more output, "keep_count".
    layer {
      name: "detection_out"
      type: "DetectionOutput"
      ......
      top: "detection_out"
    + top: "keep_count"
      ......
  4. Use "trtexec" to generate the TensorRT engine.
    $ cd SSD_300x300
    $ trtexec --deploy=deploy.prototxt \
                   --model=VGG_coco_SSD_300x300_iter_400000.caffemodel \
                   --output=detection_out \
                   --workspace=256 \
                   --fp16 \
                   --saveEngine=deploy.engine \
                   --dumpProfile
  5. Use "trtexec" to load the TensorRT engine. It crashes when trying to deserialize the engine.
    $ trtexec --deploy=deploy.prototxt \
                   --model=VGG_coco_SSD_300x300_iter_400000.caffemodel \
                   --output=detection_out \
                   --workspace=256 \
                   --fp16 \
                   --loadEngine=deploy.engine \
                   --dumpProfile

    Results:

    [05/19/2020-17:40:53] [V] [TRT] Deserialize required 5471870 microseconds.
    Segmentation fault (core dumped)
  6. Use gdb to analyze the core dump.
    $ gdb trtexec core
    GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
    Copyright (C) 2018 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "aarch64-linux-gnu".
    Type "show configuration" for configuration details.
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>.
    Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.
    For help, type "help".
    Type "apropos word" to search for commands related to "word"...
    Reading symbols from trtexec...done.
    [New LWP 17555]
    [New LWP 17566]
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
    Core was generated by `./trtexec --deploy=SSD_300x300/deploy.prototxt --model=SSD_300x300/VGG_coco_SSD'.
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0  0x0000007fa29fd690 in nvinfer1::plugin::PriorBox::PriorBox(nvinfer1::plugin::PriorBoxParameters, int, int) () from /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7
    [Current thread is 1 (Thread 0x7fb069a910 (LWP 17555))]
    (gdb) bt
    #0  0x0000007fa29fd690 in nvinfer1::plugin::PriorBox::PriorBox(nvinfer1::plugin::PriorBoxParameters, int, int) () from /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7
    #1  0x0000007fa29fdc10 in nvinfer1::plugin::PriorBox::clone() const () from /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7
    #2  0x0000007fa37e6138 in nvinfer1::rt::SafeExecutionContext::SafeExecutionContext(nvinfer1::rt::SafeEngine const&, bool) () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.7
    #3  0x0000007fa3574fac in nvinfer1::rt::ExecutionContext::ExecutionContext(nvinfer1::rt::Engine const&, bool) () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.7
    #4  0x0000007fa35758d8 in nvinfer1::rt::Engine::createExecutionContext() () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.7
    #5  0x00000055635e28d4 in sample::setUpInference (iEnv=..., inference=...) at ../common/sampleInference.cpp:44
    #6  0x00000055635dbff8 in main ()
    (gdb)
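
For anyone who wants to repeat steps 4 and 5 in a loop, here is a minimal Python sketch. It only uses the "trtexec" flags already shown above. The helper names (`trtexec_cmd`, `describe_exit`, `reproduce`) are hypothetical and not part of TensorRT. It assumes "trtexec" is on the PATH and relies on the fact that on POSIX systems `subprocess` reports a process killed by a signal as a negative return code.

```python
import signal
import subprocess
from typing import List

# Flags shared by the save and load runs, copied from steps 4 and 5.
COMMON_FLAGS = [
    "--deploy=deploy.prototxt",
    "--model=VGG_coco_SSD_300x300_iter_400000.caffemodel",
    "--output=detection_out",
    "--workspace=256",
    "--fp16",
    "--dumpProfile",
]


def trtexec_cmd(mode: str, engine: str = "deploy.engine") -> List[str]:
    """Build the trtexec command line for mode 'save' (step 4) or 'load' (step 5)."""
    if mode not in ("save", "load"):
        raise ValueError("mode must be 'save' or 'load'")
    return ["trtexec", *COMMON_FLAGS, f"--{mode}Engine={engine}"]


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # A negative value means the child was terminated by that signal number.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exit code {returncode}"


def reproduce(model_dir: str = "SSD_300x300") -> None:
    """Run the save step, then the load step, and print how each one ended."""
    for mode in ("save", "load"):
        result = subprocess.run(trtexec_cmd(mode), cwd=model_dir)
        print(mode, "->", describe_exit(result.returncode))
```

Calling `reproduce()` from the parent of the "SSD_300x300" directory should, on an affected setup, report the load step as killed by SIGSEGV.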
jkjung-avt commented 4 years ago

The exact "deploy.prototxt" after modifications (steps 2 & 3 above) is as follows:

deploy.prototxt.txt
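
The edits in steps 2 and 3 can also be scripted. Below is a rough Python sketch, not the method used to produce the attached file. The function name `patch_prototxt` is hypothetical. It assumes every "Flatten" layer carries a `flatten_param { ... }` block with no nested braces, and that the "DetectionOutput" layer is the only one whose top is "detection_out".

```python
import re

# Replacement block from step 2.
RESHAPE_PARAM = """reshape_param {
    shape {
      dim: 0
      dim: -1
      dim: 1
      dim: 1
    }
  }"""


def patch_prototxt(text: str) -> str:
    """Apply steps 2 and 3 to the contents of deploy.prototxt."""
    # Step 2: retag every Flatten layer as a Reshape layer.
    text = text.replace('type: "Flatten"', 'type: "Reshape"')
    # Step 2 (continued): swap each flatten_param block for the reshape_param block.
    # Assumption: the flatten_param block contains no nested braces.
    text = re.sub(r"flatten_param\s*\{[^{}]*\}", lambda _m: RESHAPE_PARAM, text)
    # Step 3: add the extra "keep_count" output right after the existing top,
    # reusing its indentation. Skipped if the output is already present.
    if '"keep_count"' not in text:
        text = re.sub(
            r'^([ \t]*)top:\s*"detection_out"[ \t]*$',
            lambda m: f'{m.group(0)}\n{m.group(1)}top: "keep_count"',
            text,
            flags=re.MULTILINE,
        )
    return text
```

A possible use is to read "deploy.prototxt", pass its text through `patch_prototxt`, write the result back, and then diff it against the attached file before building the engine.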

jkjung-avt commented 4 years ago

I've just done the same testing on an x86_64 PC with the following configuration, and trtexec could deserialize the engine file without any problem. So this appears to be a Jetson (JetPack-4.4 DP) specific problem...

TensorRT Version: 7.0.0
GPU Type: GeForce GTX-2080Ti and GeForce GTX-1080
Nvidia Driver Version: 440.82
CUDA Version: 10.2
CUDNN Version: 7.6.5
Operating System + Version: Ubuntu 18.04, Linux kernel 4.15.0

rmccorm4 commented 4 years ago

I've just done the same testing on a x86_64 PC with the following configuration. And trtexec could deserialize the engine file without problem. So this appears to be a Jetson (JetPack-4.4 DP) specific problem...

@jkjung-avt, for Jetson-specific problems I recommend reaching out on the Jetson developer forums; they're very helpful there: https://forums.developer.nvidia.com/c/agx-autonomous-machines/jetson-embedded-systems/70

CC @dusty-nv

jkjung-avt commented 4 years ago

@rmccorm4, I've now posted the issue on the Jetson Nano Developer Forum: https://forums.developer.nvidia.com/t/tensorrt-7-1-0-dp-segfault-when-deserailizing-the-priorbox-plugin/124111. I'll track the issue there instead. Thanks for your quick reply.

coolbei commented 3 years ago

Same problem with a P100 GPU.