[E] [TRT] ../rtSafe/cuda/cudaConvolutionRunner.cpp (457) - Cudnn Error in execute: 3 (CUDNN_STATUS_BAD_PARAM)

My environment: Ubuntu 18.04, CUDA 10.2, cuDNN 7.6.5, TensorRT 7.1.3, TensorRT OSS libnvinfer_plugin.so.7.1.3

cuDNN 7.6.5 was installed from the tar archive, with CUDNN_INSTALL_DIR=/user/local/cuda. The ONNX model was created with https://github.com/Tianxiaomo/pytorch-YOLOv4. In SampleYolo.cpp, in bool SampleYolo::build(), I added this code:

        profile->setDimensions("input", OptProfileSelector::kMIN, Dims4{1, 3, 320, 320});
        profile->setDimensions("input", OptProfileSelector::kOPT, Dims4{1, 3, 320, 320});
        profile->setDimensions("input", OptProfileSelector::kMAX, Dims4{1, 3, 320, 320});
        config->addOptimizationProfile(profile);
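
Since kMIN, kOPT, and kMAX are all set to 1x3x320x320 above, the profile admits exactly one input shape, and TensorRT expects the shape bound at inference time to fall inside the profile's range. The check below is a minimal sketch of that constraint in plain C++ with no TensorRT dependency. `Shape4`, `ProfileRange`, and `fitsProfile` are hypothetical names made up for illustration, not TensorRT API, and this is not a confirmed explanation of the error in this report.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a 4-D NCHW shape; not a TensorRT type.
using Shape4 = std::array<int, 4>;

// Hypothetical mirror of one optimization profile's kMIN/kMAX bounds.
struct ProfileRange {
    Shape4 min;
    Shape4 max;
};

// Returns true when every dimension of `actual` lies within [min, max].
bool fitsProfile(const ProfileRange& range, const Shape4& actual) {
    for (std::size_t i = 0; i < actual.size(); ++i) {
        if (actual[i] < range.min[i] || actual[i] > range.max[i]) {
            return false;
        }
    }
    return true;
}
```

With the bounds from the snippet above, `fitsProfile` accepts only `{1, 3, 320, 320}`; a 416x416 input or a batch size of 2 would be rejected.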

In onnx_add_nms_plugin.py I changed mns_node to "BatchedNMSDynamic_TRT".

$ make
if [ ! -d ../bin/chobj/../common ]; then mkdir -p ../bin/dchobj/../common; fi; :
Compiling: SampleYolo.cpp
if [ ! -d ../bin/chobj/../common ]; then mkdir -p ../bin/dchobj/../common; fi; :
Compiling: main.cpp
if [ ! -d ../bin/chobj/../common ]; then mkdir -p ../bin/dchobj/../common; fi; :
Compiling: ../common/sampleInference.cpp
if [ ! -d ../bin/chobj/../common ]; then mkdir -p ../bin/dchobj/../common; fi; :
Compiling: ../common/sampleOptions.cpp
if [ ! -d ../bin/chobj/../common ]; then mkdir -p ../bin/dchobj/../common; fi; :
Compiling: ../common/logger.cpp
if [ ! -d ../bin/chobj/../common ]; then mkdir -p ../bin/dchobj/../common; fi; :
Compiling: ../common/getOptions.cpp
if [ ! -d ../bin/chobj/../common ]; then mkdir -p ../bin/dchobj/../common; fi; :
Compiling: ../common/sampleReporting.cpp
if [ ! -d ../bin/chobj/../common ]; then mkdir -p ../bin/dchobj/../common; fi; :
Compiling: ../common/sampleEngines.cpp
Linking: ../bin/yolov4_debug
if [ ! -d ../bin/chobj/../common ]; then mkdir -p ../bin/chobj/../common; fi; :
Compiling: SampleYolo.cpp
if [ ! -d ../bin/chobj/../common ]; then mkdir -p ../bin/chobj/../common; fi; :
Compiling: main.cpp
Linking: ../bin/yolov4
# Copy every EXTRA_FILE of this sample to bin dir

$ ../bin/yolov4 -demo
&&&& RUNNING TensorRT.sample_yolo # ../bin/yolov4 -demo
There are 0 coco images to process
[07/18/2021-13:20:19] [I] Building and running a GPU inference engine for Yolo
[07/18/2021-13:20:19] [I] Parsing ONNX file: ../data/yolov4.onnx
[07/18/2021-13:20:20] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[07/18/2021-13:20:20] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[07/18/2021-13:20:20] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[07/18/2021-13:20:20] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[07/18/2021-13:20:20] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[07/18/2021-13:20:20] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[07/18/2021-13:20:20] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[07/18/2021-13:20:20] [I] [TRT] ModelImporter.cpp:135: No importer registered for op: BatchedNMSDynamic_TRT. Attempting to import as plugin.
[07/18/2021-13:20:20] [I] [TRT] builtin_op_importers.cpp:3659: Searching for plugin: BatchedNMSDynamic_TRT, plugin_version: 1, plugin_namespace: 
[07/18/2021-13:20:20] [I] [TRT] builtin_op_importers.cpp:3676: Successfully created plugin: BatchedNMSDynamic_TRT
[07/18/2021-13:20:20] [W] [TRT] Output type must be INT32 for shape outputs
[07/18/2021-13:20:20] [W] [TRT] Output type must be INT32 for shape outputs
[07/18/2021-13:20:20] [I] Building TensorRT engine../data/yolov4.engine
[07/18/2021-13:21:00] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[07/18/2021-13:22:05] [I] [TRT] Detected 1 inputs and 4 output network tensors.
[07/18/2021-13:22:07] [I] TRT Engine file saved to: ../data/yolov4.engine
4
[07/18/2021-13:22:07] [I] Loading or building yolo model done
[07/18/2021-13:22:07] [E] [TRT] ../rtSafe/cuda/cudaConvolutionRunner.cpp (457) - Cudnn Error in execute: 3 (CUDNN_STATUS_BAD_PARAM)
[07/18/2021-13:22:07] [E] [TRT] FAILED_EXECUTION: std::exception
Time consumed in preProcess: 0
Time consumed in model: 0
Time consumed in postProcess: 0
[07/18/2021-13:22:07] [I] Inference of yolo model done
