mfoglio opened this issue (closed 2 years ago)
I converted Darknet weights to ONNX (the direct PyTorch route fails for me, see https://github.com/Tianxiaomo/pytorch-YOLOv4/issues/495) and now I am trying to convert the ONNX model to TensorRT. The procedure throws an error.
How to reproduce

Start a DeepStream 6 container with TensorRT 8 preinstalled:
```bash
docker run \
    -it \
    --rm \
    --net=host \
    --runtime nvidia \
    -e DISPLAY=$DISPLAY \
    -v /tmp/.X11-unix/:/tmp/.X11-unix \
    -v /home/ubuntu/pycharm/projects/fususcore-ai-detector-nvidia/etc/local:/shared \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    --device /dev/video0 \
    --privileged \
    --expose 8554 \
    nvcr.io/nvidia/deepstream:6.0-triton bash
```
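Inside the container it can be worth confirming the bundled TensorRT version before anything else; a quick check via the Python bindings shipped with the DeepStream image (assuming they are importable from the default python3):

```python
# Confirm the TensorRT version bundled with the DeepStream image.
import tensorrt as trt

print(trt.__version__)  # the trtexec log below reports "TensorRT version: 8001", i.e. 8.0.1
```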
Run the following commands:
```bash
# INCOMPLETE SCRIPT
cd /src
git clone https://github.com/Tianxiaomo/pytorch-YOLOv4.git
cd pytorch-YOLOv4
apt -y install python3-venv
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip3 install \
    numpy==1.18.2 \
    torch==1.4.0 \
    tensorboardX==2.0 \
    scikit_image==0.16.2 \
    matplotlib==2.2.3 \
    tqdm==4.43.0 \
    easydict==1.9 \
    Pillow==7.1.2 \
    opencv_python \
    onnx \
    onnxruntime

# PyTorch to ONNX (not working)
# wget --no-check-certificate "https://docs.google.com/uc?export=download&id=1wv_LiFeCRYwtpkqREPeI13-gPELBDwuJ" -r -A 'uc*' -e robots=off -nd -O yolov4.pth
# python3 demo_pytorch2onnx.py yolov4.pth data/dog.jpg 8 80 416 416

# Darknet to ONNX
wget --no-check-certificate "https://drive.google.com/u/0/uc?id=1cewMfusmPjYWbrnuJRuKhPMwRe_b9PaT&export=download" -r -A 'uc*' -e robots=off -nd -O yolov4.weights
python demo_darknet2onnx.py cfg/yolov4.cfg data/coco.names yolov4.weights data/dog.jpg -1

# ONNX to TensorRT
/usr/src/tensorrt/bin/trtexec --onnx=yolov4_-1_3_608_608_dynamic.onnx \
    --minShapes=input:1x3x608x608 --optShapes=input:4x3x608x608 --maxShapes=input:8x3x608x608 \
    --workspace=8000 --saveEngine=yolov4_-1_3_608_608_dynamic.engine --fp16
```
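Before invoking trtexec, it can help to confirm that the exported ONNX file loads and runs in onnxruntime, to separate export problems from TensorRT build problems. A minimal sketch, assuming the file produced by the Darknet-to-ONNX step above and its input tensor named "input":

```python
# Sanity-check the exported ONNX model with onnxruntime before building the engine.
import numpy as np
import onnx
import onnxruntime as ort

model_path = "yolov4_-1_3_608_608_dynamic.onnx"

# Structural validation of the graph.
onnx.checker.check_model(onnx.load(model_path))

# Run one dummy 608x608 batch through the CPU provider.
session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
dummy = np.random.rand(1, 3, 608, 608).astype(np.float32)
outputs = session.run(None, {session.get_inputs()[0].name: dummy})
for out in outputs:
    print(out.shape)
```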
Error:
```
/usr/src/tensorrt/bin/trtexec --onnx=yolov4_-1_3_608_608_dynamic.onnx --minShapes=input:1x3x608x608 --optShapes=input:4x3x608x608 --maxShapes=input:8x3x608x608 --workspace=8000 --saveEngine=yolov4_-1_3_608_608_dynamic.engine --fp16
[12/12/2021-23:15:12] [I] === Model Options ===
[12/12/2021-23:15:12] [I] Format: ONNX
[12/12/2021-23:15:12] [I] Model: yolov4_-1_3_608_608_dynamic.onnx
[12/12/2021-23:15:12] [I] Output:
[12/12/2021-23:15:12] [I] === Build Options ===
[12/12/2021-23:15:12] [I] Max batch: explicit
[12/12/2021-23:15:12] [I] Workspace: 8000 MiB
[12/12/2021-23:15:12] [I] minTiming: 1
[12/12/2021-23:15:12] [I] avgTiming: 8
[12/12/2021-23:15:12] [I] Precision: FP32+FP16
[12/12/2021-23:15:12] [I] Calibration:
[12/12/2021-23:15:12] [I] Refit: Disabled
[12/12/2021-23:15:12] [I] Sparsity: Disabled
[12/12/2021-23:15:12] [I] Safe mode: Disabled
[12/12/2021-23:15:12] [I] Restricted mode: Disabled
[12/12/2021-23:15:12] [I] Save engine: yolov4_-1_3_608_608_dynamic.engine
[12/12/2021-23:15:12] [I] Load engine:
[12/12/2021-23:15:12] [I] NVTX verbosity: 0
[12/12/2021-23:15:12] [I] Tactic sources: Using default tactic sources
[12/12/2021-23:15:12] [I] timingCacheMode: local
[12/12/2021-23:15:12] [I] timingCacheFile:
[12/12/2021-23:15:12] [I] Input(s)s format: fp32:CHW
[12/12/2021-23:15:12] [I] Output(s)s format: fp32:CHW
[12/12/2021-23:15:12] [I] Input build shape: input=1x3x608x608+4x3x608x608+8x3x608x608
[12/12/2021-23:15:12] [I] Input calibration shapes: model
[12/12/2021-23:15:12] [I] === System Options ===
[12/12/2021-23:15:12] [I] Device: 0
[12/12/2021-23:15:12] [I] DLACore:
[12/12/2021-23:15:12] [I] Plugins:
[12/12/2021-23:15:12] [I] === Inference Options ===
[12/12/2021-23:15:12] [I] Batch: Explicit
[12/12/2021-23:15:12] [I] Input inference shape: input=4x3x608x608
[12/12/2021-23:15:12] [I] Iterations: 10
[12/12/2021-23:15:12] [I] Duration: 3s (+ 200ms warm up)
[12/12/2021-23:15:12] [I] Sleep time: 0ms
[12/12/2021-23:15:12] [I] Streams: 1
[12/12/2021-23:15:12] [I] ExposeDMA: Disabled
[12/12/2021-23:15:12] [I] Data transfers: Enabled
[12/12/2021-23:15:12] [I] Spin-wait: Disabled
[12/12/2021-23:15:12] [I] Multithreading: Disabled
[12/12/2021-23:15:12] [I] CUDA Graph: Disabled
[12/12/2021-23:15:12] [I] Separate profiling: Disabled
[12/12/2021-23:15:12] [I] Time Deserialize: Disabled
[12/12/2021-23:15:12] [I] Time Refit: Disabled
[12/12/2021-23:15:12] [I] Skip inference: Disabled
[12/12/2021-23:15:12] [I] Inputs:
[12/12/2021-23:15:12] [I] === Reporting Options ===
[12/12/2021-23:15:12] [I] Verbose: Disabled
[12/12/2021-23:15:12] [I] Averages: 10 inferences
[12/12/2021-23:15:12] [I] Percentile: 99
[12/12/2021-23:15:12] [I] Dump refittable layers:Disabled
[12/12/2021-23:15:12] [I] Dump output: Disabled
[12/12/2021-23:15:12] [I] Profile: Disabled
[12/12/2021-23:15:12] [I] Export timing to JSON file:
[12/12/2021-23:15:12] [I] Export output to JSON file:
[12/12/2021-23:15:12] [I] Export profile to JSON file:
[12/12/2021-23:15:12] [I]
[12/12/2021-23:15:12] [I] === Device Information ===
[12/12/2021-23:15:12] [I] Selected Device: Tesla T4
[12/12/2021-23:15:12] [I] Compute Capability: 7.5
[12/12/2021-23:15:12] [I] SMs: 40
[12/12/2021-23:15:12] [I] Compute Clock Rate: 1.59 GHz
[12/12/2021-23:15:12] [I] Device Global Memory: 15109 MiB
[12/12/2021-23:15:12] [I] Shared Memory per SM: 64 KiB
[12/12/2021-23:15:12] [I] Memory Bus Width: 256 bits (ECC enabled)
[12/12/2021-23:15:12] [I] Memory Clock Rate: 5.001 GHz
[12/12/2021-23:15:12] [I]
[12/12/2021-23:15:12] [I] TensorRT version: 8001
[12/12/2021-23:15:12] [I] [TRT] [MemUsageChange] Init CUDA: CPU +328, GPU +0, now: CPU 335, GPU 250 (MiB)
[12/12/2021-23:15:12] [I] Start parsing network model
[12/12/2021-23:15:12] [I] [TRT] ----------------------------------------------------------------
[12/12/2021-23:15:12] [I] [TRT] Input filename: yolov4_-1_3_608_608_dynamic.onnx
[12/12/2021-23:15:12] [I] [TRT] ONNX IR version: 0.0.4
[12/12/2021-23:15:12] [I] [TRT] Opset version: 11
[12/12/2021-23:15:12] [I] [TRT] Producer name: pytorch
[12/12/2021-23:15:12] [I] [TRT] Producer version: 1.3
[12/12/2021-23:15:12] [I] [TRT] Domain:
[12/12/2021-23:15:12] [I] [TRT] Model version: 0
[12/12/2021-23:15:12] [I] [TRT] Doc string:
[12/12/2021-23:15:12] [I] [TRT] ----------------------------------------------------------------
[12/12/2021-23:15:13] [W] [TRT] onnx2trt_utils.cpp:362: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[12/12/2021-23:15:13] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[12/12/2021-23:15:13] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[12/12/2021-23:15:14] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[12/12/2021-23:15:14] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[12/12/2021-23:15:16] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[12/12/2021-23:15:16] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[12/12/2021-23:15:17] [I] Finish parsing network model
[12/12/2021-23:15:17] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 589, GPU 250 (MiB)
[12/12/2021-23:15:17] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 589 MiB, GPU 250 MiB
[12/12/2021-23:15:19] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +491, GPU +212, now: CPU 1325, GPU 462 (MiB)
[12/12/2021-23:15:20] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +287, GPU +198, now: CPU 1612, GPU 660 (MiB)
[12/12/2021-23:15:20] [W] [TRT] Detected invalid timing cache, setup a local cache instead
[12/12/2021-23:15:22] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2012, GPU 802 (MiB)
[12/12/2021-23:15:22] [E] Error[3]: [weightConvertors.cpp::operator()::562] Error Code 3: Miscellaneous (Weights [name=node_of_649.weight] has value 7.11056e+31 outside of FP16 range. A possible fix is to retrain the model with regularization to reduce the magnitude of the weights, or if the intent is to express +infinity, use +infinity instead.)
[12/12/2021-23:15:22] [E] Error[2]: [builder.cpp::buildSerializedNetwork::417] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed.)
Segmentation fault (core dumped)
```
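The build fails while converting weights to FP16: at least one initializer (node_of_649.weight) holds a value of 7.11056e+31, far beyond FP16's largest finite value of 65504. The offending tensors can be located by scanning the ONNX initializers directly. A minimal sketch with the onnx package, assuming the model file produced above:

```python
# Scan ONNX initializers for values that cannot be represented in FP16.
import numpy as np
import onnx
from onnx import numpy_helper

FP16_MAX = np.finfo(np.float16).max  # 65504.0; larger magnitudes overflow to inf

model = onnx.load("yolov4_-1_3_608_608_dynamic.onnx")
for init in model.graph.initializer:
    w = numpy_helper.to_array(init)
    if w.dtype in (np.float32, np.float64) and np.abs(w).max() > FP16_MAX:
        print(f"{init.name}: max |w| = {np.abs(w).max():e}")
```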
I solved the conversion from PyTorch to ONNX, and I have since been able to obtain a TensorRT engine from the ONNX model.
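For reference, when the FP16 overflow persists, one common workaround (not necessarily what was done here) is to build through the TensorRT Python API and pin the suspect layers to FP32 while leaving the rest of the network in FP16. A rough sketch against the TensorRT 8 Python API; the name match on "649" is a guess based on the trtexec error above, and STRICT_TYPES is used to make the builder honor the per-layer precisions:

```python
# Hypothetical workaround: keep FP16 globally but force overflowing layers to FP32.
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("yolov4_-1_3_608_608_dynamic.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.max_workspace_size = 8000 << 20          # 8000 MiB, as in the trtexec call
config.set_flag(trt.BuilderFlag.FP16)
config.set_flag(trt.BuilderFlag.STRICT_TYPES)   # honor the per-layer precisions below

# Pin layers whose weights overflow FP16 to full precision.
for i in range(network.num_layers):
    layer = network.get_layer(i)
    if "649" in layer.name:                      # offending node from the trtexec log
        layer.precision = trt.float32

# Same dynamic-shape profile as the trtexec command.
profile = builder.create_optimization_profile()
profile.set_shape("input", (1, 3, 608, 608), (4, 3, 608, 608), (8, 3, 608, 608))
config.add_optimization_profile(profile)

engine = builder.build_serialized_network(network, config)
assert engine is not None, "engine build failed"
with open("yolov4_-1_3_608_608_dynamic.engine", "wb") as f:
    f.write(engine)
```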