mfoglio opened this issue (closed 2 years ago)
I converted Darknet weights to ONNX (the direct PyTorch route fails for me, see https://github.com/Tianxiaomo/pytorch-YOLOv4/issues/495) and now I am trying to convert the ONNX model to TensorRT. The procedure throws an error.
How to reproduce

Start a DeepStream 6 container with TensorRT 8 preinstalled:
```bash
docker run \
    -it \
    --rm \
    --net=host \
    --runtime nvidia \
    -e DISPLAY=$DISPLAY \
    -v /tmp/.X11-unix/:/tmp/.X11-unix \
    -v /home/ubuntu/pycharm/projects/fususcore-ai-detector-nvidia/etc/local:/shared \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    --device /dev/video0 \
    --privileged \
    --expose 8554 \
    nvcr.io/nvidia/deepstream:6.0-triton bash
```
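Inside the container it can be worth confirming the bundled TensorRT version before anything else; a quick check via the Python bindings shipped with the DeepStream image (assuming they are importable from the default python3):

```python
# Confirm the TensorRT version bundled with the DeepStream image.
import tensorrt as trt

print(trt.__version__)  # the trtexec log below reports "TensorRT version: 8001", i.e. 8.0.1
```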
Run the following commands:
```bash
# INCOMPLETE SCRIPT
cd /src
git clone https://github.com/Tianxiaomo/pytorch-YOLOv4.git
cd pytorch-YOLOv4
apt -y install python3-venv
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip3 install \
    numpy==1.18.2 \
    torch==1.4.0 \
    tensorboardX==2.0 \
    scikit_image==0.16.2 \
    matplotlib==2.2.3 \
    tqdm==4.43.0 \
    easydict==1.9 \
    Pillow==7.1.2 \
    opencv_python \
    onnx \
    onnxruntime

# PyTorch to ONNX (not working)
# wget --no-check-certificate "https://docs.google.com/uc?export=download&id=1wv_LiFeCRYwtpkqREPeI13-gPELBDwuJ" -r -A 'uc*' -e robots=off -nd -O yolov4.pth
# python3 demo_pytorch2onnx.py yolov4.pth data/dog.jpg 8 80 416 416

# Darknet to ONNX
wget --no-check-certificate "https://drive.google.com/u/0/uc?id=1cewMfusmPjYWbrnuJRuKhPMwRe_b9PaT&export=download" -r -A 'uc*' -e robots=off -nd -O yolov4.weights
python demo_darknet2onnx.py cfg/yolov4.cfg data/coco.names yolov4.weights data/dog.jpg -1

# ONNX to TensorRT
/usr/src/tensorrt/bin/trtexec --onnx=yolov4_-1_3_608_608_dynamic.onnx \
    --minShapes=input:1x3x608x608 --optShapes=input:4x3x608x608 --maxShapes=input:8x3x608x608 \
    --workspace=8000 --saveEngine=yolov4_-1_3_608_608_dynamic.engine --fp16
```
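Before invoking trtexec, it can help to confirm that the exported ONNX file loads and runs in onnxruntime, to separate export problems from TensorRT build problems. A minimal sketch, assuming the file produced by the Darknet-to-ONNX step above and its input tensor named "input":

```python
# Sanity-check the exported ONNX model with onnxruntime before building the engine.
import numpy as np
import onnx
import onnxruntime as ort

model_path = "yolov4_-1_3_608_608_dynamic.onnx"

# Structural validation of the graph.
onnx.checker.check_model(onnx.load(model_path))

# Run one dummy 608x608 batch through the CPU provider.
session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
dummy = np.random.rand(1, 3, 608, 608).astype(np.float32)
outputs = session.run(None, {session.get_inputs()[0].name: dummy})
for out in outputs:
    print(out.shape)
```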
Error:
```
/usr/src/tensorrt/bin/trtexec --onnx=yolov4_-1_3_608_608_dynamic.onnx --minShapes=input:1x3x608x608 --optShapes=input:4x3x608x608 --maxShapes=input:8x3x608x608 --workspace=8000 --saveEngine=yolov4_-1_3_608_608_dynamic.engine --fp16
[12/12/2021-23:15:12] [I] === Model Options ===
[12/12/2021-23:15:12] [I] Format: ONNX
[12/12/2021-23:15:12] [I] Model: yolov4_-1_3_608_608_dynamic.onnx
[12/12/2021-23:15:12] [I] Output:
[12/12/2021-23:15:12] [I] === Build Options ===
[12/12/2021-23:15:12] [I] Max batch: explicit
[12/12/2021-23:15:12] [I] Workspace: 8000 MiB
[12/12/2021-23:15:12] [I] minTiming: 1
[12/12/2021-23:15:12] [I] avgTiming: 8
[12/12/2021-23:15:12] [I] Precision: FP32+FP16
[12/12/2021-23:15:12] [I] Calibration:
[12/12/2021-23:15:12] [I] Refit: Disabled
[12/12/2021-23:15:12] [I] Sparsity: Disabled
[12/12/2021-23:15:12] [I] Safe mode: Disabled
[12/12/2021-23:15:12] [I] Restricted mode: Disabled
[12/12/2021-23:15:12] [I] Save engine: yolov4_-1_3_608_608_dynamic.engine
[12/12/2021-23:15:12] [I] Load engine:
[12/12/2021-23:15:12] [I] NVTX verbosity: 0
[12/12/2021-23:15:12] [I] Tactic sources: Using default tactic sources
[12/12/2021-23:15:12] [I] timingCacheMode: local
[12/12/2021-23:15:12] [I] timingCacheFile:
[12/12/2021-23:15:12] [I] Input(s)s format: fp32:CHW
[12/12/2021-23:15:12] [I] Output(s)s format: fp32:CHW
[12/12/2021-23:15:12] [I] Input build shape: input=1x3x608x608+4x3x608x608+8x3x608x608
[12/12/2021-23:15:12] [I] Input calibration shapes: model
[12/12/2021-23:15:12] [I] === System Options ===
[12/12/2021-23:15:12] [I] Device: 0
[12/12/2021-23:15:12] [I] DLACore:
[12/12/2021-23:15:12] [I] Plugins:
[12/12/2021-23:15:12] [I] === Inference Options ===
[12/12/2021-23:15:12] [I] Batch: Explicit
[12/12/2021-23:15:12] [I] Input inference shape: input=4x3x608x608
[12/12/2021-23:15:12] [I] Iterations: 10
[12/12/2021-23:15:12] [I] Duration: 3s (+ 200ms warm up)
[12/12/2021-23:15:12] [I] Sleep time: 0ms
[12/12/2021-23:15:12] [I] Streams: 1
[12/12/2021-23:15:12] [I] ExposeDMA: Disabled
[12/12/2021-23:15:12] [I] Data transfers: Enabled
[12/12/2021-23:15:12] [I] Spin-wait: Disabled
[12/12/2021-23:15:12] [I] Multithreading: Disabled
[12/12/2021-23:15:12] [I] CUDA Graph: Disabled
[12/12/2021-23:15:12] [I] Separate profiling: Disabled
[12/12/2021-23:15:12] [I] Time Deserialize: Disabled
[12/12/2021-23:15:12] [I] Time Refit: Disabled
[12/12/2021-23:15:12] [I] Skip inference: Disabled
[12/12/2021-23:15:12] [I] Inputs:
[12/12/2021-23:15:12] [I] === Reporting Options ===
[12/12/2021-23:15:12] [I] Verbose: Disabled
[12/12/2021-23:15:12] [I] Averages: 10 inferences
[12/12/2021-23:15:12] [I] Percentile: 99
[12/12/2021-23:15:12] [I] Dump refittable layers:Disabled
[12/12/2021-23:15:12] [I] Dump output: Disabled
[12/12/2021-23:15:12] [I] Profile: Disabled
[12/12/2021-23:15:12] [I] Export timing to JSON file:
[12/12/2021-23:15:12] [I] Export output to JSON file:
[12/12/2021-23:15:12] [I] Export profile to JSON file:
[12/12/2021-23:15:12] [I]
[12/12/2021-23:15:12] [I] === Device Information ===
[12/12/2021-23:15:12] [I] Selected Device: Tesla T4
[12/12/2021-23:15:12] [I] Compute Capability: 7.5
[12/12/2021-23:15:12] [I] SMs: 40
[12/12/2021-23:15:12] [I] Compute Clock Rate: 1.59 GHz
[12/12/2021-23:15:12] [I] Device Global Memory: 15109 MiB
[12/12/2021-23:15:12] [I] Shared Memory per SM: 64 KiB
[12/12/2021-23:15:12] [I] Memory Bus Width: 256 bits (ECC enabled)
[12/12/2021-23:15:12] [I] Memory Clock Rate: 5.001 GHz
[12/12/2021-23:15:12] [I]
[12/12/2021-23:15:12] [I] TensorRT version: 8001
[12/12/2021-23:15:12] [I] [TRT] [MemUsageChange] Init CUDA: CPU +328, GPU +0, now: CPU 335, GPU 250 (MiB)
[12/12/2021-23:15:12] [I] Start parsing network model
[12/12/2021-23:15:12] [I] [TRT] ----------------------------------------------------------------
[12/12/2021-23:15:12] [I] [TRT] Input filename: yolov4_-1_3_608_608_dynamic.onnx
[12/12/2021-23:15:12] [I] [TRT] ONNX IR version: 0.0.4
[12/12/2021-23:15:12] [I] [TRT] Opset version: 11
[12/12/2021-23:15:12] [I] [TRT] Producer name: pytorch
[12/12/2021-23:15:12] [I] [TRT] Producer version: 1.3
[12/12/2021-23:15:12] [I] [TRT] Domain:
[12/12/2021-23:15:12] [I] [TRT] Model version: 0
[12/12/2021-23:15:12] [I] [TRT] Doc string:
[12/12/2021-23:15:12] [I] [TRT] ----------------------------------------------------------------
[12/12/2021-23:15:13] [W] [TRT] onnx2trt_utils.cpp:362: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[12/12/2021-23:15:13] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[12/12/2021-23:15:13] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[12/12/2021-23:15:14] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[12/12/2021-23:15:14] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[12/12/2021-23:15:16] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[12/12/2021-23:15:16] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[12/12/2021-23:15:17] [I] Finish parsing network model
[12/12/2021-23:15:17] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 589, GPU 250 (MiB)
[12/12/2021-23:15:17] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 589 MiB, GPU 250 MiB
[12/12/2021-23:15:19] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +491, GPU +212, now: CPU 1325, GPU 462 (MiB)
[12/12/2021-23:15:20] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +287, GPU +198, now: CPU 1612, GPU 660 (MiB)
[12/12/2021-23:15:20] [W] [TRT] Detected invalid timing cache, setup a local cache instead
[12/12/2021-23:15:22] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2012, GPU 802 (MiB)
[12/12/2021-23:15:22] [E] Error[3]: [weightConvertors.cpp::operator()::562] Error Code 3: Miscellaneous (Weights [name=node_of_649.weight] has value 7.11056e+31 outside of FP16 range. A possible fix is to retrain the model with regularization to reduce the magnitude of the weights, or if the intent is to express +infinity, use +infinity instead.)
[12/12/2021-23:15:22] [E] Error[2]: [builder.cpp::buildSerializedNetwork::417] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed.)
Segmentation fault (core dumped)
```
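The build fails while converting weights to FP16: at least one initializer (node_of_649.weight) holds a value of 7.11056e+31, far beyond FP16's largest finite value of 65504. The offending tensors can be located by scanning the ONNX initializers directly. A minimal sketch with the onnx package, assuming the model file produced above:

```python
# Scan ONNX initializers for values that cannot be represented in FP16.
import numpy as np
import onnx
from onnx import numpy_helper

FP16_MAX = np.finfo(np.float16).max  # 65504.0; larger magnitudes overflow to inf

model = onnx.load("yolov4_-1_3_608_608_dynamic.onnx")
for init in model.graph.initializer:
    w = numpy_helper.to_array(init)
    if w.dtype in (np.float32, np.float64) and np.abs(w).max() > FP16_MAX:
        print(f"{init.name}: max |w| = {np.abs(w).max():e}")
```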
I solved the conversion from PyTorch to ONNX, and I have since been able to obtain a TensorRT engine from the ONNX model.
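For reference, when the FP16 overflow persists, one common workaround (not necessarily what was done here) is to build through the TensorRT Python API and pin the suspect layers to FP32 while leaving the rest of the network in FP16. A rough sketch against the TensorRT 8 Python API; the name match on "649" is a guess based on the trtexec error above, and STRICT_TYPES is used to make the builder honor the per-layer precisions:

```python
# Hypothetical workaround: keep FP16 globally but force overflowing layers to FP32.
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("yolov4_-1_3_608_608_dynamic.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.max_workspace_size = 8000 << 20          # 8000 MiB, as in the trtexec call
config.set_flag(trt.BuilderFlag.FP16)
config.set_flag(trt.BuilderFlag.STRICT_TYPES)   # honor the per-layer precisions below

# Pin layers whose weights overflow FP16 to full precision.
for i in range(network.num_layers):
    layer = network.get_layer(i)
    if "649" in layer.name:                      # offending node from the trtexec log
        layer.precision = trt.float32

# Same dynamic-shape profile as the trtexec command.
profile = builder.create_optimization_profile()
profile.set_shape("input", (1, 3, 608, 608), (4, 3, 608, 608), (8, 3, 608, 608))
config.add_optimization_profile(profile)

engine = builder.build_serialized_network(network, config)
assert engine is not None, "engine build failed"
with open("yolov4_-1_3_608_608_dynamic.engine", "wb") as f:
    f.write(engine)
```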