NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

np.vectorize() performance #2469

Closed vvabi-sabi closed 1 year ago

vvabi-sabi commented 1 year ago

Description

In YOLOv3 postprocessing, np.vectorize() reduces image-processing performance.

Environment

TensorRT Version: 8.0.1-1
NVIDIA GPU: GV10B (Jetson Xavier NX)
NVIDIA Driver Version: 32.6.1
CUDA Version: 10.2
CUDNN Version: 8
Operating System: Ubuntu 18.04.6 LTS
Python Version (if applicable): 3.6.9
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):

Relevant Files

NVIDIA/TensorRT/samples/python/yolov3_onnx/data_processing.py

    def sigmoid(value):
        """Return the sigmoid of the input."""
        return 1.0 / (1.0 + math.exp(-value))

    def exponential(value):
        """Return the exponential of the input."""
        return math.exp(value)

    # Vectorized calculation of above two functions:
    sigmoid_v = np.vectorize(sigmoid)
    exponential_v = np.vectorize(exponential)

It looks counterintuitive, but np.vectorize actually reduces inference throughput.

Steps To Reproduce

Replace the np.vectorize-wrapped functions with NumPy ufunc equivalents:

    def sigmoid(value):
        """Return the sigmoid of the input."""
        return 1.0 / (1.0 + np.exp(-value))

    def exponential(value):
        """Return the exponential of the input."""
        return np.exp(value)

    # Vectorized calculation of above two functions:
    sigmoid_v = sigmoid #np.vectorize(sigmoid)
    exponential_v = exponential #np.vectorize(exponential)

This adds 1 to 5 FPS to image processing.
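The speedup is expected: np.vectorize is documented as essentially a Python-level for loop, calling the scalar function once per element, while np.exp processes the whole array in compiled code. A minimal, self-contained sketch of the comparison (array size and repeat count are illustrative, not from the sample):

```python
import math
import timeit
import numpy as np

def sigmoid_scalar(value):
    """Scalar sigmoid; np.vectorize calls this once per element in Python."""
    return 1.0 / (1.0 + math.exp(-value))

# Python-level loop under the hood, despite the name.
sigmoid_v = np.vectorize(sigmoid_scalar)

def sigmoid_np(values):
    """True vectorized sigmoid: one NumPy ufunc call over the whole array."""
    return 1.0 / (1.0 + np.exp(-values))

# Illustrative workload, roughly the size of a YOLO feature map's scores.
x = np.random.default_rng(0).standard_normal(100_000)

t_vec = timeit.timeit(lambda: sigmoid_v(x), number=10)
t_ufunc = timeit.timeit(lambda: sigmoid_np(x), number=10)
print(f"np.vectorize: {t_vec:.3f}s   np.exp: {t_ufunc:.3f}s")

# Results are numerically identical; only the speed differs.
assert np.allclose(sigmoid_v(x), sigmoid_np(x))
```

On typical hardware the ufunc version is one to two orders of magnitude faster, which is consistent with the FPS gain reported above.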

zerollzeng commented 1 year ago

This is a NumPy issue, and the code in question runs on the CPU; the sample demonstrates GPU inference with TensorRT.

vvabi-sabi commented 1 year ago

That's right, inference runs on the GPU. Nevertheless, extracting objects (bounding boxes) from the image is done with NumPy and OpenCV on the CPU. If the network finds many objects in the image, postprocessing takes longer.

nvpohanh commented 1 year ago

Looks like a NumPy perf issue, not a TRT perf issue.

ttyio commented 1 year ago

Closing since there has been no activity for more than 3 weeks; please reopen if you still have questions, thanks!