CarkusL / CenterPoint

Export CenterPoint PointPillars ONNX Model For TensorRT
MIT License
204 stars · 50 forks

On which GPU did you test centerpoint TensorRT engine #5

Open serser opened 3 years ago

serser commented 3 years ago

Hi @CarkusL, I am using TensorRT 7.2.3.4 on a V100. I find the latency is almost twice as high as reported. Could you share your specific environment settings?

&&&& RUNNING TensorRT.sample_onnx_centerpoint # ./centerpoint
[09/15/2021-10:37:46] [I] Building and running a GPU inference engine for CenterPoint
----------------------------------------------------------------
Input filename:   ../data/centerpoint/pointpillars_trt.onnx
ONNX IR version:  0.0.6
Opset version:    11
Producer name:    pytorch
Producer version: 1.7
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
[09/15/2021-10:37:47] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[09/15/2021-10:37:47] [I] [TRT] ModelImporter.cpp:135: No importer registered for op: ScatterND. Attempting to import as plugin.
[09/15/2021-10:37:47] [I] [TRT] builtin_op_importers.cpp:3771: Searching for plugin: ScatterND, plugin_version: 1, plugin_namespace: 
[09/15/2021-10:37:47] [I] [TRT] builtin_op_importers.cpp:3788: Successfully created plugin: ScatterND
[09/15/2021-10:37:47] [W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/15/2021-10:37:47] [W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/15/2021-10:37:47] [W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/15/2021-10:37:47] [W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/15/2021-10:37:47] [W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/15/2021-10:37:47] [W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/15/2021-10:37:48] [W] [TRT] TensorRT was linked against cuDNN 8.1.0 but loaded cuDNN 8.0.3
[09/15/2021-10:37:53] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[09/15/2021-10:38:05] [I] [TRT] Detected 2 inputs and 42 output network tensors.
[09/15/2021-10:38:05] [W] [TRT] TensorRT was linked against cuDNN 8.1.0 but loaded cuDNN 8.0.3
[09/15/2021-10:38:05] [I] getNbInputs: 2 

[09/15/2021-10:38:05] [I] getNbOutputs: 42 

[09/15/2021-10:38:05] [I] getNbOutputs Name: 594 

[09/15/2021-10:38:05] [W] [TRT] TensorRT was linked against cuDNN 8.1.0 but loaded cuDNN 8.0.3
filePath[idx]: ../data/centerpoint//points/0a0d6b8c2e884134a3b48df43d54c36a.bin
[09/15/2021-10:38:05] [I] [INFO] pointNum : 278272
[09/15/2021-10:38:05] [I] PreProcess Time: 13.3244 ms
[09/15/2021-10:38:05] [I] inferenceDuration Time: 13.3018 ms
[09/15/2021-10:38:05] [I] PostProcessDuration Time: 7.13283 ms
&&&& PASSED TensorRT.sample_onnx_centerpoint # ./centerpoint
CarkusL commented 3 years ago

CUDA: 11.1, TensorRT: 7.2.2-1, GPU: RTX 3090

xavidzo commented 3 years ago

Hi @CarkusL, can you give us the inference time for batch_size=1 of your TensorRT implementation, including the preprocess and postprocess?

CarkusL commented 3 years ago

> Hi @CarkusL, can you give us the inference time for batch_size=1 of your TensorRT implementation, including the preprocess and postprocess?

[09/15/2021-10:38:05] [I] PreProcess Time: 13.3244 ms
[09/15/2021-10:38:05] [I] inferenceDuration Time: 13.3018 ms
[09/15/2021-10:38:05] [I] PostProcessDuration Time: 7.13283 ms

HaohaoNJU commented 2 years ago

> Hi @CarkusL, can you give us the inference time for batch_size=1 of your TensorRT implementation, including the preprocess and postprocess?
>
> [09/15/2021-10:38:05] [I] PreProcess Time: 13.3244 ms
> [09/15/2021-10:38:05] [I] inferenceDuration Time: 13.3018 ms
> [09/15/2021-10:38:05] [I] PostProcessDuration Time: 7.13283 ms

Hi @CarkusL, are you running this with FP32 or with FP16?

CarkusL commented 2 years ago

> Hi @CarkusL, can you give us the inference time for batch_size=1 of your TensorRT implementation, including the preprocess and postprocess?
>
> [09/15/2021-10:38:05] [I] PreProcess Time: 13.3244 ms
> [09/15/2021-10:38:05] [I] inferenceDuration Time: 13.3018 ms
> [09/15/2021-10:38:05] [I] PostProcessDuration Time: 7.13283 ms
>
> Hi @CarkusL, are you running this with FP32 or with FP16?

FP32, on CPU.

HaohaoNJU commented 2 years ago

@CarkusL Thanks for your great work. I wrote a new project based on your code, where the pre-processing and post-processing are done in CUDA; it runs much faster.

Here is the code : https://github.com/Abraham423/CenterPointTensorRT.git