NVIDIA / tensorrt-laboratory

Explore the Capabilities of the TensorRT Platform
https://developer.nvidia.com/tensorrt
BSD 3-Clause "New" or "Revised" License

Using TensorRT for inference takes more time than using TensorFlow directly on the GPU #24

Closed jlygit closed 5 years ago

jlygit commented 5 years ago

Hi, I found that using TensorRT for inference takes more time than using TensorFlow directly on the GPU.

When I use TensorRT to run inference on a 720p video, it takes 600 ms per frame. The memory usage is as follows:

(screenshot of GPU memory usage)

When I use TensorFlow directly on the GPU to run inference on the same 720p video, it takes 236 ms per frame. The memory usage is as follows:

(screenshot of GPU memory usage)

It seems TensorRT did not make full use of the GPU memory.

How can I configure that and speed it up when using TRT?

ryanolson commented 5 years ago

The amount of memory is not likely to be the problem. TF will gobble up all the memory on the GPU and own it; it uses internal allocators for its work. TensorRT is actually very efficient in its memory usage and gives the user very explicit control.
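Just to illustrate that difference (this is a minimal sketch, not your setup, and it assumes a TF 1.x session API plus a pre-8.x TensorRT Python API where `builder.max_workspace_size` still exists):

```python
import tensorflow as tf
import tensorrt as trt

# TensorFlow 1.x grabs (nearly) all GPU memory by default unless told otherwise.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True   # opt out of the "grab everything" behaviour
sess = tf.Session(config=config)

# TensorRT only uses what it is explicitly given: scratch space is capped at build time.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
builder.max_workspace_size = 1 << 30     # allow up to 1 GiB of workspace
```

So a lower number in nvidia-smi for the TensorRT process is expected and says nothing about speed.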

The likely problem is one of:

1) data loading,
2) synchronous execution of the TRT engine, or
3) using only one IExecutionContext instead of multiple.

It's hard to know without seeing the details of your code.
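As a rough, hypothetical sketch of points 2) and 3) only: something along these lines overlaps copies and inference across frames. It assumes an already-deserialized explicit-batch `engine`, the TensorRT 7+ Python API (`execute_async_v2`), and pycuda for streams and device buffers; binding 0 as input and binding 1 as output are also assumptions.

```python
import numpy as np
import pycuda.autoinit          # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

NUM_CONTEXTS = 2                # overlap work from consecutive frames

contexts, streams, buffers = [], [], []
for _ in range(NUM_CONTEXTS):
    ctx = engine.create_execution_context()   # `engine` assumed to exist
    stream = cuda.Stream()
    # one device buffer per binding, sized from the engine
    bufs = [cuda.mem_alloc(trt.volume(engine.get_binding_shape(i)) *
                           np.dtype(trt.nptype(engine.get_binding_dtype(i))).itemsize)
            for i in range(engine.num_bindings)]
    contexts.append(ctx)
    streams.append(stream)
    buffers.append(bufs)

def infer(frame_idx, host_in, host_out):
    """Enqueue one frame asynchronously; caller syncs the stream when needed."""
    slot = frame_idx % NUM_CONTEXTS
    ctx, stream, bufs = contexts[slot], streams[slot], buffers[slot]
    cuda.memcpy_htod_async(bufs[0], host_in, stream)              # H2D copy on the stream
    ctx.execute_async_v2([int(b) for b in bufs], stream.handle)   # enqueue inference
    cuda.memcpy_dtoh_async(host_out, bufs[1], stream)             # D2H copy on the stream
    return stream
```

With page-locked host buffers (e.g. `cuda.pagelocked_empty`) the copies and compute for consecutive frames can actually overlap instead of serializing, which is usually where the extra per-frame latency comes from.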

I'd also advise you to check out these resources:

NVIDIA's DeepStream is an integrated video -> TensorRT workflow, and it seems best suited to your problem: https://developer.nvidia.com/deepstream-sdk

Additionally, the best place to get support for TensorRT questions is the Developer Forum: https://devtalk.nvidia.com/default/board/304/tensorrt/