NVIDIA-AI-IOT / tf_trt_models

TensorFlow models accelerated with NVIDIA TensorRT
BSD 3-Clause "New" or "Revised" License

TF-TRT vs UFF-TensorRT #68

Open PythonImageDeveloper opened 4 years ago

PythonImageDeveloper commented 4 years ago

I found that we can optimize a TensorFlow model in several ways. If I am mistaken, please tell me.

1- Using TF-TRT. This API is developed by the TensorFlow team and integrates TensorRT into TensorFlow; it is imported as:

from tensorflow.python.compiler.tensorrt import trt_convert as trt

This API can be applied to any TensorFlow model (both newer and older models) without conversion errors, because if the API doesn't support some layers, it simply leaves them out of the TensorRT engines; those layers stay in the TensorFlow graph and keep running on TensorFlow. Right?
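For reference, a minimal TF 1.x sketch of this conversion (the SavedModel directories are placeholders, assuming TensorFlow 1.14+ built with TensorRT support):

from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Placeholder SavedModel directories.
converter = trt.TrtGraphConverter(
    input_saved_model_dir="saved_model_dir",
    precision_mode="FP16",      # or "FP32" / "INT8"
    max_batch_size=1)
converter.convert()             # unsupported ops are left as native TensorFlow ops
converter.save("trt_saved_model_dir")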

2- Using TensorRT directly. This API is developed by NVIDIA and is independent of the TensorFlow library (not integrated into TensorFlow); it is imported as:

import tensorrt as trt

If we want to use this API, we must first convert the TensorFlow graph to UFF using the UFF converter and then parse the UFF graph with this API. In this case, if the TensorFlow graph has unsupported layers, we must write a plugin or custom code for those layers, right?
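A rough sketch of that UFF workflow (file names, input/output node names, and shapes are placeholders; this assumes a TensorRT version that still ships the UFF parser, e.g. TensorRT 5-7):

import uff
import tensorrt as trt

# Placeholder file and node names; replace with your model's actual values.
uff.from_tensorflow_frozen_model(
    "frozen_inference_graph.pb",
    output_nodes=["output"],
    output_filename="model.uff")

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
with trt.Builder(TRT_LOGGER) as builder, \
     builder.create_network() as network, \
     trt.UffParser() as parser:
    parser.register_input("input", (3, 300, 300))   # assumed CHW input shape
    parser.register_output("output")
    parser.parse("model.uff", network)
    builder.max_workspace_size = 1 << 30
    builder.fp16_mode = True
    engine = builder.build_cuda_engine(network)      # fails if unsupported layers have no plugin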

I have some questions about the two cases above:

3- I converted ssd_mobilenet_v2 using both cases. In case 1 I achieved only a slight speed improvement, but in case 2 I achieved a larger one. Why? My opinion is that in case 1 the API only converts the precision (FP32 to FP16) and merges the layers it can, whereas in case 2 the graph is first cleaned up by the UFF converter (removing redundant nodes such as Asserts and Identity) and then converted to a TensorRT graph. Right?

4- When we convert the trained model files (.ckpt, .meta, ...) to a frozen inference graph (.pb file), aren't these redundant layers removed? Or are only the loss states, optimizer states, etc. removed?
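For context, a minimal TF 1.x sketch of freezing a checkpoint (paths and the output node name are placeholders); freezing converts variables to constants and strips nodes that are not needed to compute the listed outputs, such as optimizer and loss ops, but the frozen graph can still contain Identity/Assert nodes that sit on or control the remaining paths:

import tensorflow as tf

# Placeholder checkpoint paths and output node name.
with tf.compat.v1.Session() as sess:
    saver = tf.compat.v1.train.import_meta_graph("model.ckpt.meta")
    saver.restore(sess, "model.ckpt")
    frozen = tf.compat.v1.graph_util.convert_variables_to_constants(
        sess,
        sess.graph.as_graph_def(),
        output_node_names=["detection_boxes"])
    with tf.io.gfile.GFile("frozen_inference_graph.pb", "wb") as f:
        f.write(frozen.SerializeToString())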

yuqcraft commented 4 years ago

@PythonImageDeveloper

For TF-TRT, what I found is that after converting the graph with "FP16" just like https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#usage-example, the resulting model is still FP32 (DT_FLOAT instead of DT_HALF). I don't know if that's the reason why TF-TRT runs much slower than TRT?
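If it helps, here is a small sketch (the graph path is a placeholder) for counting the node dtypes in the converted frozen graph. Note that with TF-TRT the graph's inputs/outputs typically stay FP32 even in FP16 mode, since the reduced precision is used inside the TRTEngineOp nodes, which may explain the DT_FLOAT you see:

import tensorflow as tf
from collections import Counter

# Placeholder path to the TF-TRT-converted frozen graph.
graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile("trt_fp16_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

# Tally the "T" dtype attribute of every node to see whether any DT_HALF appears.
dtypes = Counter()
for node in graph_def.node:
    if "T" in node.attr:
        dtypes[tf.DType(node.attr["T"].type).name] += 1
print(dtypes)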

gpqls commented 4 years ago

@PythonImageDeveloper

How did you run TensorRT using the UFF converter? I keep getting errors, or the conversion quality is poor.

PythonImageDeveloper commented 4 years ago

Hi, it's likely your model has some newer operations that TensorRT currently doesn't support. You either have to define custom plugins for those nodes, or train the model with an older version of TensorFlow (or clone an older version of the TFOD API, if that's what you use).