NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Custom Tensorflow-Yolov3 to onnx and then to trt.engine #791

Closed MuhammadAsadJaved closed 2 years ago

MuhammadAsadJaved commented 4 years ago

Hi, I am using the TensorFlow version of YOLOv3. It's not the same as the Darknet version: it uses two YOLOv3 networks for feature extraction from visual and infrared images, then performs feature fusion and finally object detection.

My project runs on a GTX 1080 Ti at about 40 FPS, but on the Xavier NX the speed is 2 FPS. My goal now is to convert this TensorFlow model to ONNX and then to a TRT engine to speed it up on the Xavier NX. I have weights in both .ckpt and .pb format.

  1. What steps should I follow? I am really confused; there are too many conflicting articles about TF-TRT and TensorRT but no clear guidelines.
  2. Do I need to use TF-TRT or TensorRT?
  3. Can you give me a road map for this task? Will it help speed up detection on the Xavier NX?

I have spent about 15 days trying on my own but failed, so I finally decided to post my question here. I hope you can guide me in this regard.

Thanks.

pranavm-nvidia commented 4 years ago

I've provided a short overview of the two approaches below; hope it helps!

TF -> ONNX -> TRT

This is the approach we typically suggest for best performance. It involves exporting your TF model to ONNX, and then importing the ONNX model into TensorRT.

Steps:

  1. Convert your TF frozen graph or checkpoint to ONNX. You can refer to the tf2onnx README.
  2. Import the ONNX model into TensorRT. You can refer to the developer guide for details.
  3. Build an engine and run inference (see the sketch below).
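A minimal sketch of steps 2 and 3 with the TensorRT 7.x Python API (a sketch only; the ONNX and engine file names follow the ones used later in this thread, and step 1 would typically be done beforehand with the tf2onnx command-line tool):

    import tensorrt as trt

    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

    # Step 2: parse the ONNX model into a TensorRT network definition.
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open("yolov3_coco.onnx", "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))

    # Step 3: build and serialize the engine.
    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30  # 1 GiB of build workspace
    engine = builder.build_engine(network, config)
    with open("yolov3_coco.trt", "wb") as f:
        f.write(engine.serialize())

The trtexec tool that ships with TensorRT can do the same from the command line (trtexec --onnx=yolov3_coco.onnx --saveEngine=yolov3_coco.trt).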

TF-TRT

The other option is to use TF's TensorRT integration. This approach partitions the graph and runs supported subgraphs in TRT (I believe it's implemented as a Grappler optimization pass). This can be convenient if you're primarily developing in a TF-focused environment, but at the expense of performance.

Steps:

  1. Use the TrtGraphConverter optimization pass.
  2. Deploy as a normal TF model (see the sketch below).
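A minimal sketch of both steps with the TF 1.x trt_convert API (a sketch only; frozen_graph_def and the output node name "detections" are hypothetical placeholders for your model):

    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    # Step 1: partition the frozen graph; supported subgraphs are
    # replaced by TensorRT engine ops.
    converter = trt.TrtGraphConverter(
        input_graph_def=frozen_graph_def,  # your frozen tf.GraphDef
        nodes_blacklist=["detections"],    # hypothetical output node names
        precision_mode="FP16")
    trt_graph = converter.convert()

    # Step 2: deploy trt_graph like any other GraphDef, e.g. with
    # tf.import_graph_def(trt_graph, ...) inside a normal TF session.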

MuhammadAsadJaved commented 4 years ago

@pranavm-nvidia Got it. Thank you for the explanation.

MuhammadAsadJaved commented 4 years ago

Hi, I am able to convert the TF frozen_graph.pb to .onnx, but I'm having problems building an engine from the .onnx model (.onnx to .trt).

I am trying this command.

onnx2trt modelIn/yolov3_coco.onnx -o modelOut/yolov3_coco.trt

and got the following errors: [screenshot of onnx2trt error output]

Can you give me some guidance on handling this error?

I have attached my .pb and .onnx models here. https://drive.google.com/drive/folders/1uoCqNCMwNvrgW6TQ3Ox-3w_GM7Q8div5?usp=sharing

pranavm-nvidia commented 4 years ago

@MuhammadAsadJaved Which version of TensorRT are you using? I think this bug was fixed in 7.1. Can you try 7.1 or newer?

MuhammadAsadJaved commented 4 years ago

I am using 5.0. OK, I will try 7.1 as well. Thank you.

MuhammadAsadJaved commented 4 years ago

@pranavm-nvidia Thank you. It converts with TensorRT 7.1, but only when I use --fp16; without --fp16 the conversion fails. I have opened an issue here.

https://github.com/onnx/onnx-tensorrt/issues/549

onnx2trt and trtexec behave differently: trtexec works with --fp16 and onnx2trt works with -d 16. Are these the same? What is this parameter for?

pranavm-nvidia commented 3 years ago

--fp16 enables float16 tactics. It shouldn't affect parsing though. Are you sure that's the only thing you're changing?
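For reference, a minimal sketch of what --fp16 corresponds to in the TensorRT 7.x Python builder API (config here is the builder config created when building the engine, as in the sketch earlier in this thread):

    # Allow TensorRT to pick float16 kernels where they are faster;
    # FP32 stays the default when this flag is not set.
    config.set_flag(trt.BuilderFlag.FP16)

    # INT8 is the other reduced precision; it additionally needs a
    # calibrator (or per-tensor dynamic ranges):
    # config.set_flag(trt.BuilderFlag.INT8)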

MuhammadAsadJaved commented 3 years ago

> --fp16 enables float16 tactics. It shouldn't affect parsing though. Are you sure that's the only thing you're changing?

Yes, only --fp16. I also want to know: what other precisions can we use? Which one is best for speed and which is best for accuracy?

nvpohanh commented 2 years ago

@MuhammadAsadJaved Could you try TRT 8.2/8.4 and see if the issue still exists? If it does, we will debug it. Thanks

nvpohanh commented 2 years ago

Closing due to >14 days without activity. Please feel free to reopen if the issue still exists in TRT 8.4. Thanks