ardianumam / Tensorflow-TensorRT

This repository is for my YT video series about optimizing a TensorFlow deep learning model using TensorRT. We demonstrate optimizing a LeNet-like model and a YOLOv3 model, achieving 3.7x and 1.5x speed-ups, respectively, compared to the original models.

read_pb is slow #2

Open fugjo16 opened 5 years ago

fugjo16 commented 5 years ago

Dear author,

It's a great project, and the result is good!

But when I ran YOLOv3 with TensorRT on a TX2, it took a long time (about 10–20 minutes) to run read_pb_return_tensors(). Is this expected? I'm wondering whether I did something wrong ...

Thanks

ardianumam commented 5 years ago

Hi,

Do you (i) run all of the block 2 code in this code file, or (ii) only run the function read_pb_graph("./model/YOLOv3/yolov3_gpu_nms.pb")? If (i), yes, it takes longer, since you also perform the TensorRT optimization itself. But once you have stored the trt_model.pb, you can just do something similar to (ii) to load the stored trt_model.pb, and that takes only a few seconds (it also depends on your GPU). By the way, can you share how much improvement in terms of FPS you get after the TRT optimization, and which GPU you use? I am curious about that.
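For clarity, the two paths look roughly like this (a minimal sketch using the TF 1.x tensorflow.contrib.tensorrt API; the output node name and file paths are placeholders, not the exact ones from this repo):

```python
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

def read_pb_graph(pb_path):
    """Load a frozen GraphDef from disk."""
    with tf.gfile.FastGFile(pb_path, 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    return graph_def

# (i) One-time TensorRT optimization -- slow; run once and save the result.
frozen_graph = read_pb_graph('./model/YOLOv3/yolov3_gpu_nms.pb')
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=['output_boxes'],           # placeholder: use the real output node names
    max_batch_size=1,
    max_workspace_size_bytes=1 << 30,
    precision_mode='FP16')              # TX2 supports fast FP16
with tf.gfile.FastGFile('./model/YOLOv3/trt_model.pb', 'wb') as f:
    f.write(trt_graph.SerializeToString())

# (ii) Every later run: just load the already-optimized model.
trt_graph = read_pb_graph('./model/YOLOv3/trt_model.pb')
```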

fugjo16 commented 5 years ago

Hi @ardianumam ,

The situation is (ii): it takes about 15 minutes to load the model. I run this code on a Jetson TX2, but with a 3rd-party carrier board. After loading finishes, I get about 9 FPS, versus about 4 FPS without TensorRT optimization. Maybe the problem is caused by the 3rd-party carrier board or by different package versions; I'll check it. Thanks for your reply.

ardianumam commented 5 years ago

@fugjo16: Did you convert the frozen_model.pb to TRT_model.pb on a desktop and then use it on the Jetson TX2? I once did something similar, and yes, it takes a very long time even just to load the TRT_model.pb. Such a workflow is actually not proper, since the TensorRT optimization generates a model optimized specifically for the machine on which the optimization is run.

If not, I wonder how you managed to convert frozen_model.pb to TRT_model.pb on the Jetson TX2, because I have tried it several times and it always runs out of memory. -.-

fugjo16 commented 5 years ago

@ardianumam: No, I converted to TRT_model.pb on the TX2; I used swap to get some more memory, as in the link below. It's CPU memory, but it still helped. https://devtalk.nvidia.com/default/topic/1025939/jetson-tx2/when-i-run-a-tensorflow-model-there-is-not-enough-memory-what-shoud-i-do-/ Maybe this is why I need so much time to load the TRT_model ...

ardianumam commented 5 years ago

@fugjo16: I just learned about that. I'll try it on my TX2 later and update here soon. Thanks. Yes, that's probably the cause.

fugjo16 commented 5 years ago

@ardianumam Thanks! This problem really confuses me a lot.

ardianumam commented 5 years ago

Hi @fugjo16: I just tried on my TX2, and yes, it took about 15 minutes just to read the trt_model.pb, while reading the native TensorFlow frozen_model.pb takes only about 5 seconds, lol. Maybe it's due to the swap memory used when performing the TensorRT optimization. I posted to the NVIDIA forum too; I hope someone replies. Or do you plan to, for example, reduce the YOLOv3 architecture so that we can perform the TensorRT optimization on the TX2 without adding swap memory?

fugjo16 commented 5 years ago

Hi @ardianumam: Thanks a lot! I hope someone will answer it, lol. Yes, I think that method would work; I will try it! Thanks :D

filipski commented 5 years ago

I'd rather say you're hit by a protobuf version/backend issue. Check: https://devtalk.nvidia.com/default/topic/1046492/tensorrt/extremely-long-time-to-load-trt-optimized-frozen-tf-graphs/

and start with: export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp before running your code. If that doesn't help, update protobuf. I rebuilt it from source.
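For reference, the same fix applied from inside Python — a rough sketch; the variable must be set before protobuf is first imported, and api_implementation.Type() reports which backend is actually active:

```python
import os
# Must be set before protobuf (pulled in by TensorFlow) is first imported.
os.environ['PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION'] = 'cpp'

import tensorflow as tf  # noqa: E402

# Verify which backend protobuf picked up; the 'cpp' backend parses
# large .pb files far faster than the pure-'python' fallback.
from google.protobuf.internal import api_implementation  # noqa: E402
print(api_implementation.Type())  # expect 'cpp'
```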

ardianumam commented 5 years ago

@filipski: Thanks for the info. I'll give it a try.

fugjo16 commented 5 years ago

I tested with the script from this blog post; it's easy to modify, and it works for me: https://jkjung-avt.github.io/tf-trt-revisited/
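To check whether the fix takes effect, a quick load-time measurement along these lines should do (a hypothetical harness, not the blog's exact script):

```python
import time
import tensorflow as tf

def load_pb(pb_path):
    """Load a frozen GraphDef and report how long it took."""
    start = time.time()
    with tf.gfile.FastGFile(pb_path, 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    print('loading %s took %.1f s' % (pb_path, time.time() - start))
    return graph_def

load_pb('./model/YOLOv3/trt_model.pb')  # minutes with the slow backend, seconds with cpp
```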

MuhammadAsadJaved commented 3 years ago

@fugjo16 @ardianumam I have a YOLOv3 TensorFlow model in both ckpt and .pb format. The model runs on a GTX 1080 Ti at 37 FPS. Now I want to run it on a Xavier NX, but the model is very slow, about 2 FPS. How can I optimize this model with TensorRT to make it faster on the Xavier NX? And how can I convert the .pb model to a .trt engine?