ardianumam / Tensorflow-TensorRT

This repository is for my YT video series about optimizing a Tensorflow deep learning model using TensorRT. We demonstrate optimizing a LeNet-like model and a YOLOv3 model, achieving 3.7x and 1.5x speed-ups for the former and the latter, respectively, compared to the original models.

TensorRT doesn't improve the performance of models on a GTX 1080 Ti #14

Open · PythonImageDeveloper opened this issue 5 years ago

PythonImageDeveloper commented 5 years ago

Hi, I optimized my trained models (1 class), ssdlite_mobilenetv2 and ssd_resnet50, with TensorRT, but the performance didn't improve significantly: inference only went from 0.12 s to 0.11 s on a GTX 1080 Ti. Why? I installed TensorFlow 1.12.0, CUDA 9, and TensorRT 4.0.1.6 on Ubuntu 16.04.

ardianumam commented 5 years ago

> Hi, I optimized my trained models (1 class), ssdlite_mobilenetv2 and ssd_resnet50, with TensorRT, but the performance didn't improve significantly: inference only went from 0.12 s to 0.11 s on a GTX 1080 Ti. Why? I installed TensorFlow 1.12.0, CUDA 9, and TensorRT 4.0.1.6 on Ubuntu 16.04.

I also tried TRT optimization a few days ago on SSD MobileNetV1 with 1 class. I got 45 FPS on a Jetson TX2 both before and after TRT optimization. My tentative conclusion is: (i) TRT might be less effective for a network like MobileNet, maybe because its separable convolutions already perform very little computation, so there is less room for optimization. (ii) When I use more classes (e.g., 80 classes in COCO), there is a bigger difference after TRT optimization (TRT seems to optimize the conv. operations for the output prediction, which scale with the number of classes).
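For reference, here is roughly how the conversion looks with the TF 1.x contrib TF-TRT API (a minimal sketch; the file paths and the SSD output node names follow the TF Object Detection API convention and may differ for your model). Counting the `TRTEngineOp` nodes at the end is a quick way to see how much of the graph TensorRT actually rewrote; if that count is near zero, little speed-up should be expected:

```python
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt  # TF 1.12/1.13 contrib API

# Load the frozen (non-optimized) graph
with tf.gfile.GFile("frozen_inference_graph.pb", "rb") as f:
    frozen_graph = tf.GraphDef()
    frozen_graph.ParseFromString(f.read())

# Rewrite TensorRT-compatible subgraphs into TRTEngineOp nodes
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=["detection_boxes", "detection_scores",
             "detection_classes", "num_detections"],
    max_batch_size=1,
    max_workspace_size_bytes=1 << 30,   # ~1 GB workspace for engine building
    precision_mode="FP16")              # FP16 is supported on TX2 / 1080 Ti

# How much of the graph did TRT actually take over?
n_trt = len([n for n in trt_graph.node if n.op == "TRTEngineOp"])
print("TRTEngineOp nodes: %d (out of %d nodes)" % (n_trt, len(trt_graph.node)))

# Save the optimized graph so it can be loaded directly later
with tf.gfile.GFile("trt_frozen_graph.pb", "wb") as f:
    f.write(trt_graph.SerializeToString())
```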

PythonImageDeveloper commented 5 years ago

Hi @ardianumam, thanks for your reply. I have some questions; please answer them if possible. Thanks. I installed these package versions on the Jetson TX2: TensorRT 5.0.2, TensorFlow 1.13.1, CUDA 10.
Q1: I tested ssdlite_mobilenetv2 with 1 class and it achieves 22 FPS on the Jetson TX2. Does this result make sense, or is there room for improvement?
Q2: Loading the frozen graph of this model takes about 10 minutes. Why?
Q3: The maximum free GPU memory is 5 GB out of 8 GB. Why?

Please also note which package versions you have installed.

Q4: How did you reach 45 FPS? Did you run the same code from your GitHub repo to create the TRT model and run inference with it?

ardianumam commented 5 years ago

Hi,

Q1: What is your input dimension? And have you already set the TX2 to max performance mode? For 1 class with a 300x300 input dim (the dim I use), I think 22 FPS is too slow on the TX2. Where did you get the ssdlite_mobilenetv2 model?

Q2: Only loading a pre-stored TRT .pb model is fast. Make sure you are only loading the model, not building/optimizing it again.
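A minimal loading sketch (assuming the optimized graph was already saved as `trt_frozen_graph.pb`, as in the conversion snippet above):

```python
import tensorflow as tf

# Load the already-optimized TRT graph; no create_inference_graph call here
with tf.gfile.GFile("trt_frozen_graph.pb", "rb") as f:
    trt_graph = tf.GraphDef()
    trt_graph.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(trt_graph, name="")

# Reuse one session for all frames instead of re-creating it per inference
sess = tf.Session(graph=graph)
```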

Q3: The TX2 has 8 GB of shared memory, used not only by the GPU but also as system memory, i.e., RAM.
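Because of that shared memory, it can help to stop TensorFlow from pre-allocating nearly all of it. A minimal sketch (assumed settings, not something this repo prescribes):

```python
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True                       # allocate on demand
# config.gpu_options.per_process_gpu_memory_fraction = 0.5   # or set a hard cap
sess = tf.Session(config=config)
```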

Q4: I use this code and this model.

Btw, where did you get TensorFlow 1.13.1 for the TX2? I tried to Google it but can't find it yet.

PythonImageDeveloper commented 5 years ago

Hi, thank you for the good answers.
Q1: My input dims are 300 and 600, but I get 22 FPS with the 300 input size. You can get the ssdlite_mobilenetv2 model from the TensorFlow model zoo.
Q2: I am only loading the TRT .pb model. Please also note which package versions you have installed. Thanks.

Are you sure you get 45 FPS with ssd_mobilenet_v1? On this NVIDIA page, they achieve about 20 FPS (50 ms) with the ssd_mobilenet_v1 model. What is different in your code or package versions compared to that page?
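One way to measure FPS consistently when comparing numbers like these (a sketch, assuming the `graph`/`sess` from the loading snippet above and the standard TF Object Detection API tensor names; warm-up runs are excluded from the average):

```python
import time
import numpy as np

# Standard TF Object Detection API tensor names (adjust if your graph differs)
input_tensor = graph.get_tensor_by_name("image_tensor:0")
output_tensors = [graph.get_tensor_by_name(name + ":0") for name in
                  ("detection_boxes", "detection_scores",
                   "detection_classes", "num_detections")]

# Dummy 300x300 frame, batch size 1
image = np.random.randint(0, 255, size=(1, 300, 300, 3), dtype=np.uint8)

# Warm-up: the first runs build TRT engines / fill caches and are much slower
for _ in range(10):
    sess.run(output_tensors, feed_dict={input_tensor: image})

n_runs = 100
start = time.time()
for _ in range(n_runs):
    sess.run(output_tensors, feed_dict={input_tensor: image})
print("average FPS: %.1f" % (n_runs / (time.time() - start)))
```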

ardianumam commented 5 years ago

Q2: All the package and library versions I use are already provided in the README.md. Please check.

That NVIDIA GitHub model is for COCO (80 classes). I also got 20 FPS for that model. My 45 FPS model is for 1 class. This repo also reports up to 50 FPS for 5 classes using SSD MobileNet, and this article reports > 30 FPS for SSD-GoogleNet on 20 classes (the VOC dataset).

PythonImageDeveloper commented 5 years ago

Hi, thanks. I installed JetPack 4.2. Could the problem be this version, 4.2? It may not be compatible with the other packages. I think my performance is very slow because of this version.