ardianumam / Tensorflow-TensorRT

This repository is for my YT video series about optimizing a TensorFlow deep learning model using TensorRT. We demonstrate optimizing a LeNet-like model and a YOLOv3 model, achieving 3.7x and 1.5x speedups, respectively, over the original models.

INT8 support #9

Closed kingardor closed 5 years ago

kingardor commented 5 years ago

So I tried using INT8 instead of FP16 for optimizing YOLOv3. Instead of getting a speedup, it was taking 1200+ ms per image.

My environment:

- Ubuntu 18.10
- Python 3.7.1
- CUDA 10.0
- cuDNN 7.5.0
- tensorflow-gpu 1.13.1
- TensorRT 5.0.2.6
- GTX 1070

ardianumam commented 5 years ago

Have you calibrated the graph? In case you haven't, see this article: https://devblogs.nvidia.com/tensorrt-integration-speeds-tensorflow-inference/ (near the end of the article).
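For reference, here is a minimal sketch of the INT8 calibration workflow with the TF 1.13 `tf.contrib.tensorrt` API that the article describes. Names such as `frozen_graph_def`, `OUTPUT_NODES`, `INPUT_TENSOR`, and `load_calibration_batches()` are placeholders for this thread's YOLOv3 setup, not code from the repository.

```python
# Sketch of TF-TRT INT8 calibration (TF 1.13, tf.contrib.tensorrt).
# frozen_graph_def, node names, and load_calibration_batches() are assumed placeholders.
import tensorflow as tf
from tensorflow.contrib import tensorrt as trt

OUTPUT_NODES = ["Pred/concat_1"]   # hypothetical YOLOv3 output node name
INPUT_TENSOR = "Placeholder:0"     # hypothetical input tensor name

# 1) Build a calibration graph. Unlike FP16/FP32, INT8 needs this extra step.
calib_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph_def,        # the frozen YOLOv3 GraphDef
    outputs=OUTPUT_NODES,
    max_batch_size=8,
    max_workspace_size_bytes=1 << 30,
    precision_mode="INT8")

# 2) Run representative images through the calibration graph so TensorRT
#    can collect activation ranges.
with tf.Graph().as_default() as g:
    out = tf.import_graph_def(
        calib_graph, return_elements=[OUTPUT_NODES[0] + ":0"], name="")
    with tf.Session(graph=g) as sess:
        for batch in load_calibration_batches():   # yields [N, 416, 416, 3] arrays
            sess.run(out, feed_dict={INPUT_TENSOR: batch})

# 3) Convert the calibrated graph into the final INT8 inference graph.
int8_graph = trt.calib_graph_to_infer_graph(calib_graph)
```

Without step 2 and 3, the "INT8" graph still contains calibration nodes, which would explain inference being much slower rather than faster.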

kingardor commented 5 years ago

Thank you so much. Will give it a shot and update here :)

kingardor commented 5 years ago

Okay, I checked out the link. I will prepare a dataset for calibration. Meanwhile, you set the max batch size in the create_inference_graph() method. How do we use this batch size during inference?

kingardor commented 5 years ago

Thanks, buddy. Checked it out. Also, I wanted help regarding batch inference. You mentioned max_batch_size in the create_inference_graph() method. How do I feed a batch of images to the model?


kingardor commented 5 years ago

Never mind, I figured it out. I froze the graph again with the input tensor shape set to [None, 416, 416, 3], which allows batch inference.
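A minimal sketch of what that batch inference can look like, assuming the graph was re-frozen with input shape [None, 416, 416, 3]; the file name, tensor names, `preprocess()`, and `images` below are illustrative placeholders, so check your own graph for the real names.

```python
# Sketch of batch inference on the re-frozen TF-TRT graph (TF 1.x).
# File and tensor names, preprocess(), and images are assumed placeholders.
import numpy as np
import tensorflow as tf

with tf.gfile.GFile("yolov3_trt_int8.pb", "rb") as f:   # hypothetical file name
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as g:
    tf.import_graph_def(graph_def, name="")
    inp = g.get_tensor_by_name("Placeholder:0")      # assumed input tensor name
    out = g.get_tensor_by_name("Pred/concat_1:0")    # assumed output tensor name

    # Stack preprocessed images into one [N, 416, 416, 3] batch; N should not
    # exceed the max_batch_size used when the TensorRT engine was built.
    batch = np.stack([preprocess(img) for img in images]).astype(np.float32)

    with tf.Session(graph=g) as sess:
        preds = sess.run(out, feed_dict={inp: batch})   # one run for the whole batch
```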