ardianumam / Tensorflow-TensorRT

This repository is for my YT video series about optimizing a TensorFlow deep learning model using TensorRT. We demonstrate optimizing a LeNet-like model and a YOLOv3 model, achieving 3.7x and 1.5x speedups, respectively, over the original models.

INT8 support #9

Closed kingardor closed 5 years ago

kingardor commented 5 years ago

So I tried using INT8 instead of FP16 for optimizing YOLOv3. Instead of getting a speedup, it was taking 1200+ ms per image.

My environment:

- Ubuntu 18.10
- Python 3.7.1
- CUDA 10.0
- cuDNN 7.5.0
- tensorflow-gpu 1.13.1
- TensorRT 5.0.2.6
- GTX 1070

ardianumam commented 5 years ago

Have you calibrated the graph? In case you haven't, see this article: https://devblogs.nvidia.com/tensorrt-integration-speeds-tensorflow-inference/ (near the end of the article).
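For reference, here is a minimal sketch of the INT8 calibration workflow with the TF 1.13 `tf.contrib.tensorrt` API that the article describes. Names such as `frozen_graph_def`, `OUTPUT_NODES`, `INPUT_TENSOR`, and `load_calibration_batches()` are placeholders for this thread's YOLOv3 setup, not code from the repository.

```python
# Sketch of TF-TRT INT8 calibration (TF 1.13, tf.contrib.tensorrt).
# frozen_graph_def, node names, and load_calibration_batches() are assumed placeholders.
import tensorflow as tf
from tensorflow.contrib import tensorrt as trt

OUTPUT_NODES = ["Pred/concat_1"]   # hypothetical YOLOv3 output node name
INPUT_TENSOR = "Placeholder:0"     # hypothetical input tensor name

# 1) Build a calibration graph. Unlike FP16/FP32, INT8 needs this extra step.
calib_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph_def,        # the frozen YOLOv3 GraphDef
    outputs=OUTPUT_NODES,
    max_batch_size=8,
    max_workspace_size_bytes=1 << 30,
    precision_mode="INT8")

# 2) Run representative images through the calibration graph so TensorRT
#    can collect activation ranges.
with tf.Graph().as_default() as g:
    out = tf.import_graph_def(
        calib_graph, return_elements=[OUTPUT_NODES[0] + ":0"], name="")
    with tf.Session(graph=g) as sess:
        for batch in load_calibration_batches():   # yields [N, 416, 416, 3] arrays
            sess.run(out, feed_dict={INPUT_TENSOR: batch})

# 3) Convert the calibrated graph into the final INT8 inference graph.
int8_graph = trt.calib_graph_to_infer_graph(calib_graph)
```

Without step 2 and 3, the "INT8" graph still contains calibration nodes, which would explain inference being much slower rather than faster.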

kingardor commented 5 years ago

Thank you so much. Will give it a shot and update here :)

kingardor commented 5 years ago

Okay, I checked out the link. I will prepare a dataset for calibration. Meanwhile, you set the max batch size in the create_inference_graph() method. How do we use this batch size during inference?

kingardor commented 5 years ago

Thanks, buddy. Checked it out. Also, I wanted help regarding batch inference. You mentioned max_batch_size in the create_inference_graph() method. How do I feed a batch of images to the model?


kingardor commented 5 years ago

Never mind, I figured it out. I froze the graph again with the input tensor shape set to [None, 416, 416, 3], which allows batch inference.
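A minimal sketch of what that batch inference can look like, assuming the graph was re-frozen with input shape [None, 416, 416, 3]; the file name, tensor names, `preprocess()`, and `images` below are illustrative placeholders, so check your own graph for the real names.

```python
# Sketch of batch inference on the re-frozen TF-TRT graph (TF 1.x).
# File and tensor names, preprocess(), and images are assumed placeholders.
import numpy as np
import tensorflow as tf

with tf.gfile.GFile("yolov3_trt_int8.pb", "rb") as f:   # hypothetical file name
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as g:
    tf.import_graph_def(graph_def, name="")
    inp = g.get_tensor_by_name("Placeholder:0")      # assumed input tensor name
    out = g.get_tensor_by_name("Pred/concat_1:0")    # assumed output tensor name

    # Stack preprocessed images into one [N, 416, 416, 3] batch; N should not
    # exceed the max_batch_size used when the TensorRT engine was built.
    batch = np.stack([preprocess(img) for img in images]).astype(np.float32)

    with tf.Session(graph=g) as sess:
        preds = sess.run(out, feed_dict={inp: batch})   # one run for the whole batch
```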