ardianumam / Tensorflow-TensorRT

This repository is for my YouTube video series about optimizing a TensorFlow deep learning model using TensorRT. We demonstrate optimizing a LeNet-like model and a YOLOv3 model, achieving 3.7x and 1.5x speedups, respectively, compared to the original models.

How to implement TensorRT on custom number of classes #19

Closed Mahaaveer closed 5 years ago

Mahaaveer commented 5 years ago

Hi, I have a weights file trained on darknet for two classes. I used your repo https://github.com/ardianumam/tensorflow-yolov3 to convert my weights file to a frozen graph. Since num_classes is read from coco.names, I changed coco.names to the classes I have. I successfully generated the frozen model and copied it over to your TensorRT repo to generate the TensorRT model, which was also created successfully. However, when I run the inference block of the code, it returns the following error:

```
2019-07-15 13:15:25.604401: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-07-15 13:15:25.719409: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-15 13:15:25.719738: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 0 with properties:
name: GeForce GTX 970 major: 5 minor: 2 memoryClockRate(GHz): 1.2405
pciBusID: 0000:02:00.0
totalMemory: 3.94GiB freeMemory: 3.70GiB
2019-07-15 13:15:25.719751: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1490] Adding visible gpu devices: 0
2019-07-15 13:15:26.679452: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-15 13:15:26.679482: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977]      0
2019-07-15 13:15:26.679488: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0:   N
2019-07-15 13:15:26.679607: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2017 MB memory) -> physical GPU (device: 0, name: GeForce GTX 970, pci bus id: 0000:02:00.0, compute capability: 5.2)
Segmentation fault (core dumped)
```

Since my number of classes differs from COCO's 80, should I be following any additional steps?
Please excuse my limited knowledge of TensorFlow, as I am not from a computer science background.
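For reference, the class count in such pipelines is typically derived from the names file itself: one class name per line, and num_classes is just the length of that list. A minimal sketch of that pattern (the function name and file path here are illustrative assumptions, not the repo's actual API):

```python
def load_class_names(path):
    """Read one class name per line; blank lines are ignored.

    The class count used by the detection head is simply len() of
    the returned list, so a two-class names file yields num_classes=2.
    """
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


if __name__ == "__main__":
    # Hypothetical two-class names file, mirroring the setup in this issue.
    with open("my_classes.names", "w") as f:
        f.write("car\nperson\n")
    names = load_class_names("my_classes.names")
    print(names, len(names))
```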

Thanks.

Mahaaveer commented 5 years ago

I am closing this. As it turns out, it was due to a cuDNN version mismatch. I solved this by upgrading cuDNN from 7.0.x to 7.5.x.
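If anyone else hits a similar segfault, it can help to confirm which cuDNN version is actually installed before rebuilding anything. The version is recorded in the cuDNN header's macros (CUDNN_MAJOR / CUDNN_MINOR / CUDNN_PATCHLEVEL). A minimal sketch that parses those macros; the search paths are assumptions for a typical Linux install and may need adjusting:

```python
import re

def parse_cudnn_version(header_text):
    """Extract (major, minor, patchlevel) from cuDNN's version macros."""
    version = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        m = re.search(r"#define\s+%s\s+(\d+)" % macro, header_text)
        version.append(int(m.group(1)) if m else None)
    return tuple(version)


if __name__ == "__main__":
    # Common header locations (assumed); newer cuDNN releases moved the
    # macros from cudnn.h into cudnn_version.h.
    candidates = (
        "/usr/include/cudnn_version.h",
        "/usr/include/cudnn.h",
        "/usr/local/cuda/include/cudnn.h",
    )
    for path in candidates:
        try:
            with open(path) as f:
                major, minor, patch = parse_cudnn_version(f.read())
            print("%s -> cuDNN %s.%s.%s" % (path, major, minor, patch))
            break
        except IOError:
            continue
    else:
        print("No cuDNN header found in the assumed locations.")
```

The reported version should match what your TensorFlow build was compiled against; a mismatch (e.g. 7.0.x installed vs. 7.5.x expected) can crash at session start exactly as in the log above.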

Thanks.