NVIDIA-AI-IOT / tf_trt_models

TensorFlow models accelerated with NVIDIA TensorRT
BSD 3-Clause "New" or "Revised" License
682 stars 245 forks

Why does the class ID change when TensorRT is used? #21

Closed Programmerwyl closed 5 years ago

Programmerwyl commented 5 years ago

For the same picture and the same model, the classification IDs output when inferring with TensorRT differ from those without TensorRT. By comparison, with TensorRT the IDs of the several highest-confidence classes are each one less. Why does this happen? Judging by the mscoco_label_map.pbtxt file, the IDs in the sample code you provided are also one less.

ghost commented 5 years ago

Thanks for raising this issue.

For the non-TensorRT graph, are you using the downloaded frozen graph directly, or the frozen graph returned by build_detection_graph?

The index offset likely has to do with how we reconstruct the graph from the checkpoint and configuration files, and is independent of TensorRT.

It is probably still worth changing this to be consistent with the TensorFlow object detection API.
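Until that change lands, one possible stopgap is to shift the IDs yourself. A minimal sketch, assuming the offset really is a uniform +1 as reported above (detection_classes stands for whatever array of class IDs your session returns):

# Hedged workaround: add a constant offset so the IDs line up with
# mscoco_label_map.pbtxt. This assumes the offset is uniformly +1,
# which matches the report above but is not guaranteed in general.
corrected_classes = [int(c) + 1 for c in detection_classes]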

John

Programmerwyl commented 5 years ago

Hi @jaybdub-nv, thank you very much for your reply. I use the frozen graph returned by build_detection_graph and did not change the configuration file. When I run the sample detection code you provided, the ID shown for the dog is wrong. I just ran the code and didn't change anything, so I don't know why the ID changes. If you have any ideas, please tell me and I will try them. Thanks!

ghost commented 5 years ago

We have a branch called 'improved_model_support' that is currently experimental and subject to change.

This branch introduces some changes, one of which is class IDs that are consistent with the TensorFlow object detection API.

You can use this branch by running the following from this project's root directory:

git fetch
git checkout improved_model_support
python setup.py install --user

Then, you can launch jupyter-notebook and run the sample. Please note one major difference in this branch: the output names are no longer 'scores', 'boxes', etc., but 'detection_scores', 'detection_boxes', etc.

This is consistent with the TensorFlow object detection API.
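For illustration, here is a hedged sketch of fetching the renamed outputs. The session tf_sess, the input name 'image_tensor:0', and the image array are assumptions based on typical object detection API usage, not code taken from this branch:

# Fetch the renamed output tensors; names follow the object detection API.
# 'image' is assumed to be an HxWx3 uint8 numpy array.
scores, boxes, classes, num = tf_sess.run(
    ['detection_scores:0', 'detection_boxes:0',
     'detection_classes:0', 'num_detections:0'],
    feed_dict={'image_tensor:0': image[None, ...]})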

If you try this please let me know if it works or doesn't work, either way.

Best, John

Programmerwyl commented 5 years ago

Hi John, I'm glad to tell you that it works. I have a few questions. What if I want to generate the graph without going through the object detection code, and instead create it like this:

# Standard TensorFlow 1.x pattern for loading a frozen graph directly
# (MODEL_NAME is defined elsewhere).
import tensorflow as tf

PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')

Would this still work?

If my project is not based on object detection but I still want to use TensorRT, what should I do?

Programmerwyl commented 5 years ago

Besides, if I use the ssd_resnet_50_fpn_coco model, TensorRT does not work. The logs are listed as follows:

--2018-10-11 06:00:35-- http://download.tensorflow.org/models/object_detection/ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03.tar.gz
Resolving download.tensorflow.org (download.tensorflow.org)... 216.58.199.16, 2404:6800:4005:806::2010
Connecting to download.tensorflow.org (download.tensorflow.org)|216.58.199.16|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 366947246 (350M) [application/x-tar]
Saving to: ‘data/ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03.tar.gz’

data/ssd_resnet50_v 100%[===================>] 349.95M 8.36MB/s in 39s

2018-10-11 06:01:15 (9.09 MB/s) - ‘data/ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03.tar.gz’ saved [366947246/366947246]

2018-10-11 06:01:20.677294: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:865] ARM64 does not support NUMA - returning NUMA node zero
2018-10-11 06:01:20.677419: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005 pciBusID: 0000:00:00.0 totalMemory: 7.67GiB freeMemory: 1.34GiB
2018-10-11 06:01:20.677465: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-10-11 06:01:22.991412: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-11 06:01:22.991537: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-10-11 06:01:22.991579: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-10-11 06:01:22.991805: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 630 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
WARNING:tensorflow:From /home/nvidia/Downloads/object_google_config/object_detection/exporter.py:356: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating: Please switch to tf.train.get_or_create_global_step
2018-10-11 06:02:32.783282: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-10-11 06:02:32.783443: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-11 06:02:32.783482: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-10-11 06:02:32.783519: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-10-11 06:02:32.783625: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 630 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2018-10-11 06:03:24.097268: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-10-11 06:03:24.097413: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-11 06:03:24.097472: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-10-11 06:03:24.097513: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-10-11 06:03:24.097617: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 630 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
Converted 463 variables to const ops.
2018-10-11 06:03:40.127623: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-10-11 06:03:40.127773: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-11 06:03:40.127805: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-10-11 06:03:40.127843: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-10-11 06:03:40.127958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 630 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
['detection_boxes', 'detection_classes', 'detection_scores', 'num_detections']
2018-10-11 06:04:25.591708: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 0
2018-10-11 06:04:34.058507: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:383] MULTIPLE tensorrt candidate conversion: 4
2018-10-11 06:04:34.073274: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] subgraph conversion error for subgraph_index:0 due to: "Unimplemented: Require 4 dimensional input. Got 1 WeightSharedConvolutionalBoxPredictor/BoxPredictor/biases/read" SKIPPING......( 108 nodes)
2018-10-11 06:04:34.077217: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] subgraph conversion error for subgraph_index:1 due to: "Unimplemented: Require 4 dimensional input. Got 0 const6" SKIPPING......( 108 nodes)
2018-10-11 06:04:35.726973: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] subgraph conversion error for subgraph_index:2 due to: "Invalid argument: Output node 'const6' is weights not tensor" SKIPPING......( 755 nodes)
2018-10-11 06:04:35.732197: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] subgraph conversion error for subgraph_index:3 due to: "Unimplemented: Require 4 dimensional input. Got 1 Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/zeros_like_86" SKIPPING......( 181 nodes)
2018-10-11 06:07:11.388434: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-10-11 06:07:11.388658: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-11 06:07:11.388698: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-10-11 06:07:11.388731: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-10-11 06:07:11.388956: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 630 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)

Available Sensor modes :
3840 x 2160 FR=60.000000 CF=0x1009208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
1920 x 1080 FR=60.000000 CF=0x1009208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
1280 x 720 FR=60.000000 CF=0x1009208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
1280 x 540 FR=240.000000 CF=0x1009208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10

NvCameraSrc: Trying To Set Default Camera Resolution. Selected sensorModeIndex = 0 WxH = 3840x2160 FrameRate = 60.000000 ...

2018-10-11 06:08:05.001218: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.05GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-10-11 06:08:05.072066: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.05GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-10-11 06:08:05.076990: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 450.56MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-10-11 06:08:05.198384: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.05GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-10-11 06:08:09.185485: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.07GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-10-11 06:08:09.422405: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.14GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-10-11 06:08:09.528466: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 548.38MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-10-11 06:08:09.780213: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 459.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-10-11 06:08:09.882930: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.14GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-10-11 06:08:09.965715: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 312.49MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.

FPS: 0.0 FPS: 1.4 FPS: 1.6 FPS: 1.6 FPS: 1.6 FPS: 1.6 FPS: 1.5 FPS: 1.6 FPS: 1.6 FPS: 1.6 FPS: 1.6 FPS: 1.6 FPS: 1.6
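The bfc_allocator warnings above usually mean the session tried to pre-allocate more GPU memory than the Tegra (which shares memory between CPU and GPU) had free. A minimal sketch of one common TF 1.x mitigation; the session construction here is an assumption, not this repo's code:

import tensorflow as tf

# Let TensorFlow grow its GPU allocation on demand instead of reserving
# a large block up front; this often quiets the allocator warnings above,
# though a model this large may still run slowly on a TX2.
tf_config = tf.ConfigProto()
tf_config.gpu_options.allow_growth = True
tf_sess = tf.Session(config=tf_config)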

Programmerwyl commented 5 years ago

If I use the ssd_mobilenet_v1_coco model, the logs are as follows:

2018-10-11 06:33:59.885830: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:865] ARM64 does not support NUMA - returning NUMA node zero
2018-10-11 06:33:59.886006: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005 pciBusID: 0000:00:00.0 totalMemory: 7.67GiB freeMemory: 5.25GiB
2018-10-11 06:33:59.886064: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-10-11 06:34:02.914677: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-11 06:34:02.914788: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-10-11 06:34:02.914818: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-10-11 06:34:02.915365: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4241 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
WARNING:tensorflow:From /home/nvidia/Downloads/object_google_config/object_detection/exporter.py:356: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating: Please switch to tf.train.get_or_create_global_step
2018-10-11 06:34:53.343754: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-10-11 06:34:53.343909: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-11 06:34:53.343944: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-10-11 06:34:53.343971: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-10-11 06:34:53.344071: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4241 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2018-10-11 06:35:28.654087: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-10-11 06:35:28.654243: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-11 06:35:28.654276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-10-11 06:35:28.654305: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-10-11 06:35:28.654400: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4241 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
Converted 199 variables to const ops.
2018-10-11 06:35:39.512329: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-10-11 06:35:39.512474: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-11 06:35:39.512507: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-10-11 06:35:39.512535: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-10-11 06:35:39.512635: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4241 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
['detection_boxes', 'detection_classes', 'detection_scores', 'num_detections']
2018-10-11 06:36:10.710508: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 0
2018-10-11 06:36:16.301978: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:383] MULTIPLE tensorrt candidate conversion: 2
2018-10-11 06:36:16.565157: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2660] Max batch size= 1 max workspace size= 23679062
2018-10-11 06:36:16.565250: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2664] Using FP16 precision mode
2018-10-11 06:36:16.565278: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2666] starting build engine
2018-10-11 06:37:01.023211: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2671] Built network
2018-10-11 06:37:01.201921: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2676] Serialized engine
2018-10-11 06:37:01.211610: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2684] finished engine my_trt_op0 containing 434 nodes
2018-10-11 06:37:01.211758: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2704] Finished op preparation
2018-10-11 06:37:01.241846: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2712] OK finished op building
2018-10-11 06:37:01.254009: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] subgraph conversion error for subgraph_index:1 due to: "Unimplemented: Require 4 dimensional input. Got 1 Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/zeros_like_85" SKIPPING......( 181 nodes)
2018-10-11 06:37:04.845186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-10-11 06:37:04.845314: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-11 06:37:04.845344: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-10-11 06:37:04.845369: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-10-11 06:37:04.845474: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4241 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)

Available Sensor modes :
3840 x 2160 FR=60.000000 CF=0x1009208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
1920 x 1080 FR=60.000000 CF=0x1009208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
1280 x 720 FR=60.000000 CF=0x1009208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
1280 x 540 FR=240.000000 CF=0x1009208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10

NvCameraSrc: Trying To Set Default Camera Resolution. Selected sensorModeIndex = 0 WxH = 3840x2160 FrameRate = 60.000000 ...

FPS: 0.2 FPS: 21.6 FPS: 21.4 FPS: 21.3 FPS: 21.4 FPS: 21.6 FPS: 21.6 FPS: 21.7 FPS: 21.6

By comparison, the ssd_mobilenet_v1_coco model goes through the full engine-build sequence: starting build engine, Built network, Serialized engine, finished engine my_trt_op0 containing 434 nodes, Finished op preparation, OK finished op building.
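A quick, hedged way to check how much of a graph actually converted, assuming trt_graph is the GraphDef returned by trt.create_inference_graph (the engines show up as TRTEngineOp nodes, e.g. my_trt_op0 above):

# Count the TensorRT engine nodes embedded in the converted graph; an
# empty list means every candidate subgraph was skipped, as in the
# ssd_resnet_50_fpn_coco log above.
trt_engines = [n.name for n in trt_graph.node if n.op == 'TRTEngineOp']
print('TensorRT engines:', trt_engines)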

If you're refactoring the code, please consider this part as well. I look forward to your update. Thanks!

ghost commented 5 years ago

Thanks for sharing.

Currently, in the experimental branch we have verified that the following models work:

ssd_mobilenet_v1_coco
ssd_mobilenet_v2_coco
ssd_inception_v2_coco

We are tracking issues with other models and TensorRT integration.

To answer your first question, TensorRT integration is intended to work with any TensorFlow model (it is not specific to a task like object detection), though there may be some issues in practice. To use TensorRT integration for a different task you need to do the following (see the sketch after the list).

  1. Obtain a frozen graph for the model. There is plenty of external documentation on how to do this.
  2. Determine the names of output nodes in your model. If you define your model, you can set these during model construction. If you're using an existing graph, you can determine this by visualization with TensorBoard.
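Putting the two steps together, a minimal sketch using the TF 1.x contrib API this project builds on. The file name frozen_graph.pb and the output name 'logits' are placeholders for your own model, not values from this repo:

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

# Step 1: load a frozen graph (a GraphDef with variables folded in).
graph_def = tf.GraphDef()
with tf.gfile.GFile('frozen_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

# Step 2: if the output names are unknown, inspecting the node list (or
# TensorBoard) helps; the last nodes are often, but not always, outputs.
for node in graph_def.node[-5:]:
    print(node.name, node.op)

# Convert. Unsupported subgraphs fall back to TensorFlow, as in the logs
# above, so the result still runs even if only part of the graph converts.
trt_graph = trt.create_inference_graph(
    input_graph_def=graph_def,
    outputs=['logits'],               # replace with your real output names
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP16')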

Hope this helps, John

Programmerwyl commented 5 years ago

Hi John, thank you very much for your reply.