NVIDIA-AI-IOT / tf_trt_models

TensorFlow models accelerated with NVIDIA TensorRT
BSD 3-Clause "New" or "Revised" License

TensorRT has no effect on ssd_mobilenet_v1_fpn_coco model #14

Closed Programmerwyl closed 6 years ago

Programmerwyl commented 6 years ago

When I try to accelerate the ssd_mobilenet_v1_fpn_coco model with TensorRT, it has no effect.

retinanet mobile

No TensorRT:
Iteration: 0.430 sec
Iteration: 0.421 sec
Iteration: 0.420 sec
Iteration: 0.427 sec
Iteration: 0.439 sec
Iteration: 0.427 sec
Iteration: 0.411 sec
Iteration: 0.424 sec
Iteration: 0.432 sec
Iteration: 0.429 sec
Iteration: 0.413 sec
Iteration: 0.424 sec
Iteration: 0.424 sec
Iteration: 0.428 sec
Iteration: 0.427 sec
Iteration: 0.431 sec
Iteration: 0.417 sec
Iteration: 0.418 sec

TensorRT:
0.505087852478
0.504916906357
0.501970052719
0.505352973938
0.494786024094
0.498456954956
0.504287004471
0.50328707695
0.507141113281
0.499255895615
0.487679004669
0.489063978195
0.492527008057
0.503779172897
0.514405965805
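To put a number on the gap, the two timing lists above can be averaged. This is a minimal sketch that assumes both lists are per-iteration latencies in seconds measured the same way, which the report does not state; the variable names are illustrative only.

```python
from statistics import mean

# Per-iteration latencies in seconds, copied from the report above.
baseline = [
    0.430, 0.421, 0.420, 0.427, 0.439, 0.427, 0.411, 0.424, 0.432,
    0.429, 0.413, 0.424, 0.424, 0.428, 0.427, 0.431, 0.417, 0.418,
]
with_trt = [
    0.505087852478, 0.504916906357, 0.501970052719, 0.505352973938,
    0.494786024094, 0.498456954956, 0.504287004471, 0.50328707695,
    0.507141113281, 0.499255895615, 0.487679004669, 0.489063978195,
    0.492527008057, 0.503779172897, 0.514405965805,
]

baseline_mean = mean(baseline)
trt_mean = mean(with_trt)
# A ratio above 1.0 means the TensorRT run was slower than the baseline.
slowdown = trt_mean / baseline_mean

print(f"baseline: {baseline_mean:.3f} s over {len(baseline)} iterations")
print(f"TensorRT: {trt_mean:.3f} s over {len(with_trt)} iterations")
print(f"ratio:    {slowdown:.2f}x")
```

Under that assumption the TensorRT run is not merely unchanged but roughly 18% slower on average.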

log: retinanet v1
('config_path', './data/ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03/pipeline.config')
('checkpoint_path', './data/ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03/model.ckpt')
2018-09-03 09:24:44.137510: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:865] ARM64 does not support NUMA - returning NUMA node zero
2018-09-03 09:24:44.137784: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 4.45GiB
2018-09-03 09:24:44.137850: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-09-03 09:24:47.792908: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-03 09:24:47.793169: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-09-03 09:24:47.793277: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-09-03 09:24:47.793573: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2913 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
Converted 333 variables to const ops.
2018-09-03 09:26:07.919932: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 0
2018-09-03 09:26:16.036617: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:383] MULTIPLE tensorrt candidate conversion: 4
2018-09-03 09:26:16.057791: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] subgraph conversion error for subgraph_index:0 due to: "Unimplemented: Require 4 dimensional input. Got 0 const6" SKIPPING......( 108 nodes)
2018-09-03 09:26:16.064689: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] subgraph conversion error for subgraph_index:1 due to: "Unimplemented: Require 4 dimensional input. Got 0 const6" SKIPPING......( 108 nodes)
2018-09-03 09:26:16.837392: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] subgraph conversion error for subgraph_index:2 due to: "Invalid argument: Output node 'const6' is weights not tensor" SKIPPING......( 612 nodes)
2018-09-03 09:26:16.842941: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] subgraph conversion error for subgraph_index:3 due to: "Unimplemented: Require 4 dimensional input. Got 1 Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/zeros_like_47" SKIPPING......( 181 nodes)
['boxes', 'classes', 'scores']
2018-09-03 09:27:47.320719: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-09-03 09:27:47.320894: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-03 09:27:47.320933: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-09-03 09:27:47.320967: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-09-03 09:27:47.321106: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2913
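The warnings in this log may already explain the result: the converter reports 4 candidate subgraphs and then a conversion error for each of them. The sketch below pulls the skipped subgraphs out of the warning lines with a regular expression. The pattern is written against the message format seen in this log only, and the `LOG`, `PATTERN` and `skipped_subgraphs` names are hypothetical helpers, not part of TensorFlow.

```python
import re

# Warning messages abridged from the log above (timestamps and source file names dropped).
LOG = '''
subgraph conversion error for subgraph_index:0 due to: "Unimplemented: Require 4 dimensional input. Got 0 const6" SKIPPING......( 108 nodes)
subgraph conversion error for subgraph_index:1 due to: "Unimplemented: Require 4 dimensional input. Got 0 const6" SKIPPING......( 108 nodes)
subgraph conversion error for subgraph_index:2 due to: "Invalid argument: Output node 'const6' is weights not tensor" SKIPPING......( 612 nodes)
subgraph conversion error for subgraph_index:3 due to: "Unimplemented: Require 4 dimensional input. Got 1 Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/zeros_like_47" SKIPPING......( 181 nodes)
'''

# Captures the subgraph index, the quoted error reason, and the node count.
PATTERN = re.compile(
    r'subgraph_index:(\d+) due to: "(.*?)" SKIPPING\.+\(\s*(\d+) nodes\)'
)


def skipped_subgraphs(log_text):
    """Return (index, reason, node_count) for every skipped TensorRT candidate."""
    return [(int(i), reason, int(n)) for i, reason, n in PATTERN.findall(log_text)]


skipped = skipped_subgraphs(LOG)
total_nodes = sum(n for _, _, n in skipped)

for index, reason, nodes in skipped:
    print(f"subgraph {index}: {nodes} nodes skipped ({reason})")
print(f"{len(skipped)} subgraphs skipped, {total_nodes} nodes left to TensorFlow")
```

If every candidate is skipped, as it appears here, no TensorRT engine ends up in the graph and the model still runs entirely in TensorFlow, so no speedup would be expected.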

Thanks!

ghost commented 6 years ago

Thanks for sharing this issue.

Which versions of TensorFlow and JetPack are you using, and what parameters do you pass to trt.create_inference_graph?

Programmerwyl commented 6 years ago

TensorFlow version: 1.8
JetPack version: 28
The parameters passed to trt.create_inference_graph are as follows:

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP16',
    minimum_segment_size=50
)
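One way to check whether a call like this produced any TensorRT engines is to count the op types in the returned graph. The sketch below assumes the result is a GraphDef-like object whose `node` entries expose an `op` field, and that converted segments show up under the op name `TRTEngineOp`; that op name is an assumption about this TensorFlow version, not something confirmed in this thread. The `count_ops` helper and the stand-in graph are hypothetical.

```python
from collections import Counter
from types import SimpleNamespace


def count_ops(graph_def):
    """Count nodes per op type in a GraphDef-like object exposing .node[i].op."""
    return Counter(node.op for node in graph_def.node)


# Stand-in for a converted graph in which no engine was created (made-up data,
# used only so the example runs without TensorFlow installed).
fake_graph = SimpleNamespace(
    node=[
        SimpleNamespace(op="Conv2D"),
        SimpleNamespace(op="Relu6"),
        SimpleNamespace(op="Conv2D"),
    ]
)

ops = count_ops(fake_graph)
# Assumed op name for TensorRT engine nodes; zero means nothing was converted.
print(ops.get("TRTEngineOp", 0), "TensorRT engine nodes")
```

With a real conversion result, `count_ops(trt_graph)` would take the place of the stand-in. A count of zero would be consistent with a run that is no faster than plain TensorFlow.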

ganliqiang commented 4 years ago

Does this work for you? I used this method to speed up a U-Net model, but it did not work: inference was no faster than with plain TensorFlow. So I would like to ask what final speed you got with this method.

captainst commented 4 years ago

@Programmerwyl I got an almost identical result on a Jetson Nano. Did you find out anything more about this issue later?