TensorRT supported detection networks

HilmiK commented 6 years ago

Hi,

I see that TensorRT supports ResNet for the classification task. Does it also support detection networks with a ResNet backbone?

Which modules (operations) does TensorRT not support?

Thanks in advance

ghost commented 6 years ago

It's possible that it would work, but we haven't tested it. Currently, the build_detection_graph method that we provide in this repository has been tested only against the listed models.

That said, for similar meta-architectures (SSD), configurations with different feature extractors may work. The feature extractors registered in the tensorflow/models repository are listed here:

https://github.com/tensorflow/models/blob/master/research/object_detection/builders/model_builder.py#L47

You would need to update the object detection configuration proto to select the desired feature extractor.
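A minimal sketch of that config edit, assuming the pipeline config is in protobuf text format and that `type` is the first field inside the `feature_extractor { ... }` block (as in the sample configs, but check yours). The helper name `set_feature_extractor` is hypothetical, and the extractor names in the example are illustrations only; the valid keys are the ones registered in the model_builder.py file linked above.

```python
import re

# Matches: feature_extractor { type: '<name>'  (single or double quotes)
_EXTRACTOR_TYPE = re.compile(r"(feature_extractor\s*\{\s*type:\s*)(['\"])[^'\"]*\2")


def set_feature_extractor(config_text: str, extractor_type: str) -> str:
    """Replace the extractor name in the first feature_extractor block (hypothetical helper)."""
    new_text, count = _EXTRACTOR_TYPE.subn(
        # Keep the prefix and the original quote character, swap only the name.
        lambda m: f"{m.group(1)}{m.group(2)}{extractor_type}{m.group(2)}",
        config_text,
        count=1,
    )
    if count == 0:
        raise ValueError("no feature_extractor { type: ... } entry found")
    return new_text
```

Editing the text directly avoids a dependency on the object_detection package; for complex configs, parsing with protobuf's `text_format` module would be more robust.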

Theoretically, the TensorRT integration in TensorFlow should support any model, as the operations that are not supported by TensorRT are run in native TensorFlow. That said, there may be caveats.
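To illustrate the idea with a toy model (not TF-TRT's actual segmentation algorithm, which works on the whole graph rather than a linear list of ops): runs of TensorRT-supported ops become candidate engines, unsupported ops stay in native TensorFlow, and runs shorter than minimum_segment_size are not converted. The `supported` set in the example is hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def partition_ops(ops: Sequence[str], supported: Set[str],
                  minimum_segment_size: int = 3) -> List[Tuple[str, List[str]]]:
    """Toy model: split a linear chain of ops into 'TRT' and 'TF' runs."""
    # 1) Group consecutive ops by whether TensorRT supports them.
    runs: List[Tuple[bool, List[str]]] = []
    for op in ops:
        is_supported = op in supported
        if runs and runs[-1][0] == is_supported:
            runs[-1][1].append(op)
        else:
            runs.append((is_supported, [op]))

    # 2) Only supported runs that are long enough become TRT segments;
    #    everything else runs in native TensorFlow.
    result: List[Tuple[str, List[str]]] = []
    for is_supported, run in runs:
        engine = "TRT" if is_supported and len(run) >= minimum_segment_size else "TF"
        if result and result[-1][0] == engine:
            result[-1][1].extend(run)  # merge adjacent runs on the same engine
        else:
            result.append((engine, list(run)))
    return result
```

With a chain like Conv2D, BiasAdd, Relu, NonMaxSuppressionV3, Conv2D, Relu and minimum_segment_size=3, the first three ops form one TRT segment and the rest stay in TensorFlow, because the trailing supported run has only two ops.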

Please let me know if you run into issues.

HilmiK commented 6 years ago

Thank you for the answer. I will report back here after I try Faster R-CNN with different backbones.

bezero commented 6 years ago

I was able to convert "faster_rcnn_resnet101_coco"; however, to use it you need to modify the config file to use fixed-size input images. Change lines 4-8 from

    keep_aspect_ratio_resizer { min_dimension: 600 max_dimension: 1024 }

to

    fixed_shape_resizer { height: 600 width: 1024 }

You can use any dimensions you like.
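The same edit can be scripted. This is a sketch assuming the resizer block contains only scalar fields (no nested braces); `use_fixed_shape_resizer` is a hypothetical helper, not part of this repository.

```python
import re

# A keep_aspect_ratio_resizer block without nested braces.
_KEEP_ASPECT = re.compile(r"keep_aspect_ratio_resizer\s*\{[^{}]*\}")


def use_fixed_shape_resizer(config_text: str, height: int, width: int) -> str:
    """Swap keep_aspect_ratio_resizer for fixed_shape_resizer (hypothetical helper)."""
    replacement = f"fixed_shape_resizer {{ height: {height} width: {width} }}"
    # A function replacement avoids any escaping issues in the replacement text.
    new_text, count = _KEEP_ASPECT.subn(lambda _m: replacement, config_text)
    if count == 0:
        raise ValueError("no keep_aspect_ratio_resizer block found")
    return new_text
```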

jkjung-avt commented 6 years ago

@jaybdub-nv, @bezero, when I tried to convert "faster_rcnn_resnet50_coco" with TF-TRT on TX2, I ran into a few other issues. I wonder how you got around them. Any help or suggestions would be highly appreciated.

TX2 ran out of memory, especially when I tried to load an image and run tf_sess.run(...), and the program just got killed. Same issue as #11.

File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tensorrt/python/trt_convert.py", line 115, in create_inference_graph
int(msg[0]))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Invalid graph: Frame ids for node BatchMultiClassNonMaxSuppression/map/while/Reshape_1 does not match frame ids for it's fanout.

The following error seems to be solved by bezero's fix above:

<log time omitted>: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: ../builder/Network.cpp::addInput::377, condition: isValidDims(dims)

The following error, which I think occurs because the second-stage classifier needs to handle an input tensor with a larger batch size (300):

<log time omitted>: F tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:82] input tensor batch larger than max_batch_size: 1
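That guess is consistent with how Faster R-CNN's second stage is usually batched: the box classifier sees one row per proposal, so an engine built with max_batch_size=1 would receive hundreds of rows. A rough sketch for estimating that number from the config, assuming the second stage receives images_per_batch * first_stage_max_proposals crops (my reading of the TF Object Detection API, not verified in this thread); the helper name is hypothetical.

```python
import re


def second_stage_batch_size(config_text: str, images_per_batch: int = 1) -> int:
    """Estimate the batch seen by the second-stage classifier (hypothetical helper).

    Assumption: proposals are flattened, so the second stage receives
    images_per_batch * first_stage_max_proposals crops.
    """
    match = re.search(r"first_stage_max_proposals\s*:\s*(\d+)", config_text)
    if match is None:
        raise ValueError("first_stage_max_proposals not found in config")
    return images_per_batch * int(match.group(1))
```

With a single image and first_stage_max_proposals: 300 this gives 300, matching the batch size mentioned above.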

bezero commented 6 years ago

@jkjung-avt I also had memory issues. I solved them by closing my browser, since it uses a lot of memory (if possible, close all idle applications that are using memory). If I am not wrong, the scripts in this repo work with max_batch_size=1, so try to work with single images. For batch size > 1, TX2 memory might not be sufficient.

jkjung-avt commented 6 years ago

@bezero Thanks for the reply. But closing the web browser and all other applications on TX2 did not solve the OOM issue for me. I also used single-image input for faster_rcnn_resnet50. I had to reduce the number of proposals/detections in the model config to very small values to get around that.
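Reducing proposals/detections amounts to editing a few integer fields in the pipeline config. A sketch assuming the protobuf text format; the field names are the ones used in the TF Object Detection API Faster R-CNN sample configs, but the new values in the example are placeholders, since the exact values used here are not given. The helper name is hypothetical.

```python
import re


def set_config_value(config_text: str, field: str, value: int) -> str:
    """Set every `field: <integer>` occurrence in a text-format config (hypothetical helper)."""
    # \b keeps e.g. "max_total_detections" from matching inside a longer name.
    pattern = re.compile(rf"(\b{re.escape(field)}\s*:\s*)\d+")
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}{value}", config_text)
    if count == 0:
        raise ValueError(f"{field} not found in config")
    return new_text
```

For example, chaining `set_config_value(cfg, "first_stage_max_proposals", 50)` and `set_config_value(..., "max_total_detections", 20)` shrinks both the second-stage batch and the post-processing work.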

tevisgehr commented 5 years ago

I am able to run faster_rcnn_resnet50_coco, which is included in the list of supported models, but I don't seem to be getting any speedup, which makes me skeptical that any subgraphs are being optimized at all.

In order to get it to run, I used the following command to build the graph (along with the other code included in the Jupyter example):

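    import tensorflow.contrib.tensorrt as trt  # TF 1.x contrib module, as in the Jupyter example

    # frozen_graph and output_names are produced earlier in the Jupyter example
    trt_graph = trt.create_inference_graph(
        input_graph_def=frozen_graph,
        outputs=output_names,
        max_batch_size=1,
        max_workspace_size_bytes=1 << 25,
        precision_mode='FP16',
        minimum_segment_size=3,
        maximum_cached_engines=3
    )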

I am wondering if anyone has had success in speeding up any form of Faster R-CNN, and if so, could you share some insight into what settings need to be adjusted or how to go about getting the graph conversions to work correctly?
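One quick check, before timing anything, is whether the conversion produced any TRTEngineOp nodes at all; if the count is zero, nothing was handed to TensorRT. A sketch (the helper name is hypothetical) that works on any GraphDef-like object exposing `.node`, each with an `.op` string, such as frozen_graph and trt_graph above:

```python
from collections import Counter


def trt_conversion_summary(graph_def) -> dict:
    """Count all nodes and TRTEngineOp nodes in a GraphDef-like object (hypothetical helper)."""
    op_counts = Counter(node.op for node in graph_def.node)
    return {
        "num_nodes": sum(op_counts.values()),
        "num_trt_engine_ops": op_counts.get("TRTEngineOp", 0),
    }
```

Comparing `trt_conversion_summary(frozen_graph)` with `trt_conversion_summary(trt_graph)` shows how much of the graph was replaced. Many tiny engines (the log further down in this thread shows segments of only 2 or 3 nodes) may not give much speedup, so minimum_segment_size is worth experimenting with.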

jkjung-avt commented 5 years ago

I shared my test results on the Jetson TX2 developer forum earlier: https://devtalk.nvidia.com/default/topic/1037019/jetson-tx2/tensorflow-object-detection-and-image-classification-accelerated-for-nvidia-jetson/post/5288250/#5288250

Note that I had to reduce the number of region proposals in the Faster R-CNN models; otherwise they run too slowly. All the code I used for testing can be found in my GitHub repository: https://github.com/jkjung-avt/tf_trt_models

inders commented 5 years ago

I am facing the following error while trying to run the Faster R-CNN model with TensorRT. I have tried changing the resizer as per @bezero's comment, but it still doesn't help. Any pointers would be highly appreciated.

.cc:724] Can't determine the device, constructing an allocator at device 0
2018-12-13 09:49:37.182205: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: Network.cpp::addInput::281, condition: isIndexedCHW(dims) && volume(dims) < MAX_TENSOR_SIZE
2018-12-13 09:49:37.182317: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:857] Engine creation for segment 0, composed of 3 nodes failed: Invalid argument: Failed to create Input layer tensor InputPH_0 rank=-2. Skipping...
2018-12-13 09:49:37.182353: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:724] Can't determine the device, constructing an allocator at device 0

atyshka commented 5 years ago

@bezero @jkjung-avt Did you run your Faster R-CNN models in the Jupyter notebook? The notebook code works fine for me with the SSD models, but if I try the Faster R-CNN models I get "Engine buffer is full. buffer limit=1, current entries=1, requested batch=100". I'm using the exact notebook code with three modifications:

1: MODEL = 'faster_rcnn_resnet50_coco'
2: removed score_threshold=0.3 from build_detection_graph(...
3: changed to fixed_shape_resizer { height: 600 width: 1024 } in the config file

Can either of you reproduce this issue? I'm using TensorFlow 1.12 and TensorRT 5.0.

jkjung-avt commented 5 years ago

I haven't managed to get the faster_rcnn_resnet50 model to work with TensorFlow 1.12.0 and TensorRT. Previously I got it to work using TensorFlow 1.8.0, with some tweaks. Details are all in my GitHub repository: https://github.com/jkjung-avt/tf_trt_models/blob/master/data/faster_rcnn_resnet50_egohands.config

CharlieXie commented 5 years ago

Hi @jkjung-avt, I used your config file: https://github.com/jkjung-avt/tf_trt_models/blob/master/data/faster_rcnn_inception_v2_egohands.config to train a model on my dataset (13 classes) and then tried to convert it to TRT, but I still got the error: tensorflow.python.framework.errors_impl.InvalidArgumentError: Invalid graph: Frame ids for node BatchMultiClassNonMaxSuppression/map/while/Reshape_1 does not match frame ids for it's fanout. How did you get rid of this error? Another issue I'm facing is that my trained FRCNN-inception-v2 checkpoint file (103.5 MB) is about twice the size of the fine-tuned checkpoint file (53.3 MB). Do you have any idea why? Thanks in advance.

jkjung-avt commented 5 years ago

@CharlieXie, try setting 'remove_assert' to False. I recall that's how I got rid of the problem previously.

https://github.com/NVIDIA-AI-IOT/tf_trt_models/blob/master/tf_trt_models/detection.py#L108
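Both this "Frame ids ... does not match" error and the "has inputs from different frames" error reported later in this thread refer to TensorFlow while-loop frames: a node's inputs must come from the same loop frame. One plausible mechanism, consistent with the remove_assert=False workaround but not confirmed here, is that a constant inside the loop gets its frame only through a control edge from an Assert node; deleting the Assert leaves the constant with no inputs, so it falls into the root frame ''. The sketch below flags such nodes. It is a heuristic (it recognizes loop bodies by a "/while/" name scope), the helper name is hypothetical, and the node names in the test only mimic the error message (the Assert node is made up).

```python
from typing import Iterable, List


def nodes_orphaned_by_removal(nodes: Iterable, removed_op_types=("Assert",)) -> List[str]:
    """List while-loop nodes that would be left with no inputs at all.

    `nodes` are GraphDef-like NodeDefs with `.name`, `.op` and `.input`;
    inputs look like "name", "name:1", or "^name" for control edges.
    """
    nodes = list(nodes)
    removed = {n.name for n in nodes if n.op in removed_op_types}
    orphans = []
    for node in nodes:
        if node.name in removed or not node.input:
            continue
        # Strip the control-edge marker and output index to get the source node name.
        remaining = [inp for inp in node.input
                     if inp.lstrip("^").split(":")[0] not in removed]
        if not remaining and "/while/" in node.name:
            orphans.append(node.name)
    return orphans
```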

xiaowenhe commented 5 years ago

@jkjung-avt, I am using your tf_trt_models. When I run python3 camera_tf_trt.py --image --filename=xxx, --model=faster_rcnn_resnet50_coco --build, I get an error that I do not know how to deal with. Can you help me?

And when I run python3 camera_tf_trt.py --image --filename=xxx, --model=faster_rcnn_resnet50_coco without --build, there is no error, but the detection result is not ideal.

jkjung-avt commented 5 years ago

@xiaowenhe The segmentation fault could be caused by an "out of memory" issue. You could use 'tegrastats' to monitor JTX2 memory usage and try to confirm whether that's the case.
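On a Jetson, tegrastats is the tool for this. On any Linux machine, a rough alternative is to read /proc/meminfo, sketched below. On Jetson boards the CPU and GPU share system RAM, so this number also bounds what TensorFlow/TensorRT can allocate. The helper names are hypothetical and the sample values in the test are made up.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: value} (values are in kB)."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def available_mib(meminfo_text: str) -> float:
    """Return MemAvailable in MiB, falling back to MemFree on older kernels."""
    info = parse_meminfo(meminfo_text)
    kb = info.get("MemAvailable", info.get("MemFree"))
    if kb is None:
        raise ValueError("no MemAvailable/MemFree entry")
    return kb / 1024
```

Usage: `with open("/proc/meminfo") as f: print(available_mib(f.read()))`, polled in a loop while the model loads and runs.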

As for the bad detection results from the TF-TRT-optimized faster_rcnn_resnet50_coco model, I'm not exactly sure what the problem is. There could be many causes, e.g.

mismatched TensorFlow versions between training and inference,
TF-TRT does not optimize certain operations in the model correctly,
...

xiaowenhe commented 5 years ago

@jkjung-avt, thank you! But I am not using a TX2; I want to test it on another machine first and then move to the TX2. As for GPU memory:

From the picture, only 5285M is used!

hoangtuanvu commented 5 years ago

I cannot improve performance by using TensorRT optimization. Can someone tell me why? Also, why do I get a bigger model after optimizing the frozen graph?

bezero commented 5 years ago

@hoangtuanvu What do you mean by not being able to optimize? TensorRT optimizes your frozen model for inference, which does not mean that you get a smaller model. Did you compare inference time before and after TensorRT optimization?
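A fair before/after comparison should exclude warm-up runs: TF-TRT may build engines on the first run (the log further down in this thread shows "Building a new TensorRT engine ..." messages at run time), which would otherwise dominate the timing. A minimal sketch; `benchmark` is a hypothetical helper.

```python
import statistics
import time
from typing import Callable


def benchmark(run_once: Callable[[], object], warmup: int = 5, iterations: int = 50) -> float:
    """Return the median wall-clock time in seconds of run_once(), excluding warm-up calls."""
    if iterations < 1:
        raise ValueError("iterations must be >= 1")
    for _ in range(warmup):
        run_once()  # one-time costs such as engine builds happen here
    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_once()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```

For example, call `benchmark(lambda: tf_sess.run(output_tensors, feed_dict={input_tensor: image}))` once for the native graph and once for the TF-TRT graph, each in its own session; `output_tensors`, `input_tensor` and `image` are placeholders for your own objects. The median is less sensitive to occasional slow frames than the mean.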

hoangtuanvu commented 5 years ago

@bezero I used TensorRT to optimize the frozen graph, but I did not get faster inference. I am currently working on person detection.

TomKomar commented 5 years ago

I'm in a similar situation to @atyshka: no improvement whatsoever. The only difference after generating an 'optimized' graph is that I get an "Engine buffer is full" warning on every frame. Has anyone figured out how to deal with this? Setup: Xavier, TF 1.12 + TRT 5.
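The warning text in @atyshka's report ("buffer limit=1, current entries=1, requested batch=100") can be read as: an engine op that already holds its one allowed engine (presumably built for a smaller batch) received a batch of 100 that the cached engine cannot serve, and there is no room to build another. Below is a toy model of that reading, not TF-TRT's code; it also assumes, purely for illustration, that any cached engine with a large enough max batch is reused. Whether the op then falls back to native TensorFlow for that segment (which would explain the missing speedup) is an assumption. The relevant knobs are max_batch_size and maximum_cached_engines in create_inference_graph, both used in tevisgehr's call above.

```python
from typing import Dict, Optional


class ToyEngineCache:
    """Toy bounded engine cache keyed by max batch size (not TF-TRT's implementation)."""

    def __init__(self, maximum_cached_engines: int) -> None:
        self.maximum_cached_engines = maximum_cached_engines
        self.engines: Dict[int, str] = {}  # max batch size -> engine label

    def get_engine(self, batch: int) -> Optional[str]:
        # Toy rule: reuse the smallest cached engine that can hold this batch.
        fitting = [b for b in self.engines if b >= batch]
        if fitting:
            return self.engines[min(fitting)]
        if len(self.engines) >= self.maximum_cached_engines:
            print(f"Engine buffer is full. buffer limit={self.maximum_cached_engines}, "
                  f"current entries={len(self.engines)}, requested batch={batch}")
            return None  # assumed: the caller falls back to the native TF segment
        self.engines[batch] = f"engine(max_batch={batch})"
        return self.engines[batch]
```

In this model, a cache of size 1 that first sees batch 1 and then batch 100 prints exactly the warning above, while a cache of size 2 builds a second engine instead.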

zhucheng725 commented 5 years ago

I ran the detection demo using ssd_mobilenet_v1_coco.pb. When I used FP16 in trt.create_inference_graph(), the benchmark was about 0.041013 seconds; when I used INT8, it was about 0.383557 seconds. Why would INT8 be slower than FP16?

ashispapu commented 5 years ago

@hoangtuanvu I am facing an issue while running inference on a TensorFlow object detection model (242 MB). I have TF 1.13 and TensorRT 5.1.2. Below are the log details.

2019-06-03 15:05:21.432164: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:1030] TensorRT node resnet_v1_101/conv1/TRTEngineOp_123 added for segment 123 consisting of 2 nodes succeeded.
2019-06-03 15:05:21.432437: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:1030] TensorRT node rpn_proposals/softmax/TRTEngineOp_124 added for segment 124 consisting of 3 nodes succeeded.
2019-06-03 15:05:22.384389: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:616] Optimization results for grappler item: tf_graph
2019-06-03 15:05:22.384595: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:618] constant folding: Graph size after: 2014 nodes (-599), 2353 edges (-637), time = 4514.9751ms.
2019-06-03 15:05:22.384653: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:618] layout: Graph size after: 2063 nodes (49), 2422 edges (69), time = 462.632ms.
2019-06-03 15:05:22.384702: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:618] constant folding: Graph size after: 2059 nodes (-4), 2422 edges (0), time = 908.786ms.
2019-06-03 15:05:22.384748: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:618] TensorRTOptimizer: Graph size after: 1653 nodes (-406), 2000 edges (-422), time = 57351.3477ms.
time(s) (trt_conversion): 72.7292
graph_size(MB)(native_tf): 230.8
graph_size(MB)(trt): 493.0
num_nodes(trt_only): 125
2019-06-03 15:05:49.531006: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:496] Building a new TensorRT engine for TRTEngineOp_0 with batch size 720
2019-06-03 15:05:49.543881: W tensorflow/contrib/tensorrt/log/trt_logger.cc:34] DefaultLogger Tensor DataType is determined at build time for tensors not marked as input or output.
2019-06-03 15:05:55.369363: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:496] Building a new TensorRT engine for resnet_v1_101/TRTEngineOp_23 with batch size 1
2019-06-03 15:05:55.837386: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:496] Building a new TensorRT engine for resnet_v1_101/conv1/TRTEngineOp_123 with batch size 1
2019-06-03 15:05:57.403776: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:496] Building a new TensorRT engine for resnet_v1_101/TRTEngineOp_24 with batch size 1
2019-06-03 15:06:10.529445: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:496] Building a new TensorRT engine for resnet_v1_101/block1/unit_1/bottleneck_v1/TRTEngineOp_25 with batch size 1
2019-06-03 15:06:13.628441: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:496] Building a new TensorRT engine for resnet_v1_101/block1/unit_1/bottleneck_v1/TRTEngineOp_26 with batch size 1
2019-06-03 15:06:20.675574: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:496] Building a new TensorRT engine for resnet_v1_101/block1/TRTEngineOp_27 with batch size 1
2019-06-03 15:06:25.591558: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:496] Building a new TensorRT engine for resnet_v1_101/block1/unit_2/bottleneck_v1/TRTEngineOp_28 with batch size 1
2019-06-03 15:06:28.377901: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:496] Building a new TensorRT engine for resnet_v1_101/block1/unit_2/bottleneck_v1/TRTEngineOp_29 with batch size 1
2019-06-03 15:06:35.168358: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:496] Building a new TensorRT engine for resnet_v1_101/block1/TRTEngineOp_31 with batch size 1
Killed

When I run dmesg --follow to check the process details, I see:

[10663.666441] [12648] 1000 12648 6243614 1507814 3582 12 0 0 python3
[10663.666444] Out of memory: Kill process 12648 (python3) score 751 or sacrifice child
[10663.674368] Killed process 12648 (python3) total-vm:24974456kB, anon-rss:5768628kB, file-rss:262628kB, shmem-rss:0kB
[10664.011176] oom_reaper: reaped process 12648 (python3), now anon-rss:0kB, file-rss:262708kB, shmem-rss:0kB

Any suggestion or feedback is appreciated.
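The dmesg output shows that the kernel OOM killer ended the process: python3 held anon-rss:5768628kB (about 5.5 GiB) when it was killed. When checking many runs, a small parser helps. This is a sketch based only on the "Killed process ..." line format shown above (other kernel versions may word it differently), and the helper name is hypothetical.

```python
import re
from typing import Dict, List

_OOM_KILLED = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)"
    r".*?anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def parse_oom_kills(dmesg_text: str) -> List[Dict[str, object]]:
    """Extract pid, process name and anonymous RSS (MiB) from OOM-killer lines."""
    return [
        {
            "pid": int(m.group("pid")),
            "name": m.group("name"),
            "anon_rss_mib": int(m.group("anon_rss_kb")) / 1024,
        }
        for m in _OOM_KILLED.finditer(dmesg_text)
    ]
```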

VincentChong123 commented 5 years ago

Hi @zhucheng725

Why will INT8 slower than TP16

Do you have any update on this?

Thanks

srkm009 commented 5 years ago

Hello, did anyone manage to resolve this issue, or is it still an open TF-TRT issue? I see the same issue with TF 2.0 as well.

zhucheng725 commented 5 years ago

Not yet

spurani commented 3 years ago

I tried to run faster_rcnn_inception_v2 and got the following error. Does anyone have any clue about this? Any suggestion or advice would definitely help me continue learning by understanding these concepts. Thanks.

InvalidArgumentError: node BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/Slice (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) has inputs from different frames. The input node BatchMultiClassNonMaxSuppression/map/while/Reshape_1 (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) is in frame 'BatchMultiClassNonMaxSuppression/map/while/while_context'. The input node BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/Slice/begin (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) is in frame ''.

Some-random commented 3 years ago

I'm having the same issue. Is there any update on this one? What does this error mean, anyway?

spurani commented 3 years ago

If I am not wrong, the error states that the system does not have enough memory to run the faster_rcnn_inception_v2 model.

Some-random commented 3 years ago

Thanks for the quick answer! I'm running a different model using TRT and my memory is normal during execution... Do you know the meaning of 'has inputs from different frames' in the error message?

TClan8023 commented 1 year ago

Hi, I'm using TF-TRT on Windows 10 with tf_gpu 2.10.0 and TensorRT 7.2.3, on CUDA 11.2 and cuDNN 8.1.0. I ran into the same error while building the TRT engine for inference. Do you know how to deal with it? Thanks a lot in advance for your reply.

NVIDIA-AI-IOT / tf_trt_models

TensorRT supported detection networks #6