Closed wizpig closed 1 year ago
I've found out that removing the LSTM block from the network allowed me to convert to TensorRT! Question is now, what goes wrong with LSTM during TensorRT conversion? To my understanding it is supported. Right?
This issue may be related to a warning I got while saving my model during training in Keras:
WARNING:absl:Found untraced functions such as lstm_cell_layer_call_fn, lstm_cell_layer_call_and_return_conditional_losses, lstm_cell_layer_call_fn, lstm_cell_layer_call_and_return_conditional_losses, lstm_cell_layer_call_and_return_conditional_losses while saving (showing 5 of 5).
These functions will not be directly callable after loading.
I've also tried to convert my model via TF-TRT and that does work even though I'm using an LSTM layer... However TF-TRT has some significant drawbacks, as I'm aiming for a Jetson Xavier as target platform.
Does anyone know about a working example where a Keras/tensorflow model with LSTM layers is converted to TensorRT?
Hello @wizpig , could you provide an onnx file that we can debug? thanks!
@ttyio I have the same issue: converting an LSTM+Dense TF model to TRT triggers a "Myelin Error in addNodeToMyelinGraph: operation not supported within a loop body". Does the "triaged" status mean that a fix is planned? Thanks.
Hello @lbortho , tensorrt has different coverage for op inside a loop and outside a loop. Could you provide a repro that we can use to debug which op is missing when inside a loop?
Also, I used the triaged tag to track the issues that I have seen; I still need your repro for further triage, thanks!
Hey ttyio,
sorry for the late reply. Here you have an example ONNX file that produces the same error for me.
20210428_163309_EXAMPLE_MODEL_CLASS_LSTM_skipframe_0_batch_0002.onnx.zip
Hi @ttyio, the steps to reproduce the issue are simple. First, here are my dependencies: JetPack 4.5.1, Python 3.6.9, TensorRT 7.1.3, TensorFlow 2.4.0, tf2onnx 1.8.4.
Then the code
import tensorflow as tf
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
import tf2onnx
input= Input(shape = (60,8), dtype=tf.float32)
lstm = tf.keras.layers.LSTM(1)(input)
model = Model(inputs=input, outputs=lstm)
spec = (tf.TensorSpec((None, 60, 8), tf.float32),)
tf2onnx.convert.from_keras(model, output_path='model_dummy.onnx', input_signature=spec)
I use trtexec from TensorRT OSS 7.1.3 to parse/verify the onnx:
trtexec --optShapes='args_0':1x60x8 --onnx=model_dummy.onnx
Which results in
[04/28/2021-14:06:45] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/28/2021-14:06:45] [I] [TRT]
[04/28/2021-14:06:45] [I] [TRT] --------------- Layers running on DLA:
[04/28/2021-14:06:45] [I] [TRT]
[04/28/2021-14:06:45] [I] [TRT] --------------- Layers running on GPU:
[04/28/2021-14:06:45] [I] [TRT] {(Unnamed Layer* 0) [Constant],(Unnamed Layer* 1) [Constant],while_cond_567_while/Less,(Unnamed Layer* 14) [Constant],(Unnamed Layer* 17) [Constant],(Unnamed Layer* 23) [Constant],(Unnamed Layer* 25) [Constant],(Unnamed Layer* 27) [Constant],(Unnamed Layer* 29) [Constant],model/lstm/PartitionedCall/transpose,(Unnamed Layer* 9) [Constant] + (Unnamed Layer* 10) [Shuffle],model/lstm/zeros_1,(Unnamed Layer* 15) [TripLimit],model/lstm/PartitionedCall/while_loop,(Unnamed Layer* 24) [Recurrence],(Unnamed Layer* 26) [Recurrence],(Unnamed Layer* 28) [Recurrence],(Unnamed Layer* 16) [TripLimit],while/add_2,while/TensorArrayV2Read/TensorListGetItem,(Unnamed Layer* 35) [Shuffle],while/MatMul,(Unnamed Layer* 19) [Recurrence],(Unnamed Layer* 20) [Recurrence],while/MatMul_1,while/add,while/BiasAdd,while/split,while/split_1,while/split_2,while/split_3,while/Sigmoid,while/Sigmoid_1,while/Tanh,while/Sigmoid_2,while/mul,while/mul_1,while/add_1,while/Tanh_1,while/mul_2,while/Identity_4,(Unnamed Layer* 66) [LoopOutput],model/lstm/PartitionedCall/strided_slice_2,model/lstm/PartitionedCall/strided_slice_2__23},
[04/28/2021-14:06:46] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[04/28/2021-14:06:46] [E] [TRT] ../builder/myelin/codeGenerator.cpp (112) - Myelin Error in addNodeToMyelinGraph: 0 (while/TensorArrayV2Read/TensorListGetItem{(Unnamed Layer* 0) [Constant],(Unnamed Layer* 1) [Constant],while_cond_567_while/Less,(Unnamed Layer* 14) [Constant],(Unnamed Layer* 17) [Constant],(Unnamed Layer* 23) [Constant],(Unnamed Layer* 25) [Constant],(Unnamed Layer* 27) [Constant],(Unnamed Layer* 29) [Constant],model/lstm/PartitionedCall/transpose,(Unnamed Layer* 9) [Constant] + (Unnamed Layer* 10) [Shuffle],model/lstm/zeros_1,(Unnamed Layer* 15) [TripLimit],model/lstm/PartitionedCall/while_loop,(Unnamed Layer* 24) [Recurrence],(Unnamed Layer* 26) [Recurrence],(Unnamed Layer* 28) [Recurrence],(Unnamed Layer* 16) [TripLimit],while/add_2,while/TensorArrayV2Read/TensorListGetItem,(Unnamed Layer* 35) [Shuffle],while/MatMul,(Unnamed Layer* 19) [Recurrence],(Unnamed Layer* 20) [Recurrence],while/MatMul_1,while/add,while/BiasAdd,while/split,while/split_1,while/split_2,while/split_3,while/Sigmoid,while/Sigmoid_1,while/Tanh,while/Sigmoid_2,while/mul,while/mul_1,while/add_1,while/Tanh_1,while/mul_2,while/Identity_4,(Unnamed Layer* 66) [LoopOutput],model/lstm/PartitionedCall/strided_slice_2,model/lstm/PartitionedCall/strided_slice_2__23} operation not supported within a loop body.)
[04/28/2021-14:06:46] [E] [TRT] ../builder/myelin/codeGenerator.cpp (112) - Myelin Error in addNodeToMyelinGraph: 0 ()
[04/28/2021-14:06:46] [E] Engine creation failed
[04/28/2021-14:06:46] [E] Engine set up failed
Finally here is the ONNX model:
Hi @ttyio, I tried the same test with TensorRT 7.2.3.4 GA (CUDA 11.0 and cuDNN 8.1) and TensorFlow 2.4.1 on an x86 Ubuntu 18.04 host and obtained the same error. Have you been able to reproduce the issue on your side?
Hello @lbortho, we have some improvements in 8.0, but it still cannot work with your model because we do not support gather in the loop. We have created an internal request to track this, sorry.
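For anyone trying to find which op trips the loop restriction in their own model: in the Myelin errors quoted in this thread, the failing node name appears right before the `{` that opens the fused layer list. Here is a minimal sketch for pulling that name out of a saved trtexec log. The helper name `failing_node` and the regular expression are illustrative assumptions derived only from the log lines above, not a documented log format, and the sample line is abbreviated.

```python
import re

# Pattern inferred from the log lines quoted in this thread. This is an
# assumption, not a documented TensorRT log format.
_MYELIN_ERROR = re.compile(
    r"Myelin Error in addNodeToMyelinGraph: \d+ \(([^{()]+)\{"
)


def failing_node(log_line):
    """Return the node name that precedes '{' in a Myelin error line, or None."""
    match = _MYELIN_ERROR.search(log_line)
    return match.group(1).strip() if match else None


# Abbreviated copy of the error line from the LSTM(1) repro above.
line = (
    "[04/28/2021-14:06:46] [E] [TRT] ../builder/myelin/codeGenerator.cpp (112) - "
    "Myelin Error in addNodeToMyelinGraph: 0 "
    "(while/TensorArrayV2Read/TensorListGetItem{(Unnamed Layer* 0) [Constant]} "
    "operation not supported within a loop body.)"
)
empty = "Myelin Error in addNodeToMyelinGraph: 0 ()"

print(failing_node(line))
print(failing_node(empty))
```

In both failing logs in this thread the extracted name is while/TensorArrayV2Read/TensorListGetItem, the per-time-step read inside the while loop, which seems consistent with the "gather in the loop" limitation described above.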
@ttyio Does that mean, that LSTMs are in general not supported at this point?
Nevertheless, thanks for your effort.
Hello @wizpig, the ONNX lstm is supported, there is no gather in the loop, see the implementation here:
https://github.com/onnx/onnx-tensorrt/blob/984e57c7768a9bea3d2a8369ed199529f603d13b/builtin_op_importers.cpp#L2111
Thanks
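To check which of the two cases a given export falls into (a single ONNX LSTM node versus a Loop whose body contains other ops), one can count op types per graph level. The sketch below is hedged: `count_ops` and `count_ops_in_file` are hypothetical helper names, the file loader assumes the `onnx` Python package is installed, and the toy graph is a stand-in built from plain objects so the example runs without `onnx`. It is not one of the models attached to this thread.

```python
from collections import Counter
from types import SimpleNamespace

GRAPH_ATTR = 5  # numeric value of onnx.AttributeProto.GRAPH


def count_ops(graph, inside_loop=False, counts=None):
    """Count op types in an ONNX graph, separating ops nested in Loop bodies."""
    if counts is None:
        counts = {"top_level": Counter(), "in_loop": Counter()}
    for node in graph.node:
        counts["in_loop" if inside_loop else "top_level"][node.op_type] += 1
        for attr in node.attribute:
            # Subgraphs (such as the body of a Loop) are stored as GRAPH attributes.
            if attr.type == GRAPH_ATTR:
                count_ops(attr.g, inside_loop or node.op_type == "Loop", counts)
    return counts


def count_ops_in_file(path):
    """Load an .onnx file and count its ops (assumes the onnx package)."""
    import onnx  # imported lazily so the demo below runs without onnx

    return count_ops(onnx.load(path).graph)


def _node(op_type, body=None):
    """Build a stand-in node exposing only the fields count_ops reads."""
    attrs = [SimpleNamespace(type=GRAPH_ATTR, g=body)] if body is not None else []
    return SimpleNamespace(op_type=op_type, attribute=attrs)


# Hypothetical toy graph: a Loop whose body contains a Gather.
body = SimpleNamespace(node=[_node("Gather"), _node("MatMul"), _node("Sigmoid")])
toy = SimpleNamespace(node=[_node("Transpose"), _node("Loop", body)])

demo = count_ops(toy)
print("top level:", dict(demo["top_level"]))
print("in loop:  ", dict(demo["in_loop"]))
```

With a real file, `count_ops_in_file("model_dummy.onnx")` would show whether an `LSTM` entry appears at the top level or whether a `Gather` sits inside a `Loop` body.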
Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions. Thanks!
Is the issue resolved in 8.0?
Hello @lbortho, support for gather in a loop is resolved in the next release (around 2 months from now), not in 8.0, thanks!
Hello @ttyio. I upgraded to TensorRT 8.2.0.6 and still have an issue with the same simple model conversion:
import tensorflow as tf
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
import tf2onnx
input= Input(shape = (60,8), dtype=tf.float32)
lstm = tf.keras.layers.LSTM(1)(input)
model = Model(inputs=input, outputs=lstm)
spec = (tf.TensorSpec((None, 60, 8), tf.float32),)
tf2onnx.convert.from_keras(model, output_path='model_dummy.onnx', input_signature=spec)
With the following dependencies: Python 3.6.9, TensorRT 8.2.0.6, TensorFlow 2.4.0, tf2onnx 1.9.1.
When I try converting from ONNX to TRT:
trtexec --optShapes='args_0':1x60x8 --onnx=model_dummy.onnx
The log shows
[11/09/2021-13:06:59] [I] [TRT] [MemUsageChange] Init CUDA: CPU +322, GPU +0, now: CPU 334, GPU 2009 (MiB)
[11/09/2021-13:06:59] [I] Start parsing network model
[11/09/2021-13:06:59] [I] [TRT] ----------------------------------------------------------------
[11/09/2021-13:06:59] [I] [TRT] Input filename: model_dummy.onnx
[11/09/2021-13:06:59] [I] [TRT] ONNX IR version: 0.0.6
[11/09/2021-13:06:59] [I] [TRT] Opset version: 11
[11/09/2021-13:06:59] [I] [TRT] Producer name: tf2onnx
[11/09/2021-13:06:59] [I] [TRT] Producer version: 1.9.1
[11/09/2021-13:06:59] [I] [TRT] Domain:
[11/09/2021-13:06:59] [I] [TRT] Model version: 0
[11/09/2021-13:06:59] [I] [TRT] Doc string:
[11/09/2021-13:06:59] [I] [TRT] ----------------------------------------------------------------
[11/09/2021-13:06:59] [W] [TRT] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[11/09/2021-13:06:59] [W] [TRT] onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
[11/09/2021-13:06:59] [I] Finish parsing network model
[11/09/2021-13:06:59] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 451 MiB, GPU 2029 MiB
[11/09/2021-13:07:00] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.1 but loaded cuBLAS/cuBLAS LT 11.5.1
[11/09/2021-13:07:00] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +483, GPU +206, now: CPU 934, GPU 2235 (MiB)
[11/09/2021-13:07:00] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +393, GPU +180, now: CPU 1327, GPU 2415 (MiB)
[11/09/2021-13:07:00] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[11/09/2021-13:07:00] [I] [TRT] [BlockAssignment] Algorithm Linear took 0.000402ms to assign 1 blocks to 1 nodes requiring 16777216 bytes.
[11/09/2021-13:07:00] [I] [TRT] Total Activation Memory: 16777216
[11/09/2021-13:07:00] [I] [TRT] Detected 1 inputs and 1 output network tensors.
trtexec: /root/gpgpu/MachineLearning/myelin/src/compiler/ir/operation.cpp:396: void myelin::ir::operation_t::replace_def(myelin::ir::tensor_t*, size_t): Assertion `idx < out_tensors().size()' failed.
Aborted (core dumped)
I have the same logs when I use opset 9 or 10.
Here is the ONNX model: model_dummy.onnx.tar.gz
Thanks @lbortho for the detailed repro, I have created an internal bug to track this issue.
Hello @ttyio . I also tried with
lstm = tf.keras.layers.LSTM(2)(input)
instead of
lstm = tf.keras.layers.LSTM(1)(input)
The log shows a different error in this case
[11/16/2021-11:14:02] [I] TensorRT version: 8200
[11/16/2021-11:14:03] [I] [TRT] [MemUsageChange] Init CUDA: CPU +322, GPU +0, now: CPU 334, GPU 972 (MiB)
[11/16/2021-11:14:03] [I] Start parsing network model
[11/16/2021-11:14:03] [I] [TRT] ----------------------------------------------------------------
[11/16/2021-11:14:03] [I] [TRT] Input filename: model_dummy.onnx
[11/16/2021-11:14:03] [I] [TRT] ONNX IR version: 0.0.6
[11/16/2021-11:14:03] [I] [TRT] Opset version: 11
[11/16/2021-11:14:03] [I] [TRT] Producer name: tf2onnx
[11/16/2021-11:14:03] [I] [TRT] Producer version: 1.9.1
[11/16/2021-11:14:03] [I] [TRT] Domain:
[11/16/2021-11:14:03] [I] [TRT] Model version: 0
[11/16/2021-11:14:03] [I] [TRT] Doc string:
[11/16/2021-11:14:03] [I] [TRT] ----------------------------------------------------------------
[11/16/2021-11:14:03] [W] [TRT] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[11/16/2021-11:14:03] [W] [TRT] onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
[11/16/2021-11:14:03] [I] Finish parsing network model
[11/16/2021-11:14:03] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 451 MiB, GPU 1000 MiB
[11/16/2021-11:14:03] [E] Error[4]: [graphShapeAnalyzer.cpp::processCheck::582] Error Code 4: Internal Error (model/lstm/PartitionedCall/while_loop:7: tensor volume exceeds (2^31)-1, dimensions are [2147483647,1,2])
[11/16/2021-11:14:03] [E] Error[2]: [builder.cpp::buildSerializedNetwork::561] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
[11/16/2021-11:14:03] [E] Engine could not be created from network
[11/16/2021-11:14:03] [E] Building engine failed
[11/16/2021-11:14:03] [E] Failed to create engine from model.
[11/16/2021-11:14:03] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8200] # ./trtexec --optShapes=args_0:1x60x8 --onnx=/home/louis/cfh/model_dummy.onnx
Segmentation fault (core dumped)
Here is the ONNX model with LSTM(2): model_dummy_lstm_2.onnx.tar.gz
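The arithmetic behind the LSTM(2) message can be checked by hand. The leading dimension 2147483647 equals (2^31)-1, so any trailing dimension larger than 1 pushes the element count past the limit. The sketch below reproduces that check. The `[2147483647, 1, 1]` shape for LSTM(1) is an assumed analogue and does not appear in any log in this thread; if it is right, it would sit exactly at the limit, which might explain why LSTM(1) fails differently. Where the 2147483647 comes from is also a guess: it may be a value that was clamped from INT64, as the INT32 clamping warning in the log suggests.

```python
INT32_MAX = 2**31 - 1  # 2147483647, the limit quoted in the error message


def volume(dims):
    """Number of elements in a tensor with the given dimensions."""
    total = 1
    for d in dims:
        total *= d
    return total


lstm2_dims = [2147483647, 1, 2]  # dimensions reported in the LSTM(2) log
lstm1_dims = [2147483647, 1, 1]  # assumed analogue for LSTM(1), not from a log

for name, dims in (("LSTM(2)", lstm2_dims), ("LSTM(1)", lstm1_dims)):
    v = volume(dims)
    status = "exceeds limit" if v > INT32_MAX else "within limit"
    print(name, dims, "volume =", v, status)
# LSTM(2) volume = 4294967294 (exceeds limit)
# LSTM(1) volume = 2147483647 (within limit)
```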
@lbortho I've hit a similar issue, have you solved this?
@joan126 No, still waiting for the bug resolution from NVIDIA/TensorRT.
This will be fixed in 8.4GA, thanks!
Hi @ttyio, I am having this same issue with a single LSTM layer.
I am running on a Jetson Xavier NX, Jetpack 4.6, TensorRT 8.0.1.6, Tensorflow 2.5.0.
Can you please confirm whether TensorRT 8.4 can convert LSTM layers without major issues?
This is crucial for me: if so, I will have to migrate my whole operation to the still-beta JetPack 5.0.1 SDK.
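Since several TensorRT versions are mentioned in this thread, here is a small sketch for checking whether an installed version is at least the 8.4 release named above. The helper names are hypothetical. The only TensorRT call used is reading `tensorrt.__version__`, imported lazily so the comparison logic runs without TensorRT installed. Passing this check only means the version is new enough to contain the announced fix; it does not prove that a particular model converts.

```python
def parse_version(text):
    """Turn '8.0.1.6' into (8, 0, 1, 6); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def has_announced_fix(version_text, fixed_in=(8, 4)):
    """True if version_text is at least the release named in this thread."""
    return parse_version(version_text) >= fixed_in


def installed_tensorrt_version():
    """Read the version of the installed TensorRT Python package."""
    import tensorrt  # imported lazily; requires TensorRT's Python bindings

    return tensorrt.__version__


# Versions reported by participants in this thread, plus the target release.
for v in ("7.1.3", "7.2.3.4", "8.0.1.6", "8.2.0.6", "8.4"):
    print(v, has_announced_fix(v))
```

On a device, `has_announced_fix(installed_tensorrt_version())` gives the same answer for the local install.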
@agrija9 sorry for the delayed response. Yes, we have many fixes in 8.4; have you tried JetPack 5.0? Thanks!
Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions, thanks!
I am still having issues converting LSTM layers to TensorRT. I am using version 8.6. Is this fixed in any version of TRT?
Description
I wrote a custom model in Keras that takes an RGB video (i.e. a 4D tensor) as input and classifies it.
I can successfully convert it to ONNX, however conversion to TensorRT fails with
[02/23/2021-19:13:53] [E] [TRT] ../builder/myelin/codeGenerator.cpp (114) - Myelin Error in addNodeToMyelinGraph: 0 (while/TensorArrayV2Read/TensorListGetItem{StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/zeros_1/Const:0,const_fold_opt__733,__inference_while_cond_45765_532_while/Less,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/PartitionedCall/while/maximum_iterations:0,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/PartitionedCall/time:0,Func/StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/PartitionedCall/input/_43:0,Func/StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/PartitionedCall/input/_44:0,Func/StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/PartitionedCall/input/_45:0,while/add_2/y:0,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lambda_9/ExpandDims,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lambda_8/ExpandDims,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lambda_7/ExpandDims,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lambda_6/ExpandDims,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lambda_5/ExpandDims,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lambda_4/ExpandDims,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lambda_3/ExpandDims,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lambda_2/ExpandDims,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lambda_1/ExpandDims,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lambda/ExpandDims,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/concatenate/concat,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/PartitionedCall/transpose,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/zeros_1/Const:0_0 + (Unnamed Layer* 531) [Shuffle],StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/zeros_1,(Unnamed Layer* 541) [TripLimit],StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/PartitionedCall/while_loop,(Unnamed Layer* 554) [Recurrence],(Unnamed Layer* 556) [Recurrence],(Unnamed Layer* 558) [Recurrence],(Unnamed Layer* 546) [TripLimit],while/add_2,(Unnamed Layer* 565) [Shuffle],while/TensorArrayV2Read/TensorListGetItem,while/MatMul,(Unnamed Layer* 549) [Recurrence],(Unnamed Layer* 550) [Recurrence],while/MatMul_1,while/add,while/BiasAdd,while/split,while/split_1,while/split_2,while/split_3,while/Sigmoid,while/Sigmoid_1,while/Tanh,while/Sigmoid_2,while/mul,while/mul_1,while/add_1,while/Tanh_1,while/mul_2,while/TensorArrayV2Write/TensorListSetItem,(Unnamed Layer* 596) [LoopOutput],StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/PartitionedCall/strided_slice_2,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/PartitionedCall/strided_slice_2__676 + StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/flatten/Reshape + (Unnamed Layer* 633) [Shuffle],StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/dense/MatMul,(Unnamed Layer* 638) [Shuffle],StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/dense/BiasAdd/ReadVariableOp:0 + (Unnamed Layer* 640) [Shuffle],StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/dense/BiasAdd,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/re_lu_6/Relu,(Unnamed Layer* 649) [Shuffle],StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/dense_1/MatMul,(Unnamed Layer* 654) [Shuffle],StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/dense_1/BiasAdd/ReadVariableOp:0 + (Unnamed Layer* 656) [Shuffle],StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/dense_1/BiasAdd,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/dense_1/Softmax} operation not supported within a loop body.)
[02/23/2021-19:13:53] [E] [TRT] ../builder/myelin/codeGenerator.cpp (114) - Myelin Error in addNodeToMyelinGraph: 0 ()
[02/23/2021-19:13:53] [E] Engine creation failed
[02/23/2021-19:13:53] [E] Engine set up failed
Any idea what's going on?
Btw, I highly suspect that my problem is related to: https://github.com/NVIDIA/TensorRT/issues/411
Environment
TensorRT Version: 7.2.1.4
NVIDIA GPU: RTX 2080
NVIDIA Driver Version: 455.23.05
CUDA Version: 11.1
CUDNN Version:
Operating System: Ubuntu 18.04
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version): nvcr.io/nvidia/tensorrt:20.10-py3
Relevant Files
Steps To Reproduce