NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

ONNX to TRT Error: Myelin Error in addNodeToMyelinGraph: 0 ... operation not supported within a loop body. #1081

Closed: wizpig closed this issue 1 year ago

wizpig commented 3 years ago

Description

I wrote a custom model in Keras that takes an RGB video (i.e., a 4D tensor per sample) as input and classifies it.

import tensorflow as tf
from tensorflow.keras.layers import Lambda, Concatenate, LSTM, Flatten, Dense

list_convolved_frames = []
input = tf.keras.Input(shape=(num_frames, *input_shape_frame))
for i in range(num_frames):
    out = input[:, i, :, :, :]   # slice out frame i
    out = do_something(out)      # per-frame feature extractor (elided)
    out = Lambda(lambda x: tf.keras.backend.expand_dims(x, 1))(out)
    list_convolved_frames.append(out)

convolved_frames = Concatenate(axis=1)(list_convolved_frames)

out = LSTM(64, return_sequences=False, dropout=dropout_rate)(convolved_frames)
out = Flatten()(out)
out = Dense(2, activation='softmax')(out)

model = tf.keras.Model(inputs=input, outputs=out, name=model_name)

I can successfully convert it to ONNX, however conversion to TensorRT fails with

[02/23/2021-19:13:53] [E] [TRT] ../builder/myelin/codeGenerator.cpp (114) - Myelin Error in addNodeToMyelinGraph: 0 (while/TensorArrayV2Read/TensorListGetItem{StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/zeros_1/Const:0,const_fold_opt__733,__inference_while_cond_45765_532_while/Less,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/PartitionedCall/while/maximum_iterations:0,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/PartitionedCall/time:0,Func/StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/PartitionedCall/input/_43:0,Func/StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/PartitionedCall/input/_44:0,Func/StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/PartitionedCall/input/_45:0,while/add_2/y:0,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lambda_9/ExpandDims,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lambda_8/ExpandDims,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lambda_7/ExpandDims,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lambda_6/ExpandDims,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lambda_5/ExpandDims,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lambda_4/ExpandDims,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lambda_3/ExpandDims,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lambda_2/ExpandDims,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lambda_1/ExpandDims,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lambda/ExpandDims,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/concatenate/concat,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/PartitionedCall/transpose,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/zeros_1/Const:0_0 + (Unnamed Layer* 531) [Shuffle],StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/zeros_1,(Unnamed Layer* 541) [TripLimit],StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/PartitionedCall/while_loop,(Unnamed Layer* 554) [Recurrence],(Unnamed Layer* 556) [Recurrence],(Unnamed Layer* 558) [Recurrence],(Unnamed Layer* 546) [TripLimit],while/add_2,(Unnamed Layer* 565) [Shuffle],while/TensorArrayV2Read/TensorListGetItem,while/MatMul,(Unnamed Layer* 549) [Recurrence],(Unnamed Layer* 550) [Recurrence],while/MatMul_1,while/add,while/BiasAdd,while/split,while/split_1,while/split_2,while/split_3,while/Sigmoid,while/Sigmoid_1,while/Tanh,while/Sigmoid_2,while/mul,while/mul_1,while/add_1,while/Tanh_1,while/mul_2,while/TensorArrayV2Write/TensorListSetItem,(Unnamed Layer* 596) [LoopOutput],StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/PartitionedCall/strided_slice_2,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/PartitionedCall/strided_slice_2__676 + StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/flatten/Reshape + (Unnamed Layer* 633) [Shuffle],StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/dense/MatMul,(Unnamed Layer* 638) [Shuffle],StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/dense/BiasAdd/ReadVariableOp:0 + (Unnamed Layer* 640) [Shuffle],StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/dense/BiasAdd,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/re_lu_6/Relu,(Unnamed Layer* 649) [Shuffle],StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/dense_1/MatMul,(Unnamed Layer* 654) [Shuffle],StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/dense_1/BiasAdd/ReadVariableOp:0 + (Unnamed Layer* 656) 
[Shuffle],StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/dense_1/BiasAdd,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/dense_1/Softmax} operation not supported within a loop body.)
[02/23/2021-19:13:53] [E] [TRT] ../builder/myelin/codeGenerator.cpp (114) - Myelin Error in addNodeToMyelinGraph: 0 ()
[02/23/2021-19:13:53] [E] Engine creation failed
[02/23/2021-19:13:53] [E] Engine set up failed

Any idea what's going on?

Btw, I highly suspect that my problem is related to: https://github.com/NVIDIA/TensorRT/issues/411
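
For reference, the ONNX export step was roughly the following (a sketch of the tf2onnx call; my exact opset and flags may have differed):

import tf2onnx

# Input signature matches the Keras Input above; "model.onnx" is a placeholder path.
spec = (tf.TensorSpec((None, num_frames, *input_shape_frame), tf.float32),)
model_proto, _ = tf2onnx.convert.from_keras(
    model, input_signature=spec, opset=11, output_path="model.onnx")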

Environment

TensorRT Version: 7.2.1.4
NVIDIA GPU: RTX 2080
NVIDIA Driver Version: 455.23.05
CUDA Version: 11.1
CUDNN Version:
Operating System: Ubuntu 18.04
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version): nvcr.io/nvidia/tensorrt:20.10-py3

Relevant Files

Steps To Reproduce

wizpig commented 3 years ago

I've found out that removing the LSTM block from the network allowed me to convert to TensorRT! The question now is: what goes wrong with the LSTM during TensorRT conversion? To my understanding it is supported. Right?
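
The ablation was roughly the following (a sketch; the hypothetical pooling layer just stands in for the removed LSTM, assuming convolved_frames is (batch, frames, features) as the LSTM input implies):

# Was: out = LSTM(64, return_sequences=False, dropout=dropout_rate)(convolved_frames)
out = tf.keras.layers.GlobalAveragePooling1D()(convolved_frames)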

This issue may have to do with a warning I got while saving my model during training in Keras:

WARNING:absl:Found untraced functions such as lstm_cell_layer_call_fn, lstm_cell_layer_call_and_return_conditional_losses, lstm_cell_layer_call_fn, lstm_cell_layer_call_and_return_conditional_losses, lstm_cell_layer_call_and_return_conditional_losses while saving (showing 5 of 5). 
These functions will not be directly callable after loading.

wizpig commented 3 years ago

I've also tried to convert my model via TF-TRT, and that does work even though I'm using an LSTM layer... However, TF-TRT has some significant drawbacks for me, as I'm aiming for a Jetson Xavier as the target platform.
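
For reference, the TF-TRT path that worked for me looked roughly like this (a sketch; "saved_model_dir" and the FP16 precision are placeholders for my actual setup):

import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Export the Keras model as a SavedModel first.
model.save("saved_model_dir")

params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(precision_mode="FP16")
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="saved_model_dir", conversion_params=params)
converter.convert()
converter.save("saved_model_trt")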

Does anyone know about a working example where a Keras/tensorflow model with LSTM layers is converted to TensorRT?

ttyio commented 3 years ago

Hello @wizpig , could you provide an ONNX file that we can debug? Thanks!

lbortho commented 3 years ago

@ttyio I have the same issue: converting LSTM+Dense TF to TRT triggers a "Myelin Error in addNodeToMyelinGraph: operation not supported within a loop body". Does the "triaged" status mean that a fix is planned? Thanks.

ttyio commented 3 years ago

Hello @lbortho , TensorRT has different operator coverage inside a loop and outside a loop. Could you provide a repro that we can use to debug which op is missing when inside a loop? Also, I use the triaged tag to track the issues I have seen; I still need your repro for further triage, thanks!

wizpig commented 3 years ago

Hello @wizpig , could you provide an ONNX file that we can debug? Thanks!

Hey @ttyio,

sorry for the late reply. Here is an example ONNX file that produces the same error for me.

20210428_163309_EXAMPLE_MODEL_CLASS_LSTM_skipframe_0_batch_0002.onnx.zip

lbortho commented 3 years ago

Hi @ttyio , the steps to reproduce the issue are simple. First, here are my dependencies:

JetPack 4.5.1
Python 3.6.9
TensorRT 7.1.3
TensorFlow 2.4.0
tf2onnx 1.8.4

Then the code:

import tensorflow as tf
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
import tf2onnx

input = Input(shape=(60, 8), dtype=tf.float32)

lstm = tf.keras.layers.LSTM(1)(input)
model = Model(inputs=input, outputs=lstm)
spec = (tf.TensorSpec((None, 60, 8), tf.float32),)

tf2onnx.convert.from_keras(model, output_path='model_dummy.onnx', input_signature=spec)

I use trtexec from TensorRT OSS 7.1.3 to parse/verify the ONNX:

trtexec --optShapes='args_0':1x60x8 --onnx=model_dummy.onnx

Which results in

[04/28/2021-14:06:45] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/28/2021-14:06:45] [I] [TRT] 
[04/28/2021-14:06:45] [I] [TRT] --------------- Layers running on DLA: 
[04/28/2021-14:06:45] [I] [TRT] 
[04/28/2021-14:06:45] [I] [TRT] --------------- Layers running on GPU: 
[04/28/2021-14:06:45] [I] [TRT] {(Unnamed Layer* 0) [Constant],(Unnamed Layer* 1) [Constant],while_cond_567_while/Less,(Unnamed Layer* 14) [Constant],(Unnamed Layer* 17) [Constant],(Unnamed Layer* 23) [Constant],(Unnamed Layer* 25) [Constant],(Unnamed Layer* 27) [Constant],(Unnamed Layer* 29) [Constant],model/lstm/PartitionedCall/transpose,(Unnamed Layer* 9) [Constant] + (Unnamed Layer* 10) [Shuffle],model/lstm/zeros_1,(Unnamed Layer* 15) [TripLimit],model/lstm/PartitionedCall/while_loop,(Unnamed Layer* 24) [Recurrence],(Unnamed Layer* 26) [Recurrence],(Unnamed Layer* 28) [Recurrence],(Unnamed Layer* 16) [TripLimit],while/add_2,while/TensorArrayV2Read/TensorListGetItem,(Unnamed Layer* 35) [Shuffle],while/MatMul,(Unnamed Layer* 19) [Recurrence],(Unnamed Layer* 20) [Recurrence],while/MatMul_1,while/add,while/BiasAdd,while/split,while/split_1,while/split_2,while/split_3,while/Sigmoid,while/Sigmoid_1,while/Tanh,while/Sigmoid_2,while/mul,while/mul_1,while/add_1,while/Tanh_1,while/mul_2,while/Identity_4,(Unnamed Layer* 66) [LoopOutput],model/lstm/PartitionedCall/strided_slice_2,model/lstm/PartitionedCall/strided_slice_2__23}, 
[04/28/2021-14:06:46] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[04/28/2021-14:06:46] [E] [TRT] ../builder/myelin/codeGenerator.cpp (112) - Myelin Error in addNodeToMyelinGraph: 0 (while/TensorArrayV2Read/TensorListGetItem{(Unnamed Layer* 0) [Constant],(Unnamed Layer* 1) [Constant],while_cond_567_while/Less,(Unnamed Layer* 14) [Constant],(Unnamed Layer* 17) [Constant],(Unnamed Layer* 23) [Constant],(Unnamed Layer* 25) [Constant],(Unnamed Layer* 27) [Constant],(Unnamed Layer* 29) [Constant],model/lstm/PartitionedCall/transpose,(Unnamed Layer* 9) [Constant] + (Unnamed Layer* 10) [Shuffle],model/lstm/zeros_1,(Unnamed Layer* 15) [TripLimit],model/lstm/PartitionedCall/while_loop,(Unnamed Layer* 24) [Recurrence],(Unnamed Layer* 26) [Recurrence],(Unnamed Layer* 28) [Recurrence],(Unnamed Layer* 16) [TripLimit],while/add_2,while/TensorArrayV2Read/TensorListGetItem,(Unnamed Layer* 35) [Shuffle],while/MatMul,(Unnamed Layer* 19) [Recurrence],(Unnamed Layer* 20) [Recurrence],while/MatMul_1,while/add,while/BiasAdd,while/split,while/split_1,while/split_2,while/split_3,while/Sigmoid,while/Sigmoid_1,while/Tanh,while/Sigmoid_2,while/mul,while/mul_1,while/add_1,while/Tanh_1,while/mul_2,while/Identity_4,(Unnamed Layer* 66) [LoopOutput],model/lstm/PartitionedCall/strided_slice_2,model/lstm/PartitionedCall/strided_slice_2__23} operation not supported within a loop body.)
[04/28/2021-14:06:46] [E] [TRT] ../builder/myelin/codeGenerator.cpp (112) - Myelin Error in addNodeToMyelinGraph: 0 ()
[04/28/2021-14:06:46] [E] Engine creation failed
[04/28/2021-14:06:46] [E] Engine set up failed

Finally here is the ONNX model:

model_dummy.onnx.gz

lbortho commented 3 years ago

Hi @ttyio, I tried the same test with TensorRT 7.2.3.4 GA (CUDA 11.0 and cuDNN 8.1) and TensorFlow 2.4.1 on an x86 Ubuntu 18.04 host and obtained the same error. Have you been able to reproduce the issue on your side?

ttyio commented 3 years ago

Hello @lbortho , we have made some improvements in 8.0, but it still cannot work with your model because we do not support Gather inside a loop. I have created an internal request to track this, sorry.

wizpig commented 3 years ago

@ttyio Does that mean that LSTMs are in general not supported at this point?

Nevertheless, thanks for your effort.

ttyio commented 3 years ago

Hello @wizpig , the ONNX LSTM op is supported; its importer does not put a Gather inside the loop. See the implementation here: https://github.com/onnx/onnx-tensorrt/blob/984e57c7768a9bea3d2a8369ed199529f603d13b/builtin_op_importers.cpp#L2111 Thanks!
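
For anyone hitting this, a quick way to check which form an export took is to inspect the graph with the onnx Python package (a minimal sketch, run against the model_dummy.onnx attached above):

import onnx

model = onnx.load("model_dummy.onnx")
# A fused export would contain an 'LSTM' node; the tf2onnx exports in this
# thread instead contain a 'Loop' whose body holds the unsupported op.
print(sorted({n.op_type for n in model.graph.node}))
for node in model.graph.node:
    if node.op_type == "Loop":
        body = next(a.g for a in node.attribute if a.name == "body")
        print(sorted({n.op_type for n in body.node}))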

ttyio commented 3 years ago

Closing since there has been no activity for more than 3 weeks; please reopen if you still have questions. Thanks!

lbortho commented 3 years ago

Is the issue resolved in 8.0?

ttyio commented 3 years ago

Hello @lbortho , support for Gather in a loop will be resolved in the next release (around 2 months from now), not 8.0, thanks!

lbortho commented 2 years ago

Hello @ttyio. I upgraded to TensorRT 8.2.0.6 and still have an issue with the same simple model conversion:

import tensorflow as tf
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
import tf2onnx

input = Input(shape=(60, 8), dtype=tf.float32)

lstm = tf.keras.layers.LSTM(1)(input)
model = Model(inputs=input, outputs=lstm)
spec = (tf.TensorSpec((None, 60, 8), tf.float32),)

tf2onnx.convert.from_keras(model, output_path='model_dummy.onnx', input_signature=spec)

With the following dependencies:

Python 3.6.9
TensorRT 8.2.0.6
TensorFlow 2.4.0
tf2onnx 1.9.1

When I try converting from ONNX to TRT:

trtexec --optShapes='args_0':1x60x8 --onnx=model_dummy.onnx

The log shows

[11/09/2021-13:06:59] [I] [TRT] [MemUsageChange] Init CUDA: CPU +322, GPU +0, now: CPU 334, GPU 2009 (MiB)
[11/09/2021-13:06:59] [I] Start parsing network model
[11/09/2021-13:06:59] [I] [TRT] ----------------------------------------------------------------
[11/09/2021-13:06:59] [I] [TRT] Input filename:   model_dummy.onnx
[11/09/2021-13:06:59] [I] [TRT] ONNX IR version:  0.0.6
[11/09/2021-13:06:59] [I] [TRT] Opset version:    11
[11/09/2021-13:06:59] [I] [TRT] Producer name:    tf2onnx
[11/09/2021-13:06:59] [I] [TRT] Producer version: 1.9.1
[11/09/2021-13:06:59] [I] [TRT] Domain:           
[11/09/2021-13:06:59] [I] [TRT] Model version:    0
[11/09/2021-13:06:59] [I] [TRT] Doc string:       
[11/09/2021-13:06:59] [I] [TRT] ----------------------------------------------------------------
[11/09/2021-13:06:59] [W] [TRT] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[11/09/2021-13:06:59] [W] [TRT] onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
[11/09/2021-13:06:59] [I] Finish parsing network model
[11/09/2021-13:06:59] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 451 MiB, GPU 2029 MiB
[11/09/2021-13:07:00] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.1 but loaded cuBLAS/cuBLAS LT 11.5.1
[11/09/2021-13:07:00] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +483, GPU +206, now: CPU 934, GPU 2235 (MiB)
[11/09/2021-13:07:00] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +393, GPU +180, now: CPU 1327, GPU 2415 (MiB)
[11/09/2021-13:07:00] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[11/09/2021-13:07:00] [I] [TRT] [BlockAssignment] Algorithm Linear took 0.000402ms to assign 1 blocks to 1 nodes requiring 16777216 bytes.
[11/09/2021-13:07:00] [I] [TRT] Total Activation Memory: 16777216
[11/09/2021-13:07:00] [I] [TRT] Detected 1 inputs and 1 output network tensors.
trtexec: /root/gpgpu/MachineLearning/myelin/src/compiler/ir/operation.cpp:396: void myelin::ir::operation_t::replace_def(myelin::ir::tensor_t*, size_t): Assertion `idx < out_tensors().size()' failed.
Aborted (core dumped)

I have the same logs when I use opset 9 or 10.

Here is the ONNX model: model_dummy.onnx.tar.gz
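
For reference, the same build can be attempted through the TensorRT Python API instead of trtexec (a minimal sketch; the input name args_0 and the shape mirror the trtexec command above):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Parse the attached model; print parser errors if the ONNX import fails.
with open("model_dummy.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parse failed")

config = builder.create_builder_config()
# Pin the dynamic batch dimension, mirroring --optShapes='args_0':1x60x8.
profile = builder.create_optimization_profile()
profile.set_shape("args_0", (1, 60, 8), (1, 60, 8), (1, 60, 8))
config.add_optimization_profile(profile)

# Returns None (with errors logged) if engine building fails.
engine_bytes = builder.build_serialized_network(network, config)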

ttyio commented 2 years ago

Thanks @lbortho for the detailed repro, I have created an internal bug to track this issue.

lbortho commented 2 years ago

Hello @ttyio. I also tried with

lstm = tf.keras.layers.LSTM(2)(input)

instead of

lstm = tf.keras.layers.LSTM(1)(input)

The log shows a different error in this case

[11/16/2021-11:14:02] [I] TensorRT version: 8200
[11/16/2021-11:14:03] [I] [TRT] [MemUsageChange] Init CUDA: CPU +322, GPU +0, now: CPU 334, GPU 972 (MiB)
[11/16/2021-11:14:03] [I] Start parsing network model
[11/16/2021-11:14:03] [I] [TRT] ----------------------------------------------------------------
[11/16/2021-11:14:03] [I] [TRT] Input filename:   model_dummy.onnx
[11/16/2021-11:14:03] [I] [TRT] ONNX IR version:  0.0.6
[11/16/2021-11:14:03] [I] [TRT] Opset version:    11
[11/16/2021-11:14:03] [I] [TRT] Producer name:    tf2onnx
[11/16/2021-11:14:03] [I] [TRT] Producer version: 1.9.1
[11/16/2021-11:14:03] [I] [TRT] Domain:           
[11/16/2021-11:14:03] [I] [TRT] Model version:    0
[11/16/2021-11:14:03] [I] [TRT] Doc string:       
[11/16/2021-11:14:03] [I] [TRT] ----------------------------------------------------------------
[11/16/2021-11:14:03] [W] [TRT] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[11/16/2021-11:14:03] [W] [TRT] onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
[11/16/2021-11:14:03] [I] Finish parsing network model
[11/16/2021-11:14:03] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 451 MiB, GPU 1000 MiB
[11/16/2021-11:14:03] [E] Error[4]: [graphShapeAnalyzer.cpp::processCheck::582] Error Code 4: Internal Error (model/lstm/PartitionedCall/while_loop:7: tensor volume exceeds (2^31)-1, dimensions are [2147483647,1,2])
[11/16/2021-11:14:03] [E] Error[2]: [builder.cpp::buildSerializedNetwork::561] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
[11/16/2021-11:14:03] [E] Engine could not be created from network
[11/16/2021-11:14:03] [E] Building engine failed
[11/16/2021-11:14:03] [E] Failed to create engine from model.
[11/16/2021-11:14:03] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8200] # ./trtexec --optShapes=args_0:1x60x8 --onnx=/home/louis/cfh/model_dummy.onnx
Segmentation fault (core dumped)

Here is the ONNX model with LSTM(2): model_dummy_lstm_2.onnx.tar.gz
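
One workaround worth trying (an untested sketch, on the assumption that the 2147483647 above is the symbolic batch dimension being propagated into the loop) is to pin the batch dimension in the ONNX file before building:

import onnx

model = onnx.load("model_dummy_lstm_2.onnx")
# Replace the symbolic batch dimension of the first input with a fixed 1.
dim0 = model.graph.input[0].type.tensor_type.shape.dim[0]
dim0.dim_value = 1
onnx.save(model, "model_dummy_lstm_2_bs1.onnx")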

joan126 commented 2 years ago

@lbortho I met a similar issue, have you solved this?

lbortho commented 2 years ago

@joan126 No, still waiting for the bug resolution from NVIDIA/TensorRT.

ttyio commented 2 years ago

This will be fixed in 8.4GA, thanks!

agrija9 commented 2 years ago

Hi @ttyio, I am having this same issue with a single LSTM layer.

I am running on a Jetson Xavier NX with JetPack 4.6, TensorRT 8.0.1.6, and TensorFlow 2.5.0.

Can you please confirm whether TensorRT 8.4 can convert LSTM layers without major issues?

This is crucial for me: if the above holds, I will have to migrate my whole setup to the still-beta JetPack 5.0.1 SDK.

ttyio commented 1 year ago

@agrija9 Sorry for the delayed response. Yes, we have many fixes in 8.4; have you tried JetPack 5.0? Thanks!

ttyio commented 1 year ago

Closing since there has been no activity for more than 3 weeks; please reopen if you still have questions. Thanks!

venkataganti commented 5 months ago

I am still having issues converting LSTM layers to TensorRT. I am using version 8.6. Is this fixed in any version of TRT?