NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Error Code 4: Miscellaneous (IShuffleLayer Reshape_427: reshape changes volume. Reshaping [900,1,256] to [900,7200,32].) #2245

Closed liangguixing95 closed 1 year ago

liangguixing95 commented 2 years ago

Hello, when I converted my ONNX model to TensorRT with the command ./trtexec --onnx=model.onnx --saveEngine=model.engine, I got a big diff between the PyTorch result and the TRT result. I located the problem, which might be related to the decoder transformer part of my model, so I converted only the transformer part to ONNX to try to find out what was wrong. But when I ran ./trtexec --onnx=decoder_transformer.onnx --saveEngine=decoder_transformer.engine to convert ONNX to TRT, I got an error that didn't appear while converting "model.onnx".

(screenshot: error)

The error comes from the cross-attention part, but it disappears when I convert only the cross-attention module to ONNX and TRT with ./trtexec --onnx=cross_attention.onnx --saveEngine=cross_attention.engine. So I cannot figure out how to solve the problem and get a correct TRT result, and I am opening an issue for help. Thanks~

Environment
TensorRT Version: 8.4.1.5+cuda11.6
NVIDIA GPU: A100
NVIDIA Driver Version: 510.47.03
CUDA Version: 11.6
CUDNN Version: 8.4.0.27
Operating System: Ubuntu 20.04.2 LTS
Python Version: 3.7.13
PyTorch Version: 1.10

zerollzeng commented 2 years ago

Usually this happens when your model has a dynamic input shape and a fixed reshape operation. Can you check that first?

frankvp11 commented 2 years ago

I got this same error. What do you want me to check? @zerollzeng Edit: I am training using the balloon example (I no longer remember where the link was) and used their dataset and configurations.

zerollzeng commented 2 years ago

Check the ONNX model first, e.g. run it with ONNX Runtime with a preset input shape.

zerollzeng commented 2 years ago

The problem here is simple: suppose you have a reshape layer that reshapes a tensor to 2x6. If its input is a x b, then a x b must equal 2x6 = 12.
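A minimal sketch of that volume rule, using NumPy's reshape as a stand-in for TensorRT's IShuffleLayer:

```python
import numpy as np

x = np.zeros((3, 4))      # input volume: 3 * 4 = 12
ok = x.reshape(2, 6)      # 2 * 6 = 12, volumes match -> succeeds

try:
    x.reshape(2, 7)       # 2 * 7 = 14 != 12 -> "reshape changes volume"
except ValueError as e:
    print("reshape failed:", e)
```

TensorRT enforces the same constraint at build time; with a dynamic input shape, a reshape target that was hard-coded at export time can violate it for some profile shapes even though it held for the shape used during export.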

frankvp11 commented 2 years ago

Yeah, I made another issue explaining my problem more closely, but I already knew what you meant. I'll check it later with ONNX Runtime.

liangguixing95 commented 2 years ago

I've found the cause, which is related to the layer norm. In my model, the input of the LN is a tensor of shape [900, 1, 256], and the LN is called via nn.functional.layer_norm(input, [256,]). The output in the PyTorch version has no problem, but ONNX produces a wrong output shape of [900, 900, 256]. I fixed the problem by changing the call to nn.functional.layer_norm(input, [1, 256]). You can check whether your code has the same problem @frankvp11
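For reference, F.layer_norm normalizes over the trailing len(normalized_shape) axes, so for a [900, 1, 256] input, [256] and [1, 256] should give the same output shape (and, because the middle axis has size 1, the same values). A NumPy sketch of that semantics (my own re-implementation for illustration, not the PyTorch source):

```python
import numpy as np

def layer_norm(x, normalized_shape, eps=1e-5):
    # Normalize over the trailing len(normalized_shape) axes,
    # mirroring torch.nn.functional.layer_norm without weight/bias.
    axes = tuple(range(x.ndim - len(normalized_shape), x.ndim))
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.randn(900, 1, 256)
y1 = layer_norm(x, (256,))     # normalize over the last axis
y2 = layer_norm(x, (1, 256))   # normalize over the last two axes
print(y1.shape, y2.shape)      # both (900, 1, 256)
```

The shape blow-up to [900, 900, 256] in the exported ONNX graph is a tracing/broadcast artifact, not a property of layer norm itself.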

liangguixing95 commented 2 years ago

I've fixed the shape error but hit another problem: the outputs of the ONNX model and the TRT FP32 engine are quite different after the torch.bmm operator in the cross-attention module.

(screenshot: bmm)

I compared the q, k, and attn outputs of ONNX and TRT and printed the max diff of each pair. q and k are the same, but attn is quite different, as shown below. I have no idea how to solve this. @zerollzeng

(screenshot: diff2)

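A per-pair comparison like the one above can be scripted with a small helper (a sketch; the function name and sample values are mine):

```python
import numpy as np

def max_abs_diff(a, b):
    """Max elementwise absolute difference between two output tensors."""
    return float(np.max(np.abs(np.asarray(a) - np.asarray(b))))

# e.g. compare corresponding ONNX Runtime and TensorRT outputs:
onnx_attn = np.array([1.0, 2.0, 3.0])
trt_attn = np.array([1.5, 2.0, 3.0])
print(max_abs_diff(onnx_attn, trt_attn))  # 0.5
```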
frankvp11 commented 2 years ago

I'm working with Detectron2, so it's not realistic for me to edit the source code.

zerollzeng commented 2 years ago

I compare the output of q,k,attn of onnx and trt and print the max diff of each pair. q,k of them are the same, but attn are quite different. as show below. I have no idea to solve this

Can you provide a repro so that I can check it on my side? I would prefer a minimal ONNX model.

liangguixing95 commented 2 years ago

https://drive.google.com/drive/folders/13LGb4uCEzrLV4k1dRa9FBHPnrrAwXfSf?usp=sharing Here are the ONNX model and some debug inputs I used to produce the diff comparison log.

zerollzeng commented 2 years ago

I can't reproduce it using Polygraphy; all outputs match:

[I] Accuracy Comparison | trt-runner-N0-08/22/22-15:50:44 vs. onnxrt-runner-N0-08/22/22-15:50:44
[I]     Comparing Output: '72' (dtype=float32, shape=(8, 900, 32)) with '72' (dtype=float32, shape=(8, 900, 32))
[I]     Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I]         trt-runner-N0-08/22/22-15:50:44: 72 | Stats: mean=-0.0027745, std-dev=0.1346, var=0.018118, median=-7.5492e-05, min=-0.53595 at (2, 16, 0), max=0.58039 at (2, 300, 21), avg-magnitude=0.10865
[I]         onnxrt-runner-N0-08/22/22-15:50:44: 72 | Stats: mean=-0.0027745, std-dev=0.1346, var=0.018118, median=-7.5492e-05, min=-0.53595 at (2, 16, 0), max=0.58039 at (2, 300, 21), avg-magnitude=0.10865
[I]         Error Metrics: 72
[I]             Minimum Required Tolerance: elemwise error | [abs=0] OR [rel=0] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=0, std-dev=0, var=0, median=0, min=0 at (0, 0, 0), max=0 at (0, 0, 0), avg-magnitude=0
[I]             Relative Difference | Stats: mean=0, std-dev=0, var=0, median=0, min=0 at (0, 0, 0), max=0 at (0, 0, 0), avg-magnitude=0
[I]         PASSED | Difference is within tolerance (rel=1e-05, abs=1e-05)
[I]     Comparing Output: '73' (dtype=float32, shape=(8, 12000, 32)) with '73' (dtype=float32, shape=(8, 12000, 32))
[I]     Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I]         trt-runner-N0-08/22/22-15:50:44: 73 | Stats: mean=0.062328, std-dev=0.72619, var=0.52735, median=0.055339, min=-3.2914 at (3, 5027, 19), max=3.1621 at (1, 3771, 3), avg-magnitude=0.5761
[I]         onnxrt-runner-N0-08/22/22-15:50:44: 73 | Stats: mean=0.062328, std-dev=0.72619, var=0.52735, median=0.055339, min=-3.2914 at (3, 5027, 19), max=3.1621 at (1, 3771, 3), avg-magnitude=0.5761
[I]         Error Metrics: 73
[I]             Minimum Required Tolerance: elemwise error | [abs=0] OR [rel=0] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=0, std-dev=0, var=0, median=0, min=0 at (0, 0, 0), max=0 at (0, 0, 0), avg-magnitude=0
[I]             Relative Difference | Stats: mean=0, std-dev=0, var=0, median=0, min=0 at (0, 0, 0), max=0 at (0, 0, 0), avg-magnitude=0
[I]         PASSED | Difference is within tolerance (rel=1e-05, abs=1e-05)
[I]     Comparing Output: '76' (dtype=float32, shape=(8, 900, 12000)) with '76' (dtype=float32, shape=(8, 900, 12000))
[I]     Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I]         trt-runner-N0-08/22/22-15:50:44: 76 | Stats: mean=-0.24013, std-dev=0.44643, var=0.1993, median=-0.23786, min=-3.2709 at (2, 191, 11177), max=2.4214 at (1, 174, 3771), avg-magnitude=0.40642
[I]         onnxrt-runner-N0-08/22/22-15:50:44: 76 | Stats: mean=-0.24013, std-dev=0.44643, var=0.1993, median=-0.23786, min=-3.2709 at (2, 191, 11177), max=2.4214 at (1, 174, 3771), avg-magnitude=0.40642
[I]         Error Metrics: 76
[I]             Minimum Required Tolerance: elemwise error | [abs=0] OR [rel=0] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=0, std-dev=0, var=0, median=0, min=0 at (0, 0, 0), max=0 at (0, 0, 0), avg-magnitude=0
[I]             Relative Difference | Stats: mean=0, std-dev=0, var=0, median=0, min=0 at (0, 0, 0), max=0 at (0, 0, 0), avg-magnitude=0
[I]         PASSED | Difference is within tolerance (rel=1e-05, abs=1e-05)
[I]     PASSED | All outputs matched | Outputs: ['72', '73', '76']
[I] PASSED | Command: /usr/local/bin/polygraphy run module.onnx --trt --onnxrt

zerollzeng commented 2 years ago

A suggestion: after constant folding, the network structure is simpler (see the attached screenshot):

polygraphy surgeon sanitize module.onnx --fold-constants -o module_folded.onnx

frankvp11 commented 2 years ago

@zerollzeng does constant folding make the model better/faster?

liangguixing95 commented 2 years ago

@zerollzeng does constant folding make the model better/faster? Constant folding brings some performance degradation in my case. The ONNX file I provided is a minimal part of the cross-attention module of my model. Running the ONNX model with Polygraphy suggests there may be no problem, but when using real data, the max diff of the outputs is quite large, as the log above shows.

zerollzeng commented 2 years ago

Constant folding brings some performance degradation in my case. The ONNX file I provided is a minimal part of the cross-attention module of my model. Running the ONNX model with Polygraphy suggests there may be no problem, but when using real data, the max diff of the outputs is quite large, as the log above shows.

Are you using real data for the input? The diff might be caused by your input data; e.g. if you feed random binary data to it, the values can be very large, like e+6.

ttyio commented 1 year ago

Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions, thanks!

fanchuanster commented 1 year ago

Use NGC pytorch:22.12-py3 instead of pytorch:22.07-py3 to fix "Error Code 4: Miscellaneous (IShuffleLayer Reshape_179: reshape changes volume. Reshaping [784] to [1])".

lix19937 commented 5 months ago

I also came across this problem:

[05/11/2024-15:07:32] [V] [TRT] Insert CopyNode after ConstantNode that produces a Myelin graph output: 25021
[05/11/2024-15:07:33] [E] Error[4]: [shapeCompiler.cpp::evaluateShapeChecks::1180] Error Code 4: Internal Error (kOPT values for profile 0 violate shape constraints: IShuffleLayer Reshape_1933: reshaping failed for tensor: 3516 Reshape would change volume.)
[05/11/2024-15:07:33] [E] Error[2]: [builder.cpp::buildSerializedNetwork::743] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[05/11/2024-15:07:33] [E] Engine could not be created from network
[05/11/2024-15:07:33] [E] Building engine failed
[05/11/2024-15:07:33] [E] Failed to create engine from model or file.
[05/11/2024-15:07:33] [E] Engine set up failed

The ONNX model's inputs all have fixed shapes, but the inner network has data-dependent ops like NonZero. If I replace all code related to data-dependent operations with plugin implementations, the errors do not occur.
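To illustrate why ops like NonZero are hard for a builder that needs static shapes: their output shape depends on the input values, not just the input shape. A NumPy sketch:

```python
import numpy as np

# Two inputs with the SAME shape but different values...
a = np.array([0, 1, 0, 2])
b = np.array([1, 1, 1, 1])

# ...produce NonZero outputs with DIFFERENT shapes:
print(np.nonzero(a)[0].shape)  # (2,)
print(np.nonzero(b)[0].shape)  # (4,)
```

So even with fixed ONNX input shapes, tensor shapes downstream of NonZero are only known at run time, which can trip up shape checks such as the reshape-volume assertion in the log above.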