NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[onnx::Gather_543...Transpose_3934 + Reshape_3943]}.) #2647

Closed david-PHR closed 1 year ago

david-PHR commented 1 year ago

Description

I use trtexec (TensorRT 8.5.0 from the NVIDIA NGC PyTorch 22.09-py3 container) to convert an ONNX model to a TensorRT engine.

Compiling the ONNX model fails. After several minutes on an A100 40 GB GPU, trtexec shows the error below:

[02/07/2023-11:12:18] [W] [TRT] Using kFASTER_DYNAMIC_SHAPES_0805 preview feature.
[02/07/2023-11:13:44] [W] [TRT] Skipping tactic 0x0000000000000000 due to Myelin error: Autotuner: no tactics to implement operation
[02/07/2023-11:13:44] [E] Error[10]: [optimizer.cpp::computeCosts::3712] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[onnx::Gather_543...Transpose_3934 + Reshape_3943]}.)
[02/07/2023-11:13:44] [E] Error[2]: [builder.cpp::buildSerializedNetwork::738] Error Code 2: Internal Error (Assertion engine != nullptr failed. )

Here is the command that I've used:

trtexec --onnx=/home/my_model.onnx --saveEngine=/home/my_model.onnx_trt --fp16 --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --workspace=40000 --minShapes=input_0:1x3x896x896 --optShapes=input_0:1x3x896x896 --maxShapes=input_0:1x3x896x896 --preview=+fasterDynamicShapes0805

I've tried smaller workspace sizes, with and without the fasterDynamicShapes0805 preview feature, and with and without dynamic input sizes. Do you have any idea how to solve this issue?
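The variations described above (workspace size, preview feature on or off, with or without explicit shape flags) can be enumerated with a small script. This is only a sketch and not part of the original report: the helper names `build_trtexec_cmd` and `variants` are hypothetical, the 16384 workspace value is an assumed example, and only flags that appear in the command above are used. It prints the commands instead of running them.

```python
import itertools
import shlex


def build_trtexec_cmd(onnx_path, engine_path, workspace_mb, shapes=None, preview=False):
    """Assemble a trtexec argument list from the flags used in the report (hypothetical helper)."""
    cmd = [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        "--fp16",
        "--inputIOFormats=fp16:chw",
        "--outputIOFormats=fp16:chw",
        f"--workspace={workspace_mb}",
    ]
    if shapes is not None:
        # shapes maps "min"/"opt"/"max" to a spec such as "input_0:1x3x896x896".
        for key in ("min", "opt", "max"):
            cmd.append(f"--{key}Shapes={shapes[key]}")
    if preview:
        cmd.append("--preview=+fasterDynamicShapes0805")
    return cmd


def variants(onnx_path, engine_path, workspaces, shape_sets):
    """Yield one command per (workspace, preview on/off, shape set) combination."""
    for ws, preview, shapes in itertools.product(workspaces, (False, True), shape_sets):
        yield build_trtexec_cmd(onnx_path, engine_path, ws, shapes, preview)


if __name__ == "__main__":
    # Static profile from the report: min, opt, and max are all 1x3x896x896.
    static = {key: "input_0:1x3x896x896" for key in ("min", "opt", "max")}
    # None means "pass no shape flags"; 16384 is an assumed smaller workspace size.
    for cmd in variants("/home/my_model.onnx", "/home/my_model.onnx_trt",
                        [16384, 40000], [None, static]):
        print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, which keeps a record of exactly which combinations were tried.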

david-PHR commented 1 year ago

After updating to TensorRT 8.5.2, I can now compile the model at resolutions higher than 512x512. Unfortunately, dynamic input shapes still fail. For example, when I compile the model for a width and height range of 512 to 896, this error appears:

[02/07/2023-12:23:38] [W] [TRT] Using kFASTER_DYNAMIC_SHAPES_0805 preview feature.
[02/07/2023-12:24:38] [W] [TRT] Skipping tactic 0x0000000000000000 due to exception Autotuner: no tactics to implement operation
[02/07/2023-12:24:38] [E] Error[10]: [optimizer.cpp::computeCosts::3728] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[onnx::Gather_543...Transpose_3934 + Reshape_3943]}.)
[02/07/2023-12:24:38] [E] Error[2]: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
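When comparing failures across TensorRT versions, it can help to pull the error code, source location, and failing node name out of the log automatically. The sketch below is an illustration rather than an official tool: the regular expressions are modeled only on the log lines quoted in this thread, the function name `parse_errors` is hypothetical, and other trtexec versions may format their output differently.

```python
import re

# Split a log into entries at each "[MM/DD/YYYY-HH:MM:SS] " timestamp,
# so it also works when several entries were pasted onto one line.
ENTRY_START = re.compile(r"(?=\[\d{2}/\d{2}/\d{4}-\d{2}:\d{2}:\d{2}\] )")

# Matches "[E] Error[N]: [file::function::line] Error Code N: message".
ERROR_RE = re.compile(
    r"\[E\] Error\[(?P<code>\d+)\]: "
    r"\[(?P<file>[^:\]]+)::(?P<func>[^:\]]+)::(?P<line>\d+)\] "
    r"Error Code \d+: (?P<message>.*)"
)

# Matches the "{...}" node description in "Could not find any implementation for node {...}".
NODE_RE = re.compile(r"for node \{(?P<node>.*)\}")


def parse_errors(log_text):
    """Return one dict per [E] entry with its code, source location, and node (if any)."""
    errors = []
    for entry in ENTRY_START.split(log_text):
        match = ERROR_RE.search(entry)
        if match is None:
            continue  # warnings, info lines, or empty fragments
        node = NODE_RE.search(match.group("message"))
        errors.append({
            "code": int(match.group("code")),
            "location": "::".join(match.group("file", "func", "line")),
            "node": node.group("node") if node else None,
        })
    return errors


if __name__ == "__main__":
    sample = (
        "[02/07/2023-12:24:38] [E] Error[10]: [optimizer.cpp::computeCosts::3728] "
        "Error Code 10: Internal Error (Could not find any implementation for node "
        "{ForeignNode[onnx::Gather_543...Transpose_3934 + Reshape_3943]}.) "
        "[02/07/2023-12:24:38] [E] Error[2]: [builder.cpp::buildSerializedNetwork::751] "
        "Error Code 2: Internal Error (Assertion engine != nullptr failed. )"
    )
    for err in parse_errors(sample):
        print(err)
```

The first error in each log names the node that could not be implemented; the second is a follow-on assertion from the failed build and carries no node.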

zerollzeng commented 1 year ago

Can you try our latest container first? If it still fails, could you please share the ONNX model with us? We can then debug further. Thanks!

david-PHR commented 1 year ago

I've tried the latest container, which works with static input shapes. Unfortunately, it still doesn't work when input shapes are dynamic. I've identified the part of the graph that fails. Here is the ONNX file of the failing subgraph: subgraph_fail.onnx.zip

Here is the command that failed:

trtexec --onnx=subgraph_fail.onnx --saveEngine=subgraph_fail.onnx_trt --fp16 --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --workspace=16384 --preview=+fasterDynamicShapes0805 --minShapes=input_0:1x3x384x384 --optShapes=input_0:1x3x512x512 --maxShapes=input_0:1x3x896x896
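Before filing or reproducing a dynamic-shape failure like this one, it is worth ruling out a malformed shape range, since TensorRT requires min <= opt <= max on every dimension of an optimization profile. The following is a minimal sketch of such a check on the `name:AxBxCxD` strings passed to trtexec; the helper names are hypothetical and nothing here calls TensorRT itself.

```python
def parse_shape(spec):
    """Split 'input_0:1x3x384x384' into ('input_0', (1, 3, 384, 384))."""
    name, _, dims = spec.rpartition(":")
    return name, tuple(int(d) for d in dims.split("x"))


def check_profile(min_spec, opt_spec, max_spec):
    """Return a list of problems; an empty list means min <= opt <= max on every axis."""
    (n_min, lo), (n_opt, opt), (n_max, hi) = map(parse_shape, (min_spec, opt_spec, max_spec))
    problems = []
    if not (n_min == n_opt == n_max):
        problems.append(f"input names differ: {n_min}, {n_opt}, {n_max}")
    if not (len(lo) == len(opt) == len(hi)):
        problems.append("ranks differ between min, opt, and max")
        return problems
    for axis, (a, b, c) in enumerate(zip(lo, opt, hi)):
        if not a <= b <= c:
            problems.append(f"axis {axis}: {a} <= {b} <= {c} does not hold")
    return problems


def dynamic_axes(min_spec, max_spec):
    """List the axes whose extent differs between the min and max shapes."""
    _, lo = parse_shape(min_spec)
    _, hi = parse_shape(max_spec)
    return [axis for axis, (a, c) in enumerate(zip(lo, hi)) if a != c]


if __name__ == "__main__":
    # Shape range from the failing command above.
    specs = ("input_0:1x3x384x384", "input_0:1x3x512x512", "input_0:1x3x896x896")
    print("problems:", check_profile(*specs))
    print("dynamic axes:", dynamic_axes(specs[0], specs[2]))
```

For the range in the failing command the check reports no problems, with only the height and width axes dynamic, which is consistent with the failure being a builder bug rather than an invalid profile.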

david-PHR commented 1 year ago

Hello, do you have any news on this error?

zerollzeng commented 1 year ago

I can reproduce the error and I've filed internal bug 3979250 for this. Will update here when we have progress on this.

oxana-nvidia commented 1 year ago

The bug is fixed in TensorRT 8.6. Please verify once it is publicly available.

&&&& PASSED TensorRT.trtexec [TensorRT v8600] # trtexec --onnx=subgraph_fail.onnx --saveEngine=subgraph_fail.onnx_trt --preview=+fasterDynamicShapes0805 --minShapes=input_0:1x3x384x384 --optShapes=input_0:1x3x512x512 --maxShapes=input_0:1x3x896x896 --verbose

zerollzeng commented 1 year ago

Closed. Feel free to reopen if you have any further questions. Thanks!

FrankyTang commented 5 months ago

I have the same error in TensorRT v8601. I tried to interrupt this Myelin fusion, but Myelin provides too little information for me to tell which operators were fused. How can I debug this?

[E] Error[10]: [optimizer.cpp::computeCosts::3869] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[/agent_encoder/Expand_2.../Slice_3]}.)

oxana-nvidia commented 5 months ago

Please try TensorRT 10.0.