NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

"Could not find any implementation for node" error while converting the instructor-large model from ONNX to a TensorRT engine #3154

Open bingo-ctrl opened 1 year ago

bingo-ctrl commented 1 year ago

Description

We tried to convert the instructor-large ONNX model to a TensorRT engine, but got the following errors:

[07/24/2023-03:37:25] [V] [TRT] --------------- Timing Runner: {ForeignNode[0.auto_model.shared.weight.../3/Div]} (Myelin[0x80000023])
[07/24/2023-03:38:05] [V] [TRT] Skipping tactic 0x0000000000000000 due to exception Internal bug. Please report with reproduction steps.
[07/24/2023-03:38:05] [V] [TRT] {ForeignNode[0.auto_model.shared.weight.../3/Div]} (Myelin[0x80000023]) profiling completed in 40.0429 seconds. Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[07/24/2023-03:38:05] [E] Error[10]: Could not find any implementation for node {ForeignNode[0.auto_model.shared.weight.../3/Div]}.
[07/24/2023-03:38:05] [E] Error[10]: [optimizer.cpp::computeCosts::3869] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[0.auto_model.shared.weight.../3/Div]}.)
[07/24/2023-03:38:05] [E] Engine could not be created from network
[07/24/2023-03:38:05] [E] Building engine failed
[07/24/2023-03:38:05] [E] Failed to create engine from model or file.
[07/24/2023-03:38:05] [E] Engine set up failed

Environment

TensorRT Version: 8.6.1.6

NVIDIA GPU: T4

NVIDIA Driver Version: 535.54.03

CUDA Version: 12.1

CUDNN Version: 8.9.2

Operating System: Ubuntu 22.04

Python Version (if applicable): Python 3.10

Baremetal or Container (if so, version): TensorRT container, version 23.06 (NGC Docker image)

Relevant Files

Model link: https://github.com/bingo-ctrl/onnx.git

Steps To Reproduce

Commands or scripts:

docker run -it --gpus all -v /path/to/instructor-large.onnx:/trt_optimize nvcr.io/nvidia/tensorrt:23.06-py3

trtexec --onnx=/trt_optimize/instructor-large.onnx --saveEngine=model.plan --explicitBatch --verbose \
  --minShapes='input_ids.1':1x32,'attention_mask.1':1x32,'context_masks.1':1 \
  --optShapes='input_ids.1':1x64,'attention_mask.1':1x64,'context_masks.1':1 \
  --maxShapes='input_ids.1':256x1024,'attention_mask.1':256x1024,'context_masks.1':256
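For reference, the same build can also be driven from the TensorRT Python API instead of trtexec. This is a minimal sketch, assuming the tensorrt wheel shipped in the 23.06 container; the input names and shape profile simply mirror the trtexec flags above, and the import is guarded so the sketch reads even where TensorRT is not installed.

```python
# Sketch of the trtexec command above via the TensorRT Python API
# (assumes the `tensorrt` package from the 23.06 container).
try:
    import tensorrt as trt
except ImportError:  # allow reading/importing this sketch without TensorRT
    trt = None

def build_engine(onnx_path="/trt_optimize/instructor-large.onnx",
                 plan_path="model.plan"):
    logger = trt.Logger(trt.Logger.VERBOSE)
    builder = trt.Builder(logger)
    # Explicit-batch network, matching trtexec --explicitBatch
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))

    config = builder.create_builder_config()
    profile = builder.create_optimization_profile()
    # min/opt/max shapes mirror the --minShapes/--optShapes/--maxShapes flags
    profile.set_shape("input_ids.1", (1, 32), (1, 64), (256, 1024))
    profile.set_shape("attention_mask.1", (1, 32), (1, 64), (256, 1024))
    profile.set_shape("context_masks.1", (1,), (1,), (256,))
    config.add_optimization_profile(profile)

    plan = builder.build_serialized_network(network, config)
    if plan is None:
        raise RuntimeError("engine build failed")
    with open(plan_path, "wb") as f:
        f.write(plan)
```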

zerollzeng commented 1 year ago

Thanks for reporting this, I've filed internal bug 4209602 for this.

zerollzeng commented 1 year ago

Hi, the maxShapes is too large to fit on your GPU; could you please try smaller maxShapes? e.g., the shapes below work for me.

trtexec --onnx=instructor-large.onnx --saveEngine=model.plan --explicitBatch --verbose \
  --minShapes=input_ids.1:1x32,attention_mask.1:1x32,context_masks.1:1 \
  --optShapes=input_ids.1:1x64,attention_mask.1:1x64,context_masks.1:1 \
  --maxShapes=input_ids.1:128x256,attention_mask.1:128x256,context_masks.1:128
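To see why the original maxShapes of 256x1024 can exhaust a 16 GB T4, a back-of-envelope estimate of a single self-attention score tensor is enough. This sketch assumes instructor-large's T5-large backbone (16 attention heads) and fp32 activations; both are assumptions, not values taken from the logs above.

```python
# Rough check of why maxShapes=256x1024 cannot fit on a 16 GB T4.
# Assumes 16 attention heads (T5-large backbone) and fp32 activations.
def attention_scores_bytes(batch, heads, seq_len, bytes_per_elem=4):
    """Size of one self-attention score tensor [batch, heads, seq, seq]."""
    return batch * heads * seq_len * seq_len * bytes_per_elem

# At the requested max shape, a single layer's score tensor alone is:
size = attention_scores_bytes(256, 16, 1024)
print(f"{size / 2**30:.1f} GiB")  # -> 16.0 GiB, already the whole card
```

At 16 GiB for one intermediate tensor of one layer, the build has no chance of fitting the max profile, which is consistent with the suggestion to shrink maxShapes.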
bingo-ctrl commented 1 year ago

Hi @zerollzeng, thank you for your prompt reply. We see the same error when running your command, even with smaller maxShapes (16x128). BTW, my host machine is an AWS g4dn.xlarge instance.

[07/27/2023-07:53:13] [V] [TRT] --------------- Timing Runner: [trainStation2] (TrainStation[0x80000032])
[07/27/2023-07:53:13] [V] [TRT] Tactic: 0x0000000000000000 Time: 0.000239877
[07/27/2023-07:53:13] [V] [TRT] [trainStation2] (TrainStation[0x80000032]) profiling completed in 0.0134766 seconds. Fastest Tactic: 0x0000000000000000 Time: 0.000239877
[07/27/2023-07:53:13] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: TrainStation Tactic: 0x0000000000000000
[07/27/2023-07:53:13] [V] [TRT] =============== Computing costs for {ForeignNode[0.auto_model.shared.weight.../3/Div]}
[07/27/2023-07:53:13] [V] [TRT] *** Autotuning format combination: Int32("input_ids.1_dim_1",1), Int32("input_ids.1_dim_1",1) -> Int32("input_ids.1_dim_1",1), Float((* 1024 "input_ids.1_dim_1"),1024,1), Float(768,1) ***
[07/27/2023-07:53:13] [V] [TRT] --------------- Timing Runner: {ForeignNode[0.auto_model.shared.weight.../3/Div]} (Myelin[0x80000023])
[07/27/2023-07:54:09] [V] [TRT] (foreignNode) Set user's cuda kernel library
[07/27/2023-07:54:10] [V] [TRT] Skipping tactic 0x0000000000000000 due to exception No Myelin Error exists
[07/27/2023-07:54:10] [V] [TRT] {ForeignNode[0.auto_model.shared.weight.../3/Div]} (Myelin[0x80000023]) profiling completed in 56.9308 seconds. Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[07/27/2023-07:54:10] [E] Error[10]: Could not find any implementation for node {ForeignNode[0.auto_model.shared.weight.../3/Div]}.
[07/27/2023-07:54:10] [E] Error[10]: [optimizer.cpp::computeCosts::3869] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[0.auto_model.shared.weight.../3/Div]}.)
[07/27/2023-07:54:10] [E] Engine could not be created from network
[07/27/2023-07:54:10] [E] Building engine failed
[07/27/2023-07:54:10] [E] Failed to create engine from model or file.
[07/27/2023-07:54:10] [E] Engine set up failed

zerollzeng commented 1 year ago

Oh, sorry, I tested with our latest internal code; this issue has been fixed in the next release.

XiaokunDing commented 1 year ago

The error message I received in my task is similar; I'm not sure if it's caused by the same issue.
Docker image: nvcr.io/nvidia/pytorch:23.07-py3

[07/30/2023-15:19:30] [TRT] [V] --------------- Timing Runner: {ForeignNode[/encoder/layers.0/self_attention/core_attention/Cast_2...Concat_534]} (Myelin[0x80000023])
[07/30/2023-15:19:34] [TRT] [V] (foreignNode) Set user's cuda kernel library
[07/30/2023-15:19:34] [TRT] [V] (foreignNode) Pass fuse_conv_padding is currently skipped for dynamic shapes
[07/30/2023-15:19:34] [TRT] [V] (foreignNode) Pass pad_conv_channel is currently skipped for dynamic shapes
[07/30/2023-15:19:34] [TRT] [V] (foreignNode) Padding large gemms
[07/30/2023-15:19:35] [TRT] [V] Skipping tactic 0x0000000000000000 due to exception Incompatible effective shapes in op /encoder/layers_0/self_attention/core_attention/Where(t_pw:select) between /encoder/layers_0/self_attention/core_attention/Cast_2_output_0'-(b[1,1,4096,8192][]so[]md[-1,-1,6,7], mem_prop=0) and /encoder/layers_0/self_attention/core_attention/Div_output_0'-(f16[1,32,4096,6436][]so[]md[-1,-1,6,8], mem_prop=0).
[07/30/2023-15:19:35] [TRT] [V] {ForeignNode[/encoder/layers.0/self_attention/core_attention/Cast_2...Concat_534]} (Myelin[0x80000023]) profiling completed in 5.47154 seconds. Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[07/30/2023-15:19:35] [TRT] [V] Deleting timing cache: 18 entries, served 18 hits since creation.
[07/30/2023-15:19:35] [TRT] [E] 10: Could not find any implementation for node {ForeignNode[/encoder/layers.0/self_attention/core_attention/Cast_2...Concat_534]}.
[07/30/2023-15:19:36] [TRT] [E] 10: [optimizer.cpp::computeCosts::3869] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[/encoder/layers.0/self_attention/core_attention/Cast_2...Concat_534]}.)

XiaokunDing commented 1 year ago

When converting the ChatGLM2-6B model to TensorRT, the conversion works fine for the 2-layer model, but it throws the aforementioned error for the 28-layer model.

2-layer:
[08/07/2023-02:11:01] [TRT] [W] - 19 weights are affected by this issue: Detected subnormal FP16 values.
[08/07/2023-02:11:01] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +2048, now: CPU 0, GPU 2048 (MiB)
Succeeded building plan/TranAllInOne.plan in 299 s
[08/07/2023-02:11:21] [TRT] [I] Loaded engine size: 1800 MiB
[08/07/2023-02:11:22] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +1798, now: CPU 0, GPU 1798 (MiB)
TranAllInOne.plan: bFP16=True
[ 0]Input -> DataType.INT32 (1, -1) input_ids
[ 1]Input -> DataType.INT32 (1, -1) position_ids
[ 2]Input -> DataType.BOOL (1, 1, -1, -1) attention_mask
[ 3]Input -> DataType.HALF (-1, 1, 2, 128) AdjustInputOutput-V-0-tensorInputKV
[ 4]Output-> DataType.HALF (-1, 1, 2, 128) AdjustInputOutput-V-6-Concat-output_past_kv
[ 5]Output-> DataType.INT32 (1, 1) AddTail-V-9-Transpose
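When a 2-layer variant builds but the full 28-layer model fails, bisecting the ONNX graph can help localize the offending span before filing a bug. A hypothetical sketch using onnx.utils.extract_model is below; the tensor names are placeholders, not taken from the real ChatGLM2-6B export, and the import is guarded in case onnx is unavailable.

```python
# Bisection sketch: extract a subgraph ending at a chosen layer's output,
# then try building just that prefix with trtexec. Tensor names below are
# hypothetical placeholders for the real graph's names.
try:
    import onnx.utils
except ImportError:  # allow reading this sketch without onnx installed
    onnx = None

def extract_first_n_layers(src_path, dst_path, output_tensor):
    """Cut the graph at `output_tensor` (e.g. the output of layer N)."""
    onnx.utils.extract_model(
        src_path,
        dst_path,
        input_names=["input_ids"],        # placeholder graph input
        output_names=[output_tensor],     # e.g. "layers.14_output" (placeholder)
    )
```

Building the extracted prefixes with trtexec one by one (binary-searching on N) narrows the failure to the first layer range that reproduces the "Could not find any implementation" error.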

mon28 commented 1 year ago

@XiaokunDing did you figure out how to convert the instructor-large ONNX model to TensorRT, and whether TensorRT works with it?