NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

trt-engine-explorer failure of TensorRT 8.6 when running EnginePlan(f'{PATH}/graph.json', f'{PATH}/profile.json') #3585

Open yanqzhao opened 6 months ago

yanqzhao commented 6 months ago

Description

What is the `"LayerType": "TrainStation"`?

When I use trt-engine-explorer, I get the following error:

```
>>> plan = EnginePlan(f'{PATH}/graph.json', f'{PATH}/profile.json')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/disk_1/zyq/work/TensorRT/tools/experimental/trt-engine-explorer/trex/engine_plan.py", line 127, in __init__
    graph_df = construct_df(raw_layers)
  File "/home/disk_1/zyq/work/TensorRT/tools/experimental/trt-engine-explorer/trex/engine_plan.py", line 106, in construct_df
    graph_df = fix_df(graph_df)
  File "/home/disk_1/zyq/work/TensorRT/tools/experimental/trt-engine-explorer/trex/df_preprocessing.py", line 176, in fix_df
    fix_output_precision(df)
  File "/home/disk_1/zyq/work/TensorRT/tools/experimental/trt-engine-explorer/trex/df_preprocessing.py", line 164, in fix_output_precision
    df['output_precision'] = [Activation(outputs[1]).precision for outputs in df['Outputs']]
  File "/home/disk_1/zyq/work/TensorRT/tools/experimental/trt-engine-explorer/trex/df_preprocessing.py", line 164, in <listcomp>
    df['output_precision'] = [Activation(outputs[1]).precision for outputs in df['Outputs']]
IndexError: list index out of range
```

I think this may be caused by the TrainStation layer, since my graph.json contains a layer like this:

```json
"Layers": [{
    "Name": "[trainStation1]",
    "LayerType": "TrainStation",
    "Inputs": [],
    "Outputs": [],
    "TacticValue": "0x0000000000000000",
    "StreamId": 0,
    "Metadata": ""
}]
```
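To confirm which layers trigger the indexing error, one can scan graph.json for layers whose `Outputs` list is empty before handing the file to `EnginePlan`. This is my own sketch, not part of trt-engine-explorer; the helper name `layers_with_no_outputs` is hypothetical, and it assumes the layer schema shown above:

```python
import json

def layers_with_no_outputs(graph: dict) -> list:
    """Return (name, layer-type) pairs for layers whose 'Outputs' list is
    empty.  Assumes the engine-graph JSON schema shown above: 'Layers' is a
    list of dicts carrying 'Name', 'LayerType', and 'Outputs' keys."""
    return [(layer["Name"], layer.get("LayerType", ""))
            for layer in graph.get("Layers", [])
            if not layer.get("Outputs")]

# Hypothetical usage; PATH is the directory holding the exported JSON:
# with open(f"{PATH}/graph.json") as f:
#     print(layers_with_no_outputs(json.load(f)))
```

Any layer this reports (e.g. the TrainStation layer above) will make the `outputs[1]` lookup in `fix_output_precision` raise `IndexError`.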

Environment

TensorRT Version: 8.6

NVIDIA GPU: 3090

NVIDIA Driver Version: 525.105.17

CUDA Version: 11.6

CUDNN Version: 8.9.2.26_cuda11

Operating System: ubuntu 18.04

Python Version (if applicable): 3.9.18

Tensorflow Version (if applicable):

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

Relevant Files

Model link:

Steps To Reproduce

Commands or scripts:

Have you tried the latest release?:

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):

nzmora-nvidia commented 5 months ago

@yanqzhao can you share your model?

nzmora-nvidia commented 5 months ago

The next trt-engine-explorer release contains a fix. It will probably be released some time in Feb. Meanwhile, you can patch the code by replacing the implementation of `__fix_output_precision` with the following (you will see breaks in the graph, but it should not throw an exception):

```python
def __fix_output_precision(df: pd.DataFrame):
    fixed_outputs = []
    for outputs in df['Outputs']:
        try:
            fixed_outputs.append(Activation(outputs[0]).precision)
        except IndexError:
            # Some layers may have empty outputs
            fixed_outputs.append('')
    df['output_precision'] = fixed_outputs
```
slimwangyue commented 5 months ago

I am having the same error when converting an ONNX model to a TRT engine. Does anyone know what exactly the node [trainStation1] is? I cannot find it in my ONNX model when visualizing it with Netron.

```
[01/16/2024-16:04:32] [W] [TRT] onnx2trt_utils.cpp:372: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[01/16/2024-16:04:32] [W] [TRT] onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
[01/16/2024-16:04:33] [I] Finished parsing network model. Parse time: 0.425705
[01/16/2024-16:04:33] [W] [TRT] Calibrator won't be used in explicit precision mode. Use quantization aware training to generate network with Quantize/Dequantize nodes.
[01/16/2024-16:04:33] [I] [TRT] Graph optimization time: 0.628155 seconds.
[01/16/2024-16:04:33] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[01/16/2024-16:04:34] [E] Error[10]: Could not find any implementation for node [trainStation1].
[01/16/2024-16:04:34] [E] Error[10]: [optimizer.cpp::computeCosts::3869] Error Code 10: Internal Error (Could not find any implementation for node [trainStation1].)
[01/16/2024-16:04:34] [E] Engine could not be created from network
[01/16/2024-16:04:34] [E] Building engine failed
[01/16/2024-16:04:34] [E] Failed to create engine from model or file.
[01/16/2024-16:04:34] [E] Engine set up failed
```

nzmora-nvidia commented 5 months ago

@slimwangyue your error is different and would be better addressed in a new ticket. A TrainStation layer is not an ONNX operator; it is an internal TensorRT engine layer that manages device-memory allocation for data-dependent shapes. TrainStation layers synchronize the stream they are invoked from.
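Until the patched trt-engine-explorer release lands, another stopgap for the original `EnginePlan` crash (my own sketch, not an official tool) is to write a cleaned copy of graph.json with the zero-output internal layers dropped and load that instead. The function name `strip_empty_output_layers` is hypothetical, and note that per-layer entries in profile.json may then no longer line up with the cleaned graph, so treat the result as approximate:

```python
import json

def strip_empty_output_layers(in_path: str, out_path: str) -> int:
    """Copy an engine-graph JSON to out_path with layers whose 'Outputs'
    list is empty (e.g. internal TrainStation layers) removed.
    Returns the number of layers dropped."""
    with open(in_path) as f:
        graph = json.load(f)
    kept = [layer for layer in graph["Layers"] if layer.get("Outputs")]
    dropped = len(graph["Layers"]) - len(kept)
    graph["Layers"] = kept
    with open(out_path, "w") as f:
        json.dump(graph, f, indent=2)
    return dropped

# Hypothetical usage:
# strip_empty_output_layers(f"{PATH}/graph.json", f"{PATH}/graph_clean.json")
# plan = EnginePlan(f"{PATH}/graph_clean.json", f"{PATH}/profile.json")
```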