NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Failed to convert onnx to engine using polygraphy's load-tactics #4040

Closed. Smarter-version closed this 2 months ago.

Smarter-version commented 2 months ago

The command I used is:

polygraphy convert identity.onnx --fp16 --save-tactics replay.json -o 0.engine

I modified the inputs and outputs of the ForeignNode in replay.json from DataType.HALF to DataType.FLOAT, since they have a significant impact on precision. Then I regenerated the engine from the modified replay.json:

polygraphy convert identity.onnx --fp16 --load-tactics replay.json -o 1.engine

But an error was reported:

[ERROR] Exception caught in select_algorithms(): PolygraphyException: Layer: {ForeignNode[Abs_507.... .Cast_3382]} | Tactic in replay was not provided by TensorRT as a choice for this layer. Has the network or builder configuration changed since the replay file was generated?

lix19937 commented 2 months ago

I modified the inputs and outputs of the ForeignNode in replay.json from DataType.HALF to DataType.FLOAT, since they have a significant impact on precision.

Modifying the inner node input/output types may destroy the fusion tactic.

To make this process easier, Polygraphy provides two built-in algorithm selectors: TacticRecorder and TacticReplayer. The former can be used to record tactics selected during an engine build, and the latter to play them back during a subsequent build. The CLI tools include --save-tactics and --load-tactics options corresponding to these.
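For reference, a minimal sketch of the same record/replay flow through the Polygraphy Python API. This assumes polygraphy.backend.trt exports TacticRecorder, TacticReplayer, and CreateConfig as in recent releases, and that TensorRT's IAlgorithmSelector is still available (it is deprecated in newer TensorRT versions); file names mirror the commands above.

# Record tactics during a first build, then replay them in a second build.
from polygraphy.backend.trt import (
    CreateConfig, EngineFromNetwork, NetworkFromOnnxPath, SaveEngine,
    TacticRecorder, TacticReplayer,
)

network = NetworkFromOnnxPath("identity.onnx")

# First build: record the selected tactics into replay.json.
record_config = CreateConfig(fp16=True, algorithm_selector=TacticRecorder("replay.json"))
SaveEngine(EngineFromNetwork(network, config=record_config), "0.engine")()

# Second build: replay the (possibly edited) tactics from replay.json.
replay_config = CreateConfig(fp16=True, algorithm_selector=TacticReplayer("replay.json"))
SaveEngine(EngineFromNetwork(network, config=replay_config), "1.engine")()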

Smarter-version commented 2 months ago

I modified the inputs and outputs of the ForeignNode in replay.json from DataType.HALF to DataType.FLOAT, since they have a significant impact on precision.

Modifying the inner node input/output types may destroy the fusion tactic.

To make this process easier, Polygraphy provides two built-in algorithm selectors: TacticRecorder and TacticReplayer. The former can be used to record tactics selected during an engine build, and the latter to play them back during a subsequent build. The CLI tools include --save-tactics and --load-tactics options corresponding to these.

Thank you for your response. So there is no way I can use a customized replay.json file to generate the engine, right? Is there any other way to change the input/output type of the ForeignNode?

lix19937 commented 2 months ago

Is there any other way to change the input/output type of the ForeignNode?

Find the node inputs/outputs, then use trtexec with --layerPrecisions=spec --layerOutputTypes=spec --layerDeviceTypes=spec.

Smarter-version commented 2 months ago

Find the node inputs/outputs, then use trtexec with --layerPrecisions=spec --layerOutputTypes=spec --layerDeviceTypes=spec.

I tried that, but the nodes are wrapped inside the ForeignNode, so it doesn't change the input/output type of one of those inner nodes.

lix19937 commented 2 months ago

You can mark the node as an output node via the ONNX API, then rebuild with trtexec.
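A minimal sketch of that approach with the onnx Python package; the tensor name "Abs_507_output" is a hypothetical placeholder for the real intermediate tensor name in your graph.

# Expose an intermediate tensor as a graph output so TensorRT treats it as a
# network output, which forces a boundary there instead of fusing it into the
# ForeignNode.
import onnx
from onnx import helper, TensorProto

model = onnx.load("model.onnx")

# "Abs_507_output" is a placeholder; substitute the actual tensor name.
marked = helper.make_tensor_value_info("Abs_507_output", TensorProto.FLOAT, None)
model.graph.output.append(marked)

onnx.save(model, "model_marked.onnx")
# Then rebuild: trtexec --onnx=model_marked.onnx --fp16 ...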

Smarter-version commented 2 months ago

I tried that too, but it gives me an error. (Screenshot from 2024-07-31 15-41-02.)

lix19937 commented 2 months ago

OOM / insufficient workspace. Can you upload the full log from trtexec --verbose?

Smarter-version commented 2 months ago

Here's my code.


import tensorrt as trt

EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

TRT_LOGGER = trt.Logger()

def get_engine(onnx_file_path, engine_file_path=""):
    """Takes an ONNX file and creates a TensorRT engine to run inference with."""
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network(
        EXPLICIT_BATCH
    ) as network, builder.create_builder_config() as config, trt.OnnxParser(
        network, TRT_LOGGER
    ) as parser, trt.Runtime(
        TRT_LOGGER
    ) as runtime:
        config.max_workspace_size = 1 << 32

        config.set_flag(trt.BuilderFlag.FP16)
        config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED

        builder.max_batch_size = 1
        # Parse model file
        print("Loading ONNX file from path {}...".format(onnx_file_path))
        with open(onnx_file_path, "rb") as model:
            print("Beginning ONNX file parsing")
            if not parser.parse(model.read()):
                print("ERROR: Failed to parse the ONNX file.")
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
                return None
        print("Completed parsing of ONNX file")

        print("Building an engine from file {}; this may take a while...".format(onnx_file_path))
        plan = builder.build_serialized_network(network, config)
        # for layer in network:
        #     print(f"Layer name: {layer.name}, Precision: {layer.precision}")
        # # return 0

        # Set the precision of specific layers in the network to FP32
        fp32_layer = ['Reshape_3209', '(Unnamed Layer* 1501) [Shuffle]', '(Unnamed Layer* 2290) [Constant]', '(Unnamed Layer* 2292) [Shuffle]', '(Unnamed Layer* 3701) [Constant]', '(Unnamed Layer* 3703) [Shuffle]', '(Unnamed Layer* 5112) [Constant]', '(Unnamed Layer* 5114) [Shuffle]',
                      '(Unnamed Layer* 6523) [Constant]', '(Unnamed Layer* 6525) [Shuffle]', '(Unnamed Layer* 7934) [Constant]', '(Unnamed Layer* 7936) [Shuffle]']
        for i, layer in enumerate(network):
            if layer.name in fp32_layer:
                layer.precision = trt.float32
                layer.set_output_type(0, trt.float32)
                print(f"fp32 index: {i}; name: {layer.name}")
        assert network[1501].precision == trt.float32
        assert network[2176].precision == trt.float32
        for layer in network:
            print(f"Layer name: {layer.name}, Precision: {layer.precision}")

        config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
        # Rebuild with the precision constraints applied; build_serialized_network
        # returns None on failure, which is what triggers the TypeError below.
        plan = builder.build_serialized_network(network, config)
        engine = runtime.deserialize_cuda_engine(plan)
        print("Completed creating Engine")
        with open(engine_file_path, "wb") as f:
            f.write(plan)

        return engine

Smarter-version commented 2 months ago

This is the err log:

[08/01/2024-16:20:11] [TRT] [W] TensorRT was linked against cuDNN 8.6.0 but loaded cuDNN 8.1.1
[08/01/2024-16:23:41] [TRT] [W] Skipping tactic 0x0000000000000000 due to exception [verify_outputtype] Mismatched type for tensor '(Unnamed Layer* 2292) [Shuffle]_output', f32 vs. expected type:f16.
[08/01/2024-16:23:41] [TRT] [E] 4: [optimizer.cpp::computeCosts::3726] Error Code 4: Internal Error (Could not find any implementation for node {ForeignNode[Abs_507...Cast_3382]} due to insufficient workspace. See verbose log for requested sizes.)
[08/01/2024-16:23:41] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed.)
Traceback (most recent call last):
  File "onnx_to_engine_layer.py", line 86, in <module>
    main()
  File "onnx_to_engine_layer.py", line 83, in main
    engine = get_engine(onnx_file_path, engine_file_path)
  File "onnx_to_engine_layer.py", line 70, in get_engine
    engine = runtime.deserialize_cuda_engine(plan)
TypeError: deserialize_cuda_engine(): incompatible function arguments. The following argument types are supported:

  1. (self: tensorrt.tensorrt.Runtime, serialized_engine: buffer) -> tensorrt.tensorrt.ICudaEngine

Invoked with: <tensorrt.tensorrt.Runtime object at 0x7fbca3784d70>, None

Smarter-version commented 2 months ago

$trt_bin_path/trtexec \
   --onnx=$onnx_path \
   --saveEngine=$engine_path \
   --plugins=$plugins_path \
   --verbose --workspace=2048 \
   --exportProfile=${engine_path}.profile.json \
   --exportLayerInfo=${engine_path}.graph.json \
   --profilingVerbosity=detailed \
   --fp16 \
   --precisionConstraints=obey \
   --layerPrecisions=Add_1546:fp32,Slice_1501:fp32 --layerOutputTypes=Add_1546:fp32,Slice_1501:fp32

(Screenshot from 2024-07-24 15-53-39.)

Even like this, the internal type cannot be changed.

lix19937 commented 2 months ago

Try increasing or decreasing the size config.max_workspace_size = 1 << 32.

BTW, try changing config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS) to config.set_flag(trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS).
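A minimal sketch of those two tweaks in the build script above; set_memory_pool_limit is the non-deprecated replacement for max_workspace_size on newer TensorRT versions, and the sizes are illustrative.

# Adjust the workspace budget (the modern API replaces max_workspace_size).
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 32)  # or try 1 << 28 / 1 << 36

# Prefer rather than strictly obey the per-layer precision constraints, so the
# builder may fall back to another precision instead of failing the build.
config.set_flag(trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS)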

Smarter-version commented 2 months ago

Try increasing or decreasing the size config.max_workspace_size = 1 << 32.

I tried config.max_workspace_size = 1 << 28, but it doesn't work.

BTW, try changing config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS) to config.set_flag(trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS).

I tried that too; it doesn't work.

lix19937 commented 2 months ago

Or increase the size further: config.max_workspace_size = 1 << 36.

Smarter-version commented 2 months ago

Or increase the size further: config.max_workspace_size = 1 << 36.

Doesn't work. I'm now skeptical that there is really a way to change the internal node types inside a Myelin ForeignNode(...).

lix19937 commented 2 months ago

The Myelin compiler usually optimizes sub-graphs with specific patterns (e.g. multi-head attention in Transformer-based networks, or some point-wise operations like slice and gather). Myelin is a TRT component whose behavior is not documented.

If you want to break up the Myelin graph-node optimization, you can split the model/ONNX, or replace some ops with a plugin; see the sketch below. You can also restructure the forward code to avoid the Myelin case.
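For the model-splitting option, a minimal sketch using onnx.utils.extract_model; the tensor names "input", "split_tensor", and "output" are hypothetical placeholders for real tensor names in your graph.

# Split the ONNX into two sub-models at an intermediate tensor so that each
# part is built (and can be precision-tuned) separately.
import onnx.utils

onnx.utils.extract_model("model.onnx", "model_part1.onnx",
                         input_names=["input"], output_names=["split_tensor"])
onnx.utils.extract_model("model.onnx", "model_part2.onnx",
                         input_names=["split_tensor"], output_names=["output"])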

Smarter-version commented 2 months ago

Thanks for your reply, I will try these. I have no more questions; this issue can be closed.