NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Polygraphy: Converting TensorRT models with postprocessing constraints (--trt-npps) is not effective #3679

Closed miraiaroha closed 6 months ago

miraiaroha commented 6 months ago

Description

I wanted to use Polygraphy to find out which layers lose precision. I used the following command to convert the model and then inspected the engine layers, but found that almost all of the layer precisions are still FP32. Is there anything wrong with my command?

Convert the model with FP16 and the postprocessing script:

```
polygraphy convert model.onnx --fp16 --precision-constraints obey --trt-npps add_constraints.py -o model.engine --verbose > log.txt
```

add_constraints.py:

```python
import tensorrt as trt

def postprocess(network):
    cnt = 0
    for layer in network:
        if "/model/simplefeature" in layer.name or "/model/encoder" in layer.name \
                or "/model/decoder" in layer.name or "/postprocessor" in layer.name:
            if layer.precision == trt.float16:
                layer.precision = trt.float32
            for i in range(layer.num_outputs):
                if layer.get_output_type(i) == trt.float16:
                    layer.set_output_type(i, trt.float32)
            cnt += 1
```
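One quick way to see what the script actually observes is to log each matching layer's reported precision before trying to change it. This is a minimal debugging sketch, not part of the original report; the `PREFIXES` helper name is just for illustration:

```python
import tensorrt as trt

PREFIXES = ("/model/simplefeature", "/model/encoder", "/model/decoder", "/postprocessor")

def postprocess(network):
    for layer in network:
        if any(p in layer.name for p in PREFIXES):
            # ILayer.precision reports the layer's compute precision;
            # ILayer.precision_is_set tells whether an explicit constraint
            # has already been placed on the layer.
            print(f"{layer.name}: precision={layer.precision}, "
                  f"precision_is_set={layer.precision_is_set}")
```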

Some of the verbose log:

```
[V] Loaded Module: polygraphy | Version: 0.49.0 | Path: ['/usr/local/lib/python3.10/dist-packages/polygraphy']
[V] Loaded Module: tensorrt | Version: 8.6.2 | Path: ['/usr/lib/python3.10/dist-packages/tensorrt']
[V] [MemUsageChange] Init CUDA: CPU +12, GPU +0, now: CPU 41, GPU 9820 (MiB)
[V] [MemUsageChange] Init builder kernel library: CPU +1154, GPU +1042, now: CPU 1231, GPU 10902 (MiB)
[V] ----------------------------------------------------------------
[V] Input filename: /home/root/code/code/dev/detr_tensorrt/dinov2det/dinov2-small-rtdetr-966-546-op16-ep351-sim.onnx
[V] ONNX IR version: 0.0.8
[V] Opset version: 16
[V] Producer name: pytorch
[V] Producer version: 2.0.0
[V] Domain:
[V] Model version: 0
[V] Doc string:
[V] Setting TensorRT Optimization Profiles
[V] Input tensor: images (dtype=DataType.FLOAT, shape=(1, 3, 546, 966)) | Setting input tensor shapes to: (min=[1, 3, 546, 966], opt=[1, 3, 546, 966], max=[1, 3, 546, 966])
[V] Input tensor: orig_target_sizes (dtype=DataType.INT32, shape=(1, 2)) | Setting input tensor shapes to: (min=[1, 2], opt=[1, 2], max=[1, 2])
[I] Configuring with profiles:[
        Profile 0:
            {images [min=[1, 3, 546, 966], opt=[1, 3, 546, 966], max=[1, 3, 546, 966]],
             orig_target_sizes [min=[1, 2], opt=[1, 2], max=[1, 2]]}
    ]
[I] Building engine with configuration:
    Flags                | [FP16, OBEY_PRECISION_CONSTRAINTS]
    Engine Capability    | EngineCapability.DEFAULT
    Memory Pools         | [WORKSPACE: 15388.48 MiB, TACTIC_DRAM: 13765.00 MiB]
    Tactic Sources       | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
    Profiling Verbosity  | ProfilingVerbosity.DETAILED
    Preview Features     | [FASTER_DYNAMIC_SHAPES_0805, DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805]
[V] Graph optimization time: 0.308624 seconds.
[V] Global timing cache in use. Profiling results in this builder pass will be stored.
[V] Detected 2 inputs and 3 output network tensors.
[V] Total Host Persistent Memory: 242640
[V] Total Device Persistent Memory: 61440
[V] Total Scratch Memory: 38880768
[V] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 62 MiB, GPU 452 MiB
[V] [BlockAssignment] Started assigning block shifts. This will take 132 steps to complete.
[V] [BlockAssignment] Algorithm ShiftNTopDown took 16.181ms to assign 9 blocks to 132 nodes requiring 69562880 bytes.
[V] Total Activation Memory: 69562880
[W] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[W] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[W] Check verbose logs for the list of affected weights.
[W] - 1 weights are affected by this issue: Detected FP32 infinity values and converted them to corresponding FP16 infinity.
[W] - 218 weights are affected by this issue: Detected subnormal FP16 values.
[W] - 69 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
[W] - 6 weights are affected by this issue: Detected finite FP32 values which would overflow in FP16 and converted them to the closest finite FP16 value.
[V] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +50, GPU +128, now: CPU 50, GPU 128 (MiB)
[I] Finished engine building in 743.236 seconds
```
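For reference, the Polygraphy flags used above roughly correspond to the following direct TensorRT builder setup. This is a minimal sketch assuming the TensorRT 8.6 Python API; the file paths and the import of `postprocess` from the local script are illustrative, and the optimization profile is omitted because all input shapes here are static:

```python
import tensorrt as trt
from add_constraints import postprocess  # the same script passed via --trt-npps

logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

# Apply the per-layer precision constraints before building.
postprocess(network)

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)                        # --fp16
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)  # --precision-constraints obey

engine_bytes = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:
    f.write(engine_bytes)
```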

Show the layers:

```
polygraphy inspect model.engine --model-type engine --show layers > log.txt
```

```
[I] ==== TensorRT Engine ====
    Name: Unnamed Network 0 | Explicit Batch Engine

---- 2 Engine Input(s) ----
{images [dtype=float32, shape=(1, 3, 546, 966)],
 orig_target_sizes [dtype=int32, shape=(1, 2)]}

---- 3 Engine Output(s) ----
{scores [dtype=float32, shape=(1, 300)],
 labels [dtype=int32, shape=(1, 300)],
 boxes [dtype=float32, shape=(1, 300, 4)]}

---- Memory ----
Device Memory: 69562880 bytes

---- 1 Profile(s) (5 Tensor(s) Each) ----
- Profile: 0
    Tensor: images                     (Input), Index: 0 | Shapes: min=(1, 3, 546, 966), opt=(1, 3, 546, 966), max=(1, 3, 546, 966)
    Tensor: orig_target_sizes          (Input), Index: 1 | Shapes: min=(1, 2), opt=(1, 2), max=(1, 2)
    Tensor: scores                    (Output), Index: 2 | Shape: (1, 300)
    Tensor: labels                    (Output), Index: 3 | Shape: (1, 300)
    Tensor: boxes                     (Output), Index: 4 | Shape: (1, 300, 4)

---- 130 Layer(s) ----
- Profile: 0
    Layer 0    | /model/encoder/encoder.0/layers.0/Constant_output_0 [Op: Constant]
        {} -> {(Unnamed Layer* 1055) [Constant]_output [dtype=float16, shape=(1, 646, 384), Format: Row major linear FP16 format]}

    Layer 1    | model.encoder.encoder.0.layers.0.norm1.weight + (Unnamed Layer* 1111) [Shuffle] [Op: Constant]
        {} -> {(Unnamed Layer* 1111) [Shuffle]_output [dtype=float16, shape=(1, 1, 384), Format: Row major linear FP16 format]}

    Layer 2    | model.encoder.encoder.0.layers.0.norm1.bias + (Unnamed Layer* 1114) [Shuffle] [Op: Constant]
        {} -> {(Unnamed Layer* 1114) [Shuffle]_output [dtype=float16, shape=(1, 1, 384), Format: Row major linear FP16 format]}

    Layer 3    | model.encoder.encoder.0.layers.0.norm2.weight + (Unnamed Layer* 1147) [Shuffle] [Op: Constant]
        {} -> {(Unnamed Layer* 1147) [Shuffle]_output [dtype=float16, shape=(1, 1, 384), Format: Row major linear FP16 format]}

    Layer 4    | model.encoder.encoder.0.layers.0.norm2.bias + (Unnamed Layer* 1150) [Shuffle] [Op: Constant]
        {} -> {(Unnamed Layer* 1150) [Shuffle]_output [dtype=float16, shape=(1, 1, 384), Format: Row major linear FP16 format]}

    Layer 5    | Reformatting CopyNode for Input Tensor 0 to /model/backbone/patch_embed/proj/Conv [Op: Reformat]
        {images [dtype=float32, shape=(1, 3, 546, 966), Format: Row major linear FP32]}
         -> {Reformatted Input Tensor 0 to /model/backbone/patch_embed/proj/Conv [dtype=float16, shape=(1, 3, 546, 966), Format: Channel major FP16 format where channel % 4 == 0]}

    Layer 6    | /model/backbone/patch_embed/proj/Conv [Op: CaskConvolution]
        {Reformatted Input Tensor 0 to /model/backbone/patch_embed/proj/Conv [dtype=float16, shape=(1, 3, 546, 966), Format: Channel major FP16 format where channel % 4 == 0]}
         -> {/model/backbone/patch_embed/proj/Conv_output_0 [dtype=float16, shape=(1, 384, 39, 69), Format: Channel major FP16 format where channel % 8 == 0]}

    Layer 7    | Reformatting CopyNode for Input Tensor 0 to {ForeignNode[/model/backbone/patch_embed/Constant_output_0.../model/Transpose + /model/Reshape]} [Op: Reformat]
        {/model/backbone/patch_embed/proj/Conv_output_0 [dtype=float16, shape=(1, 384, 39, 69), Format: Channel major FP16 format where channel % 8 == 0]}
         -> {Reformatted Input Tensor 0 to {ForeignNode[/model/backbone/patch_embed/Constant_output_0.../model/Transpose + /model/Reshape]} [dtype=float16, shape=(1, 384, 39, 69), Format: Row major linear FP16 format]}

    Layer 8    | {ForeignNode[/model/backbone/patch_embed/Constant_output_0.../model/Transpose + /model/Reshape]} [Op: Myelin]
        {Reformatted Input Tensor 0 to {ForeignNode[/model/backbone/patch_embed/Constant_output_0.../model/Transpose + /model/Reshape]} [dtype=float16, shape=(1, 384, 39, 69), Format: Row major linear FP16 format]}
         -> {Reformatted Output Tensor 0 to {ForeignNode[/model/backbone/patch_embed/Constant_output_0.../model/Transpose + /model/Reshape]} [dtype=float16, shape=(1, 384, 39, 69), Format: Row major linear FP16 format]}

    Layer 9    | Reformatting CopyNode for Output Tensor 0 to {ForeignNode[/model/backbone/patch_embed/Constant_output_0.../model/Transpose + /model/Reshape]} [Op: Reformat]
        {Reformatted Output Tensor 0 to {ForeignNode[/model/backbone/patch_embed/Constant_output_0.../model/Transpose + /model/Reshape]} [dtype=float16, shape=(1, 384, 39, 69), Format: Row major linear FP16 format]}
         -> {/model/Reshape_output_0 [dtype=float16, shape=(1, 384, 39, 69), Format: Channel major FP16 format where channel % 16 == 0]}

    Layer 10   | Reformatting CopyNode for Input Tensor 0 to /model/simplefeature/fpn1/ConvTranspose [Op: NoOp]
        {/model/Reshape_output_0 [dtype=float16, shape=(1, 384, 39, 69), Format: Channel major FP16 format where channel % 16 == 0]}
         -> {Reformatted Input Tensor 0 to /model/simplefeature/fpn1/ConvTranspose [dtype=float16, shape=(1, 384, 39, 69), Format: Channel major FP16 format where channel % 8 == 0]}

    Layer 11   | /model/simplefeature/fpn1/ConvTranspose [Op: CaskDeconvolutionV2]
        {Reformatted Input Tensor 0 to /model/simplefeature/fpn1/ConvTranspose [dtype=float16, shape=(1, 384, 39, 69), Format: Channel major FP16 format where channel % 8 == 0]}
         -> {/model/simplefeature/fpn1/ConvTranspose_output_0 [dtype=float16, shape=(1, 512, 78, 138), Format: Channel major FP16 format where channel % 8 == 0]}

    Layer 12   | /model/encoder/Resize [Op: Resize]
        {/model/simplefeature/fpn1/ConvTranspose_output_0 [dtype=float16, shape=(1, 512, 78, 138), Format: Channel major FP16 format where channel % 8 == 0]}
         -> {/model/encoder/Resize_output_0 [dtype=float16, shape=(1, 512, 76, 136), Format: Channel major FP16 format where channel % 8 == 0]}

    Layer 13   | /model/encoder/input_proj.0/input_proj.0.0/Conv [Op: CaskGemmConvolution]
        {/model/encoder/Resize_output_0 [dtype=float16, shape=(1, 512, 76, 136), Format: Channel major FP16 format where channel % 8 == 0]}
         -> {/model/encoder/Concat_4_output_0 [dtype=float16, shape=(1, 384, 76, 136), Format: Channel major FP16 format where channel % 8 == 0]}

    Layer 14   | Reformatting CopyNode for Input Tensor 0 to /model/simplefeature/fpn2/Conv [Op: NoOp]
        {/model/Reshape_output_0 [dtype=float16, shape=(1, 384, 39, 69), Format: Channel major FP16 format where channel % 16 == 0]}
         -> {Reformatted Input Tensor 0 to /model/simplefeature/fpn2/Conv [dtype=float16, shape=(1, 384, 39, 69), Format: Channel major FP16 format where channel % 8 == 0]}

    Layer 15   | /model/simplefeature/fpn2/Conv [Op: CaskGemmConvolution]
        {Reformatted Input Tensor 0 to /model/simplefeature/fpn2/Conv [dtype=float16, shape=(1, 384, 39, 69), Format: Channel major FP16 format where channel % 8 == 0]}
         -> {/model/simplefeature/fpn2/Conv_output_0 [dtype=float16, shape=(1, 1024, 39, 69), Format: Channel major FP16 format where channel % 8 == 0]}

    Layer 16   | /model/encoder/Resize_1 [Op: Resize]
        {/model/simplefeature/fpn2/Conv_output_0 [dtype=float16, shape=(1, 1024, 39, 69), Format: Channel major FP16 format where channel % 8 == 0]}
         -> {/model/encoder/Resize_1_output_0 [dtype=float16, shape=(1, 1024, 38, 68), Format: Channel major FP16 format where channel % 8 == 0]}

    Layer 17   | /model/encoder/input_proj.1/input_proj.1.0/Conv [Op: CaskGemmConvolution]
        {/model/encoder/Resize_1_output_0 [dtype=float16, shape=(1, 1024, 38, 68), Format: Channel major FP16 format where channel % 8 == 0]}
         -> {/model/encoder/Concat_3_output_0 [dtype=float16, shape=(1, 384, 38, 68), Format: Channel major FP16 format where channel % 8 == 0]}

    Layer 18   | Reformatting CopyNode for Input Tensor 0 to /model/simplefeature/fpn3_1/MaxPool [Op: NoOp]
        {/model/Reshape_output_0 [dtype=float16, shape=(1, 384, 39, 69), Format: Channel major FP16 format where channel % 16 == 0]}
         -> {Reformatted Input Tensor 0 to /model/simplefeature/fpn3_1/MaxPool [dtype=float16, shape=(1, 384, 39, 69), Format: Channel major FP16 format where channel % 8 == 0]}

    Layer 19   | /model/simplefeature/fpn3_1/MaxPool [Op: CaskPooling]
        {Reformatted Input Tensor 0 to /model/simplefeature/fpn3_1/MaxPool [dtype=float16, shape=(1, 384, 39, 69), Format: Channel major FP16 format where channel % 8 == 0]}
         -> {/model/simplefeature/fpn3_1/MaxPool_output_0 [dtype=float16, shape=(1, 384, 19, 34), Format: Channel major FP16 format where channel % 8 == 0]}

    Layer 20   | /model/simplefeature/fpn3_2/Conv [Op: CaskGemmConvolution]
        {/model/simplefeature/fpn3_1/MaxPool_output_0 [dtype=float16, shape=(1, 384, 19, 34), Format: Channel major FP16 format where channel % 8 == 0]}
         -> {/model/simplefeature/fpn3_2/Conv_output_0 [dtype=float16, shape=(1, 2048, 19, 34), Format: Channel major FP16 format where channel % 8 == 0]}

    Layer 21   | /model/encoder/input_proj.2/input_proj.2.0/Conv [Op: CaskGemmConvolution]
        {/model/simplefeature/fpn3_2/Conv_output_0 [dtype=float16, shape=(1, 2048, 19, 34), Format: Channel major FP16 format where channel % 8 == 0]}
         -> {/model/encoder/input_proj.2/input_proj.2.0/Conv_output_0 [dtype=float16, shape=(1, 384, 19, 34), Format: Channel major FP16 format where channel % 8 == 0]}

    Layer 22   | Reformatting CopyNode for Input Tensor 0 to /model/encoder/Reshape [Op: Reformat]
        {/model/encoder/input_proj.2/input_proj.2.0/Conv_output_0 [dtype=float16, shape=(1, 384, 19, 34), Format: Channel major FP16 format where channel % 8 == 0]}
         -> {Reformatted Input Tensor 0 to /model/encoder/Reshape [dtype=float16, shape=(1, 384, 19, 34), Format: Row major linear FP16 format]}

    Layer 23   | /model/encoder/Reshape [Op: NoOp]
        {Reformatted Input Tensor 0 to /model/encoder/Reshape [dtype=float16, shape=(1, 384, 19, 34), Format: Row major linear FP16 format]}
         -> {/model/encoder/Reshape_output_0 [dtype=float16, shape=(1, 384, 646), Format: Row major linear FP16 format]}

    Layer 24   | /model/encoder/Transpose [Op: Shuffle]
        {/model/encoder/Reshape_output_0 [dtype=float16, shape=(1, 384, 646), Format: Row major linear FP16 format]}
         -> {/model/encoder/Transpose_output_0 [dtype=float16, shape=(1, 646, 384), Format: Row major linear FP16 format]}

    Layer 25   | /model/encoder/encoder.0/layers.0/Add [Op: ElementWise]
        {/model/encoder/Transpose_output_0 [dtype=float16, shape=(1, 646, 384), Format: Row major linear FP16 format],
         (Unnamed Layer* 1055) [Constant]_output [dtype=float16, shape=(1, 646, 384), Format: Row major linear FP16 format]}
         -> {/model/encoder/encoder.0/layers.0/Add_output_0 [dtype=float16, shape=(1, 646, 384), Format: Row major linear FP16 format]}

    Layer 26   | /model/encoder/encoder.0/layers.0/self_attn/Transpose + reshape_before_/model/encoder/encoder.0/layers.0/self_attn/MatMul_1 [Op: NoOp]
        {/model/encoder/encoder.0/layers.0/Add_output_0 [dtype=float16, shape=(1, 646, 384), Format: Row major linear FP16 format]}
         -> {reshape_before_/model/encoder/encoder.0/layers.0/self_attn/MatMul_1_out_tensor [dtype=float16, shape=(646, 384, 1, 1), Format: Row major linear FP16 format]}

    Layer 27   | Reformatting CopyNode for Input Tensor 0 to /model/encoder/encoder.0/layers.0/self_attn/MatMul_1 || /model/encoder/encoder.0/layers.0/self_attn/MatMul [Op: NoOp]
        {reshape_before_/model/encoder/encoder.0/layers.0/self_attn/MatMul_1_out_tensor [dtype=float16, shape=(646, 384, 1, 1), Format: Row major linear FP16 format]}
         -> {Reformatted Input Tensor 0 to /model/encoder/encoder.0/layers.0/self_attn/MatMul_1 || /model/encoder/encoder.0/layers.0/self_attn/MatMul [dtype=float16, shape=(646, 384, 1, 1), Format: Channel major FP16 format where channel % 8 == 0]}

    Layer 28   | /model/encoder/encoder.0/layers.0/self_attn/MatMul_1 || /model/encoder/encoder.0/layers.0/self_attn/MatMul [Op: CaskGemmConvolution]
        {Reformatted Input Tensor 0 to /model/encoder/encoder.0/layers.0/self_attn/MatMul_1 || /model/encoder/encoder.0/layers.0/self_attn/MatMul [dtype=float16, shape=(646, 384, 1, 1), Format: Channel major FP16 format where channel % 8 == 0]}
         -> {Reformatted Output Tensor 0 to /model/encoder/encoder.0/layers.0/self_attn/MatMul_1 || /model/encoder/encoder.0/layers.0/self_attn/MatMul [dtype=float16, shape=(646, 768, 1, 1), Format: Channel major FP16 format where channel % 8 == 0]}

    Layer 29   | Reformatting CopyNode for Output Tensor 0 to /model/encoder/encoder.0/layers.0/self_attn/MatMul_1 || /model/encoder/encoder.0/layers.0/self_attn/MatMul [Op: NoOp]
        {Reformatted Output Tensor 0 to /model/encoder/encoder.0/layers.0/self_attn/MatMul_1 || /model/encoder/encoder.0/layers.0/self_attn/MatMul [dtype=float16, shape=(646, 768, 1, 1), Format: Channel major FP16 format where channel % 8 == 0]}
         -> {/model/encoder/encoder.0/layers.0/self_attn/MatMul_1 || /model/encoder/encoder.0/layers.0/self_attn/MatMul [dtype=float16, shape=(646, 768, 1, 1), Format: Row major linear FP16 format]}

    Layer 30   | reshape_after_/model/encoder/encoder.0/layers.0/self_attn/MatMul_1 + /model/encoder/encoder.0/layers.0/self_attn/Reshape_1 + /model/encoder/encoder.0/layers.0/self_attn/Transpose_4 [Op: Shuffle]
        {/model/encoder/encoder.0/layers.0/self_attn/MatMul_1 || /model/encoder/encoder.0/layers.0/self_attn/MatMul [dtype=float16, shape=(646, 384, 1, 1), Format: Row major linear FP16 format]}
         -> {/model/encoder/encoder.0/layers.0/self_attn/Transpose_4_output_0 [dtype=float16, shape=(8, 48, 646), Format: Row major linear FP16 format]}

    Layer 31   | reshape_after_/model/encoder/encoder.0/layers.0/self_attn/MatMul + /model/encoder/encoder.0/layers.0/self_attn/Reshape + /model/encoder/encoder.0/layers.0/self_attn/Transpose_2 [Op: Shuffle]
        {/model/encoder/encoder.0/layers.0/self_attn/MatMul_1 || /model/encoder/encoder.0/layers.0/self_attn/MatMul [dtype=float16, shape=(646, 384, 1, 1), Format: Row major linear FP16 format]}
         -> {/model/encoder/encoder.0/layers.0/self_attn/Transpose_2_output_0 [dtype=float16, shape=(8, 646, 48), Format: Row major linear FP16 format]}

    Layer 32   | Reformatting CopyNode for Input Tensor 0 to PWN(/model/encoder/encoder.0/layers.0/self_attn/Constant_3_output_0 + (Unnamed Layer* 1083) [Shuffle], /model/encoder/encoder.0/layers.0/self_attn/Div) [Op: Reformat]
        {/model/encoder/encoder.0/layers.0/self_attn/Transpose_2_output_0 [dtype=float16, shape=(8, 646, 48), Format: Row major linear FP16 format]}
         -> {Reformatted Input Tensor 0 to PWN(/model/encoder/encoder.0/layers.0/self_attn/Constant_3_output_0 + (Unnamed Layer* 1083) [Shuffle], /model/encoder/encoder.0/layers.0/self_attn/Div) [dtype=float32, shape=(8, 646, 48), Format: Row major linear FP32]}

    Layer 33   | PWN(/model/encoder/encoder.0/layers.0/self_attn/Constant_3_output_0 + (Unnamed Layer* 1083) [Shuffle], /model/encoder/encoder.0/layers.0/self_attn/Div) [Op: PointWiseV2]
        {Reformatted Input Tensor 0 to PWN(/model/encoder/encoder.0/layers.0/self_attn/Constant_3_output_0 + (Unnamed Layer* 1083) [Shuffle], /model/encoder/encoder.0/layers.0/self_attn/Div) [dtype=float32, shape=(8, 646, 48), Format: Row major linear FP32]}
         -> {/model/encoder/encoder.0/layers.0/self_attn/Div_output_0 [dtype=float32, shape=(8, 646, 48), Format: Row major linear FP32]}

    Layer 34   | Reformatting CopyNode for Input Tensor 0 to /model/encoder/encoder.0/layers.0/self_attn/MatMul_3 [Op: Reformat]
        {/model/encoder/encoder.0/layers.0/self_attn/Div_output_0 [dtype=float32, shape=(8, 646, 48), Format: Row major linear FP32]}
         -> {Reformatted Input Tensor 0 to /model/encoder/encoder.0/layers.0/self_attn/MatMul_3 [dtype=float16, shape=(8, 646, 48), Format: Row major linear FP16 format]}

    Layer 35   | /model/encoder/encoder.0/layers.0/self_attn/MatMul_3 [Op: CaskGemmMatrixMultiply]
        {Reformatted Input Tensor 0 to /model/encoder/encoder.0/layers.0/self_attn/MatMul_3 [dtype=float16, shape=(8, 646, 48), Format: Row major linear FP16 format],
         /model/encoder/encoder.0/layers.0/self_attn/Transpose_4_output_0 [dtype=float16, shape=(8, 48, 646), Format: Row major linear FP16 format]}
         -> {/model/encoder/encoder.0/layers.0/self_attn/MatMul_3_output_0 [dtype=float16, shape=(8, 646, 646), Format: Row major linear FP16 format]}

    Layer 36   | /model/encoder/encoder.0/layers.0/self_attn/Softmax [Op: CaskSoftMaxV2]
        {/model/encoder/encoder.0/layers.0/self_attn/MatMul_3_output_0 [dtype=float16, shape=(8, 646, 646), Format: Row major linear FP16 format]}
         -> {(Unnamed Layer* 1087) [Softmax]_output [dtype=float16, shape=(8, 646, 646), Format: Row major linear FP16 format]}

    Layer 37   | /model/encoder/encoder.0/layers.0/self_attn/Transpose_1 + reshape_before_/model/encoder/encoder.0/layers.0/self_attn/MatMul_2 [Op: Shuffle]
        {/model/encoder/Reshape_output_0 [dtype=float16, shape=(1, 384, 646), Format: Row major linear FP16 format]}
         -> {reshape_before_/model/encoder/encoder.0/layers.0/self_attn/MatMul_2_out_tensor [dtype=float16, shape=(646, 384, 1, 1), Format: Row major linear FP16 format]}

    Layer 38   | Reformatting CopyNode for Input Tensor 0 to /model/encoder/encoder.0/layers.0/self_attn/MatMul_2 [Op: NoOp]
        {reshape_before_/model/encoder/encoder.0/layers.0/self_attn/MatMul_2_out_tensor [dtype=float16, shape=(646, 384, 1, 1), Format: Row major linear FP16 format]}
         -> {Reformatted Input Tensor 0 to /model/encoder/encoder.0/layers.0/self_attn/MatMul_2 [dtype=float16, shape=(646, 384, 1, 1), Format: Channel major FP16 format where channel % 8 == 0]}

    Layer 39   | /model/encoder/encoder.0/layers.0/self_attn/MatMul_2 [Op: CaskGemmConvolution]
        {Reformatted Input Tensor 0 to /model/encoder/encoder.0/layers.0/self_attn/MatMul_2 [dtype=float16, shape=(646, 384, 1, 1), Format: Channel major FP16 format where channel % 8 == 0]}
         -> {/model/encoder/encoder.0/layers.0/self_attn/MatMul_2_out_tensor [dtype=float16, shape=(646, 384, 1, 1), Format: Channel major FP16 format where channel % 8 == 0]}

    Layer 40   | Reformatting CopyNode for Input Tensor 0 to reshape_after_/model/encoder/encoder.0/layers.0/self_attn/MatMul_2 + /model/encoder/encoder.0/layers.0/self_attn/Reshape_2 + /model/encoder/encoder.0/layers.0/self_attn/Transpose_3 [Op: NoOp]
        {/model/encoder/encoder.0/layers.0/self_attn/MatMul_2_out_tensor [dtype=float16, shape=(646, 384, 1, 1), Format: Channel major FP16 format where channel % 8 == 0]}
         -> {Reformatted Input Tensor 0 to reshape_after_/model/encoder/encoder.0/layers.0/self_attn/MatMul_2 + /model/encoder/encoder.0/layers.0/self_attn/Reshape_2 + /model/encoder/encoder.0/layers.0/self_attn/Transpose_3 [dtype=float16, shape=(646, 384, 1, 1), Format: Row major linear FP16 format]}

    Layer 41   | reshape_after_/model/encoder/encoder.0/layers.0/self_attn/MatMul_2 + /model/encoder/encoder.0/layers.0/self_attn/Reshape_2 + /model/encoder/encoder.0/layers.0/self_attn/Transpose_3 [Op: Shuffle]
        {Reformatted Input Tensor 0 to reshape_after_/model/encoder/encoder.0/layers.0/self_attn/MatMul_2 + /model/encoder/encoder.0/layers.0/self_attn/Reshape_2 + /model/encoder/encoder.0/layers.0/self_attn/Transpose_3 [dtype=float16, shape=(646, 384, 1, 1), Format: Row major linear FP16 format]}
         -> {/model/encoder/encoder.0/layers.0/self_attn/Transpose_3_output_0 [dtype=float16, shape=(8, 646, 48), Format: Row major linear FP16 format]}

    Layer 42   | /model/encoder/encoder.0/layers.0/self_attn/MatMul_4 [Op: CaskGemmMatrixMultiply]
        {(Unnamed Layer* 1087) [Softmax]_output [dtype=float16, shape=(8, 646, 646), Format: Row major linear FP16 format],
         /model/encoder/encoder.0/layers.0/self_attn/Transpose_3_output_0 [dtype=float16, shape=(8, 646, 48), Format: Row major linear FP16 format]}
         -> {/model/encoder/encoder.0/layers.0/self_attn/MatMul_4_output_0 [dtype=float16, shape=(8, 646, 48), Format: Row major linear FP16 format]}

    Layer 43   | /model/encoder/encoder.0/layers.0/self_attn/Transpose_5 + /model/encoder/encoder.0/layers.0/self_attn/Reshape_3 + reshape_before_/model/encoder/encoder.0/layers.0/self_attn/Gemm [Op: Shuffle]
        {/model/encoder/encoder.0/layers.0/self_attn/MatMul_4_output_0 [dtype=float16, shape=(8, 646, 48), Format: Row major linear FP16 format]}
         -> {reshape_before_/model/encoder/encoder.0/layers.0/self_attn/Gemm_out_tensor [dtype=float16, shape=(646, 384, 1, 1), Format: Row major linear FP16 format]}

    Layer 44   | Reformatting CopyNode for Input Tensor 0 to /model/encoder/encoder.0/layers.0/self_attn/Gemm [Op: NoOp]
        {reshape_before_/model/encoder/encoder.0/layers.0/self_attn/Gemm_out_tensor [dtype=float16, shape=(646, 384, 1, 1), Format: Row major linear FP16 format]}
         -> {Reformatted Input Tensor 0 to /model/encoder/encoder.0/layers.0/self_attn/Gemm [dtype=float16, shape=(646, 384, 1, 1), Format: Channel major FP16 format where channel % 8 == 0]}

    Layer 45   | /model/encoder/encoder.0/layers.0/self_attn/Gemm [Op: CaskGemmConvolution]
        {Reformatted Input Tensor 0 to /model/encoder/encoder.0/layers.0/self_attn/Gemm [dtype=float16, shape=(646, 384, 1, 1), Format: Channel major FP16 format where channel % 8 == 0]}
         -> {/model/encoder/encoder.0/layers.0/self_attn/Gemm_out_tensor [dtype=float16, shape=(646, 384, 1, 1), Format: Channel major FP16 format where channel % 8 == 0]}

    Layer 46   | Reformatting CopyNode for Input Tensor 0 to reshape_after_/model/encoder/encoder.0/layers.0/self_attn/Gemm + /model/encoder/encoder.0/layers.0/self_attn/Reshape_4 + /model/encoder/encoder.0/layers.0/self_attn/Transpose_6 [Op: NoOp]
        {/model/encoder/encoder.0/layers.0/self_attn/Gemm_out_tensor [dtype=float16, shape=(646, 384, 1, 1), Format: Channel major FP16 format where channel % 8 == 0]}
         -> {Reformatted Input Tensor 0 to reshape_after_/model/encoder/encoder.0/layers.0/self_attn/Gemm + /model/encoder/encoder.0/layers.0/self_attn/Reshape_4 + /model/encoder/encoder.0/layers.0/self_attn/Transpose_6 [dtype=float16, shape=(646, 384, 1, 1), Format: Row major linear FP16 format]}

    Layer 47   | reshape_after_/model/encoder/encoder.0/layers.0/self_attn/Gemm + /model/encoder/encoder.0/layers.0/self_attn/Reshape_4 + /model/encoder/encoder.0/layers.0/self_attn/Transpose_6 [Op: NoOp]
        {Reformatted Input Tensor 0 to reshape_after_/model/encoder/encoder.0/layers.0/self_attn/Gemm + /model/encoder/encoder.0/layers.0/self_attn/Reshape_4 + /model/encoder/encoder.0/layers.0/self_attn/Transpose_6 [dtype=float16, shape=(646, 384, 1, 1), Format: Row major linear FP16 format]}
         -> {/model/encoder/encoder.0/layers.0/self_attn/Transpose_6_output_0 [dtype=float16, shape=(1, 646, 384), Format: Row major linear FP16 format]}

    Layer 48   | /model/encoder/encoder.0/layers.0/Add_1 [Op: ElementWise]
        {/model/encoder/Transpose_output_0 [dtype=float16, shape=(1, 646, 384), Format: Row major linear FP16 format],
         /model/encoder/encoder.0/layers.0/self_attn/Transpose_6_output_0 [dtype=float16, shape=(1, 646, 384), Format: Row major linear FP16 format]}
         -> {/model/encoder/encoder.0/layers.0/Add_1_output_0 [dtype=float16, shape=(1, 646, 384), Format: Row major linear FP16 format]}

    Layer 49   | /model/encoder/encoder.0/layers.0/norm1/ReduceMean [Op: Reduce]
        {/model/encoder/encoder.0/layers.0/Add_1_output_0 [dtype=float16, shape=(1, 646, 384), Format: Row major linear FP16 format]}
         -> {/model/encoder/encoder.0/layers.0/norm1/ReduceMean_output_0 [dtype=float16, shape=(1, 646, 1), Format: Row major linear FP16 format]}

    Layer 50   | /model/encoder/encoder.0/layers.0/norm1/Sub [Op: ElementWise]
        {/model/encoder/encoder.0/layers.0/Add_1_output_0 [dtype=float16, shape=(1, 646, 384), Format: Row major linear FP16 format],
         /model/encoder/encoder.0/layers.0/norm1/ReduceMean_output_0 [dtype=float16, shape=(1, 646, 1), Format: Row major linear FP16 format]}
         -> {/model/encoder/encoder.0/layers.0/norm1/Sub_output_0 [dtype=float16, shape=(1, 646, 384), Format: Row major linear FP16 format]}

    Layer 51   | PWN(/model/backbone/blocks.0/norm1/Constant_output_0 + (Unnamed Layer* 1102) [Shuffle], /model/encoder/encoder.0/layers.0/norm1/Pow) [Op: PointWiseV2]
        {/model/encoder/encoder.0/layers.0/norm1/Sub_output_0 [dtype=float16, shape=(1, 646, 384), Format: Row major linear FP16 format]}
         -> {/model/encoder/encoder.0/layers.0/norm1/Pow_output_0 [dtype=float16, shape=(1, 646, 384), Format: Row major linear FP16 format]}

    Layer 52   | /model/encoder/encoder.0/layers.0/norm1/ReduceMean_1 [Op: Reduce]
        {/model/encoder/encoder.0/layers.0/norm1/Pow_output_0 [dtype=float16, shape=(1, 646, 384), Format: Row major linear FP16 format]}
         -> {/model/encoder/encoder.0/layers.0/norm1/ReduceMean_1_output_0 [dtype=float16, shape=(1, 646, 1), Format: Row major linear FP16 format]}

    Layer 53   | PWN(PWN(PWN(PWN(PWN(/model/encoder/encoder.0/layers.0/norm1/Constant_1_output_0 + (Unnamed Layer* 1106) [Shuffle], /model/encoder/encoder.0/layers.0/norm1/Add), PWN(/model/encoder/encoder.0/layers.0/norm1/Sqrt)), /model/encoder/encoder.0/layers.0/norm1/Div), /model/encoder/encoder.0/layers.0/norm1/Mul), /model/encoder/encoder.0/layers.0/norm1/Add_1) [Op: PointWiseV2]
        {/model/encoder/encoder.0/layers.0/norm1/ReduceMean_1_output_0 [dtype=float16, shape=(1, 646, 1), Format: Row major linear FP16 format],
         /model/encoder/encoder.0/layers.0/norm1/Sub_output_0 [dtype=float16, shape=(1, 646, 384), Format: Row major linear FP16 format],
         (Unnamed Layer* 1111) [Shuffle]_output [dtype=float16, shape=(1, 1, 384), Format: Row major linear FP16 format],
         (Unnamed Layer* 1114) [Shuffle]_output [dtype=float16, shape=(1, 1, 384), Format: Row major linear FP16 format]}
         -> {/model/encoder/encoder.0/layers.0/norm1/Add_1_output_0 [dtype=float16, shape=(1, 646, 384), Format: Row major linear FP16 format]}

    Layer 54   | reshape_before_/model/encoder/encoder.0/layers.0/linear1/MatMul [Op: NoOp]
        {/model/encoder/encoder.0/layers.0/norm1/Add_1_output_0 [dtype=float16, shape=(1, 646, 384), Format: Row major linear FP16 format]}
         -> {reshape_before_/model/encoder/encoder.0/layers.0/linear1/MatMul_out_tensor [dtype=float16, shape=(646, 384, 1, 1), Format: Row major linear FP16 format]}

    Layer 55   | Reformatting CopyNode for Input Tensor 0 to /model/encoder/encoder.0/layers.0/linear1/MatMul [Op: NoOp]
        {reshape_before_/model/encoder/encoder.0/layers.0/linear1/MatMul_out_tensor [dtype=float16, shape=(646, 384, 1, 1), Format: Row major linear FP16 format]}
         -> {Reformatted Input Tensor 0 to /model/encoder/encoder.0/layers.0/linear1/MatMul [dtype=float16, shape=(646, 384, 1, 1), Format: Channel major FP16 format where channel % 8 == 0]}

    Layer 56   | /model/encoder/encoder.0/layers.0/linear1/MatMul [Op: CaskGemmConvolution]
        {Reformatted Input Tensor 0 to /model/encoder/encoder.0/layers.0/linear1/MatMul [dtype=float16, shape=(646, 384, 1, 1), Format: Channel major FP16 format where channel % 8 == 0]}
         -> {/model/encoder/encoder.0/layers.0/linear1/MatMul_out_tensor [dtype=float16, shape=(646, 2048, 1, 1), Format: Channel major FP16 format where channel % 8 == 0]}

    Layer 57   | Reformatting CopyNode for Input Tensor 0 to reshape_after_/model/encoder/encoder.0/layers.0/linear1/MatMul [Op: NoOp]
        {/model/encoder/encoder.0/layers.0/linear1/MatMul_out_tensor [dtype=float16, shape=(646, 2048, 1, 1), Format: Channel major FP16 format where channel % 8 == 0]}
         -> {Reformatted Input Tensor 0 to reshape_after_/model/encoder/encoder.0/layers.0/linear1/MatMul [dtype=float16, shape=(646, 2048, 1, 1), Format: Row major linear FP16 format]}

    Layer 58   | reshape_after_/model/encoder/encoder.0/layers.0/linear1/MatMul [Op: NoOp]
        {Reformatted Input Tensor 0 to reshape_after_/model/encoder/encoder.0/layers.0/linear1/MatMul [dtype=float16, shape=(646, 2048, 1, 1), Format: Row major linear FP16 format]}
         -> {/model/encoder/encoder.0/layers.0/linear1/Add_output_0 [dtype=float16, shape=(1, 646, 2048), Format: Row major linear FP16 format]}

    Layer 59   | PWN(PWN(PWN(PWN(PWN(/model/backbone/blocks.0/mlp/act/Constant_output_0 + (Unnamed Layer* 1122) [Shuffle], /model/encoder/encoder.0/layers.0/activation/Div), PWN(/model/encoder/encoder.0/layers.0/activation/Erf)), PWN(/model/backbone/blocks.0/mlp/act/Constant_1_output_0 + (Unnamed Layer* 1125) [Shuffle], /model/encoder/encoder.0/layers.0/activation/Add)), /model/encoder/encoder.0/layers.0/activation/Mul), PWN(/model/backbone/blocks.0/mlp/act/Constant_2_output_0 + (Unnamed Layer* 1128) [Shuffle], /model/encoder/encoder.0/layers.0/activation/Mul_1)) [Op: PointWiseV2]
        {/model/encoder/encoder.0/layers.0/linear1/Add_output_0 [dtype=float16, shape=(1, 646, 2048), Format: Row major linear FP16 format]}
         -> {/model/encoder/encoder.0/layers.0/activation/Mul_1_output_0 [dtype=float16, shape=(1, 646, 2048), Format: Row major linear FP16 format]}

    Layer 60   | reshape_before_/model/encoder/encoder.0/layers.0/linear2/MatMul [Op: NoOp]
        {/model/encoder/encoder.0/layers.0/activation/Mul_1_output_0 [dtype=float16, shape=(1, 646, 2048), Format: Row major linear FP16 format]}
         -> {reshape_before_/model/encoder/encoder.0/layers.0/linear2/MatMul_out_tensor [dtype=float16, shape=(646, 2048, 1, 1), Format: Row major linear FP16 format]}

    Layer 61   | Reformatting CopyNode for Input Tensor 0 to /model/encoder/encoder.0/layers.0/linear2/MatMul [Op: NoOp]
        {reshape_before_/model/encoder/encoder.0/layers.0/linear2/MatMul_out_tensor [dtype=float16, shape=(646, 2048, 1, 1), Format: Row major linear FP16 format]}
         -> {Reformatted Input Tensor 0 to /model/encoder/encoder.0/layers.0/linear2/MatMul [dtype=float16, shape=(646, 2048, 1, 1), Format: Channel major FP16 format where channel % 8 == 0]}

    Layer 62   | /model/encoder/encoder.0/layers.0/linear2/MatMul [Op: CaskGemmConvolution]
        {Reformatted Input Tensor 0 to /model/encoder/encoder.0/layers.0/linear2/MatMul [dtype=float16, shape=(646, 2048, 1, 1), Format: Channel major FP16 format where channel % 8 == 0]}
         -> {/model/encoder/encoder.0/layers.0/linear2/MatMul_out_tensor [dtype=float16, shape=(646, 384, 1, 1), Format: Channel major FP16 format where channel % 8 == 0]}

    Layer 63   | Reformatting CopyNode for Input Tensor 0 to reshape_after_/model/encoder/encoder.0/layers.0/linear2/MatMul [Op: NoOp]
        {/model/encoder/encoder.0/layers.0/linear2/MatMul_out_tensor [dtype=float16, shape=(646, 384, 1, 1), Format: Channel major FP16 format where channel % 8 == 0]}
         -> {Reformatted Input Tensor 0 to reshape_after_/model/encoder/encoder.0/layers.0/linear2/MatMul [dtype=float16, shape=(646, 384, 1, 1), Format: Row major linear FP16 format]}

    Layer 64   | reshape_after_/model/encoder/encoder.0/layers.0/linear2/MatMul [Op: NoOp]
        {Reformatted Input Tensor 0 to reshape_after_/model/encoder/encoder.0/layers.0/linear2/MatMul [dtype=float16, shape=(646, 384, 1, 1), Format: Row major linear FP16 format]}
         -> {/model/encoder/encoder.0/layers.0/linear2/Add_output_0 [dtype=float16, shape=(1, 646, 384), Format: Row major linear FP16 format]}

    Layer 65   | /model/encoder/encoder.0/layers.0/Add_2 [Op: ElementWise]
        {/model/encoder/encoder.0/layers.0/norm1/Add_1_output_0 [dtype=float16, shape=(1, 646, 384), Format: Row major linear FP16 format],
         /model/encoder/encoder.0/layers.0/linear2/Add_output_0 [dtype=float16, shape=(1, 646, 384), Format: Row major linear FP16 format]}
         -> {/model/encoder/encoder.0/layers.0/Add_2_output_0 [dtype=float16, shape=(1, 646, 384), Format: Row major linear FP16 format]}

    Layer 66   | /model/encoder/encoder.0/layers.0/norm2/ReduceMean [Op: Reduce]
        {/model/encoder/encoder.0/layers.0/Add_2_output_0 [dtype=float16, shape=(1, 646, 384), Format: Row major linear FP16 format]}
         -> {/model/encoder/encoder.0/layers.0/norm2/ReduceMean_output_0 [dtype=float16, shape=(1, 646, 1), Format: Row major linear FP16 format]}

    Layer 67   | /model/encoder/encoder.0/layers.0/norm2/Sub [Op: ElementWise]
        {/model/encoder/encoder.0/layers.0/Add_2_output_0 [dtype=float16, shape=(1, 646, 384), Format: Row major linear FP16 format],
         /model/encoder/encoder.0/layers.0/norm2/ReduceMean_output_0 [dtype=float16, shape=(1, 646, 1), Format: Row major linear FP16 format]}
         -> {/model/encoder/encoder.0/layers.0/norm2/Sub_output_0 [dtype=float16, shape=(1, 646, 384), Format: Row major linear FP16 format]}

    Layer 68   | PWN(/model/backbone/blocks.0/norm1/Constant_output_0_clone_1 + (Unnamed Layer* 1139) [Shuffle], /model/encoder/encoder.0/layers.0/norm2/Pow) [Op: PointWiseV2]
        {/model/encoder/encoder.0/layers.0/norm2/Sub_output_0 [dtype=float16, shape=(1, 646, 384), Format: Row major linear FP16 format]}
         -> {/model/encoder/encoder.0/layers.0/norm2/Pow_output_0 [dtype=float16, shape=(1, 646, 384), Format: Row major linear FP16 format]}

    Layer 69   | /model/encoder/encoder.0/layers.0/norm2/ReduceMean_1 [Op: Reduce]
        {/model/encoder/encoder.0/layers.0/norm2/Pow_output_0 [dtype=float16, shape=(1, 646, 384), Format: Row major linear FP16 format]}
         -> {/model/encoder/encoder.0/layers.0/norm2/ReduceMean_1_output_0 [dtype=float16, shape=(1, 646, 1), Format: Row major linear FP16 format]}

    Layer 70   | PWN(PWN(PWN(PWN(PWN(/model/encoder/encoder.0/layers.0/norm1/Constant_1_output_0_clone_1 + (Unnamed Layer* 1142) [Shuffle], /model/encoder/encoder.0/layers.0/norm2/Add), PWN(/model/encoder/encoder.0/layers.0/norm2/Sqrt)), /model/encoder/encoder.0/layers.0/norm2/Div), /model/encoder/encoder.0/layers.0/norm2/Mul), /model/encoder/encoder.0/layers.0/norm2/Add_1) [Op: PointWiseV2]
        {/model/encoder/encoder.0/layers.0/norm2/ReduceMean_1_output_0 [dtype=float16, shape=(1, 646, 1), Format: Row major linear FP16 format],
         /model/encoder/encoder.0/layers.0/norm2/Sub_output_0 [dtype=float16, shape=(1, 646, 384), Format: Row major linear FP16 format],
         (Unnamed Layer* 1147) [Shuffle]_output [dtype=float16, shape=(1, 1, 384), Format: Row major linear FP16 format],
         (Unnamed Layer* 1150) [Shuffle]_output [dtype=float16, shape=(1, 1, 384), Format: Row major linear FP16 format]}
         -> {/model/encoder/encoder.0/layers.0/norm2/Add_1_output_0 [dtype=float16, shape=(1, 646, 384), Format: Row major linear FP16 format]}

    Layer  ....
    Layer 128  | Reformatting CopyNode for Output Tensor 2 to {ForeignNode[/postprocessor/Constant_14_output_0.../postprocessor/GatherElements]} [Op: Reformat]
        {Reformatted Output Tensor 2 to {ForeignNode[/postprocessor/Constant_14_output_0.../postprocessor/GatherElements]} [dtype=float16, shape=(1, 300, 4), Format: Row major linear FP16 format]}
         -> {boxes [dtype=float32, shape=(1, 300, 4), Format: Row major linear FP32]}

    Layer 129  | Reformatting CopyNode for Output Tensor 0 to {ForeignNode[/postprocessor/Constant_14_output_0.../postprocessor/GatherElements]} [Op: Reformat]
        {Reformatted Output Tensor 0 to {ForeignNode[/postprocessor/Constant_14_output_0.../postprocessor/GatherElements]} [dtype=float16, shape=(1, 300), Format: Row major linear FP16 format]}
         -> {scores [dtype=float32, shape=(1, 300), Format: Row major linear FP32]}
```
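(As an aside, not from the original report: the same per-layer information can also be dumped from the built engine programmatically with the TensorRT engine inspector, which makes it easy to grep for layers that stayed in FP16. A minimal sketch, assuming TensorRT 8.6 and the engine path used above:)

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

with open("model.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

# The engine was built with ProfilingVerbosity.DETAILED (see the build log),
# so the inspector can report per-layer formats and datatypes.
inspector = engine.create_engine_inspector()
print(inspector.get_engine_information(trt.LayerInformationFormat.JSON))
```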

Environment

Hardware: Orin NX 16G

TensorRT Version: 8.6

Docker: dustynv/l4t-pytorch:r36.2.0

miraiaroha commented 6 months ago

@zerollzeng

miraiaroha commented 6 months ago

I found the reason why it is not effective: the initial layer precision is always DataType.FLOAT, so the `if layer.precision == trt.float16` check in my script never matches and no constraint is ever applied.
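Given that, one possible fix (a sketch, not confirmed by the maintainers) is to drop the precision check and set the constraints unconditionally for the matching layers, letting OBEY_PRECISION_CONSTRAINTS keep them in FP32; the `PREFIXES` helper name is illustrative:

```python
# add_constraints.py (revised sketch): force matching layers to FP32
# unconditionally instead of testing layer.precision first, since the
# reported precision is DataType.FLOAT before the builder assigns types.
import tensorrt as trt

PREFIXES = ("/model/simplefeature", "/model/encoder", "/model/decoder", "/postprocessor")

def postprocess(network):
    cnt = 0
    for layer in network:
        if any(p in layer.name for p in PREFIXES):
            layer.precision = trt.float32
            # Note: layers with non-float outputs (e.g. shape/index tensors)
            # may need to be skipped here to avoid build failures under OBEY.
            for i in range(layer.num_outputs):
                layer.set_output_type(i, trt.float32)
            cnt += 1
    print(f"Constrained {cnt} layers to FP32")
```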