NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

DataType failure of TensorRT 10.0.1.6 when running RT-DETRv2 on GPU 1650Ti #4048

Open 5p6 opened 1 month ago

5p6 commented 1 month ago

Description

I used the C++ TensorRT and CUDA APIs to run inference on the RT-DETRv2 model, and at runtime I discovered the following issue: the information reported by two tensor APIs does not match. I call nvinfer1::ICudaEngine::getTensorFormatDesc and nvinfer1::ICudaEngine::getTensorDataType with code like this:

    void allocator() {
        TensorNum = engine->getNbIOTensors();
        // allocate memory for every tensor, input and output alike
        for (int i = 0; i < TensorNum; i++) {
            // query the tensor's metadata
            const char* name = engine->getIOTensorName(i);
            nvinfer1::Dims dims = engine->getTensorShape(name);
            nvinfer1::DataType type = engine->getTensorDataType(name);
            // input or output
            const char* mode_name = engine->getTensorIOMode(name) == nvinfer1::TensorIOMode::kINPUT ? "input" : "output";
            // tensor name
            tensor_name.emplace_back(name);
            // image size (H x W)
            tensor_size.emplace_back(std::make_pair(dims.d[2], dims.d[3]));
            // compute the tensor's size in bytes
            int64_t nbytes = perbytes[int(type)];
            for (int d = 0; d < dims.nbDims; d++)
                nbytes = nbytes * dims.d[d];
            tensor_bytes.emplace_back(nbytes);
            // allocate CUDA memory and record it in the map: name -> device pointer
            name_ptr.insert(std::make_pair(name, safeCudaMalloc(nbytes)));
            std::cout
                << " tensor mode : " << mode_name
                << " , tensor name : " << name
                << " , tensor dim : " << dims.d[0] << " X " << dims.d[1] << " X " << dims.d[2] << " X " << dims.d[3]
                << " , tensor btypes :  " << nbytes
                << " , getTensorDataType output type : " << type_name[int(type)]
                << " , getTensorFormatDesc output description : " << engine->getTensorFormatDesc(name)
                << std::endl;
        }
    }

I found that the information reported by the two APIs differs, as follows:

 tensor mode : input , tensor name : images , tensor dim : 1 X 3 X 640 X 640 , tensor btypes :  4915200 , getTensorDataType output type : kFLOAT , getTensorFormatDesc output description : Row major linear FP32 format (kLINEAR)
 tensor mode : input , tensor name : orig_target_sizes , tensor dim : 1 X 2 X 0 X 0 , tensor btypes :  16 , getTensorDataType output type : kINT64 , getTensorFormatDesc output description : Row major linear INT8 format (kLINEAR)
 tensor mode : output , tensor name : labels , tensor dim : 1 X 300 X 0 X 0 , tensor btypes :  2400 , getTensorDataType output type : kINT64 , getTensorFormatDesc output description : Row major linear INT8 format (kLINEAR)
 tensor mode : output , tensor name : boxes , tensor dim : 1 X 300 X 4 X 0 , tensor btypes :  4800 , getTensorDataType output type : kFLOAT , getTensorFormatDesc output description : Row major linear FP32 format (kLINEAR)
 tensor mode : output , tensor name : scores , tensor dim : 1 X 300 X 0 X 0 , tensor btypes :  1200 , getTensorDataType output type : kFLOAT , getTensorFormatDesc output description : Row major linear FP32 format (kLINEAR)

Look at the tensor orig_target_sizes: getTensorDataType reports that the data type is kINT64, but getTensorFormatDesc reports INT8. For me, getTensorFormatDesc is the one that works: when I allocate CUDA memory according to what getTensorFormatDesc reports, the model runs without any problem. This troubles me a lot, because I use getTensorDataType to size my CUDA allocations automatically, and now I would have to allocate manually for each model. Could you please fix it?
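
For context, my automatic allocation boils down to something like the sketch below (a minimal sketch, not a TensorRT API: dataTypeSize and tensorBytes are illustrative helpers of my own, and sub-byte types such as kINT4 are deliberately left out):

    #include <NvInfer.h>
    #include <cstdint>

    // Illustrative helper: bytes per element for the data types this engine uses.
    // Sub-byte types such as kINT4 are intentionally not handled in this sketch.
    static int64_t dataTypeSize(nvinfer1::DataType t) {
        switch (t) {
        case nvinfer1::DataType::kFLOAT: return 4;
        case nvinfer1::DataType::kHALF:  return 2;
        case nvinfer1::DataType::kINT8:  return 1;
        case nvinfer1::DataType::kINT32: return 4;
        case nvinfer1::DataType::kBOOL:  return 1;
        case nvinfer1::DataType::kUINT8: return 1;
        case nvinfer1::DataType::kINT64: return 8;
        default:                         return 0; // unexpected type
        }
    }

    // Size of a tensor in bytes: element size times the product of all dims.
    // For orig_target_sizes (1 x 2, kINT64) this yields 16, matching the log above.
    static int64_t tensorBytes(const nvinfer1::ICudaEngine& engine, const char* name) {
        nvinfer1::Dims dims = engine.getTensorShape(name);
        int64_t bytes = dataTypeSize(engine.getTensorDataType(name));
        for (int d = 0; d < dims.nbDims; d++)
            bytes *= dims.d[d];
        return bytes;
    }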

Environment

TensorRT Version: 10.0.1.6

NVIDIA GPU: 1650Ti 4G

NVIDIA Driver Version: 12.2

CUDA Version: 12.2

CUDNN Version: 8.8

Operating System: Windows 11

Python Version (if applicable): None

Tensorflow Version (if applicable): None

PyTorch Version (if applicable): None

Baremetal or Container (if so, version): None

Relevant Files

Model link: None

Steps To Reproduce

Commands or scripts: No

Have you tried the latest release?: Not yet

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt): yes

lix19937 commented 1 month ago

How was your plan generated? Can you provide the cmd?

5p6 commented 1 month ago

How was your plan generated? Can you provide the cmd?

I generated the engine file from the ONNX weights; the command was like:

trtexec.exe --onnx=model.onnx --saveEngine=rtdetr_f16_i8.engine  --explicitBatch --fp16 --int8
lix19937 commented 1 month ago

So, can you run the following cmd, then upload li.json?

   trtexec.exe --onnx=model.onnx --saveEngine=rtdetr_f16_i8.engine  --explicitBatch --fp16 --int8 \
     --dumpProfile \
     --noDataTransfers --useCudaGraph --useSpinWait  --separateProfileRun \
    --dumpLayerInfo  --exportLayerInfo=li.json
5p6 commented 1 month ago

So, can you run the following cmd, then upload li.json?

   trtexec.exe --onnx=model.onnx --saveEngine=rtdetr_f16_i8.engine  --explicitBatch --fp16 --int8 \
     --dumpProfile \
     --noDataTransfers --useCudaGraph --useSpinWait  --separateProfileRun \
    --dumpLayerInfo  --exportLayerInfo=li.json

I ran a similar but not identical command:

   trtexec.exe --onnx=model.onnx --saveEngine=rtdetr_f16_i8.engine  --fp16 --int8 \
     --dumpProfile \
     --noDataTransfers --useCudaGraph --useSpinWait  --separateProfileRun \
    --dumpLayerInfo  --exportLayerInfo=li.json

As you can see, I removed the --explicitBatch option because trtexec in TensorRT 10.0.1.6 no longer has it. The resulting JSON file from this command follows.

{"Layers": ["Reformatting CopyNode for Input Tensor 0 to /model/backbone/conv1/conv1_1/conv/Conv + /model/backbone/conv1/conv1_1/act/Relu"
,"/model/backbone/conv1/conv1_1/conv/Conv + /model/backbone/conv1/conv1_1/act/Relu"
,"/model/backbone/conv1/conv1_2/conv/Conv + /model/backbone/conv1/conv1_2/act/Relu"
,"/model/backbone/conv1/conv1_3/conv/Conv + /model/backbone/conv1/conv1_3/act/Relu"
,"/model/backbone/MaxPool"
,"/model/backbone/res_layers.0/blocks.0/branch2a/conv/Conv + /model/backbone/res_layers.0/blocks.0/branch2a/act/Relu"
,"/model/backbone/res_layers.0/blocks.0/branch2b/conv/Conv"
,"/model/backbone/res_layers.0/blocks.0/short/conv/Conv + /model/backbone/res_layers.0/blocks.0/Add + /model/backbone/res_layers.0/blocks.0/act/Relu"
,"/model/backbone/res_layers.0/blocks.1/branch2a/conv/Conv + /model/backbone/res_layers.0/blocks.1/branch2a/act/Relu"
,"/model/backbone/res_layers.0/blocks.1/branch2b/conv/Conv + /model/backbone/res_layers.0/blocks.1/Add + /model/backbone/res_layers.0/blocks.1/act/Relu"
,"/model/backbone/res_layers.1/blocks.0/branch2a/conv/Conv + /model/backbone/res_layers.1/blocks.0/branch2a/act/Relu"
,"/model/backbone/res_layers.1/blocks.0/branch2b/conv/Conv"
,"/model/backbone/res_layers.1/blocks.0/short/pool/AveragePool"
,"/model/backbone/res_layers.1/blocks.0/short/conv/conv/Conv + /model/backbone/res_layers.1/blocks.0/Add + /model/backbone/res_layers.1/blocks.0/act/Relu"
,"/model/backbone/res_layers.1/blocks.1/branch2a/conv/Conv + /model/backbone/res_layers.1/blocks.1/branch2a/act/Relu"
,"/model/backbone/res_layers.1/blocks.1/branch2b/conv/Conv + /model/backbone/res_layers.1/blocks.1/Add + /model/backbone/res_layers.1/blocks.1/act/Relu"
,"/model/backbone/res_layers.2/blocks.0/branch2a/conv/Conv + /model/backbone/res_layers.2/blocks.0/branch2a/act/Relu"
,"/model/backbone/res_layers.2/blocks.0/branch2b/conv/Conv"
,"/model/backbone/res_layers.2/blocks.0/short/pool/AveragePool"
,"/model/backbone/res_layers.2/blocks.0/short/conv/conv/Conv + /model/backbone/res_layers.2/blocks.0/Add + /model/backbone/res_layers.2/blocks.0/act/Relu"
,"/model/backbone/res_layers.2/blocks.1/branch2a/conv/Conv + /model/backbone/res_layers.2/blocks.1/branch2a/act/Relu"
,"/model/backbone/res_layers.2/blocks.1/branch2b/conv/Conv + /model/backbone/res_layers.2/blocks.1/Add + /model/backbone/res_layers.2/blocks.1/act/Relu"
,"/model/backbone/res_layers.3/blocks.0/branch2a/conv/Conv + /model/backbone/res_layers.3/blocks.0/branch2a/act/Relu"
,"/model/backbone/res_layers.3/blocks.0/branch2b/conv/Conv"
,"/model/backbone/res_layers.3/blocks.0/short/pool/AveragePool"
,"/model/backbone/res_layers.3/blocks.0/short/conv/conv/Conv + /model/backbone/res_layers.3/blocks.0/Add + /model/backbone/res_layers.3/blocks.0/act/Relu"
,"/model/backbone/res_layers.3/blocks.1/branch2a/conv/Conv + /model/backbone/res_layers.3/blocks.1/branch2a/act/Relu"
,"/model/backbone/res_layers.3/blocks.1/branch2b/conv/Conv + /model/backbone/res_layers.3/blocks.1/Add + /model/backbone/res_layers.3/blocks.1/act/Relu"
,"/model/encoder/input_proj.2/conv/Conv"
,"Reformatting CopyNode for Input Tensor 0 to {ForeignNode[/model/encoder/Reshape.../model/encoder/Transpose_1 + /model/encoder/Reshape_1]}"
,"__myl_ResTraTraResMovResAdd_myl30_0"
,"__mye5401_myl30_1"
,"__mye5403_myl30_2"
,"fc_/model/encoder/encoder_0/layers_0/self_attn/MatMul_2_myl30_3"
,"__mye5405_myl30_4"
,"fc_/model/encoder/encoder_0/layers_0/self_attn/MatMul_1+fc_/model/encoder/encoder_0/layers_0/self_attn/MatMul_myl30_5"
,"__myl_TraMul_myl30_6"
,"__mye5407_myl30_7"
,"_gemm_mha_v2_myl30_8"
,"__myl_TraRes_myl30_9"
,"fc_/model/encoder/encoder_0/layers_0/self_attn/Gemm_myl30_10"
,"__myl_MovResResAddResMeaSubMulMeaAddSqrDivMulMulAdd_myl30_11"
,"__myl_Fc_myl30_12"
,"fc_/model/encoder/encoder_0/layers_0/linear2/MatMul_myl30_13"
,"__myl_ResAddResMeaSubMulMea_myl30_14"
,"__myl_AddSqrDivMulMulAddResTraRes_myl30_15"
,"Reformatting CopyNode for Input Tensor 0 to /model/encoder/lateral_convs.0/conv/Conv + PWN(PWN(/model/encoder/lateral_convs.0/act/Sigmoid), PWN(/model/encoder/lateral_convs.0/act/Mul))"
,"/model/encoder/lateral_convs.0/conv/Conv + PWN(PWN(/model/encoder/lateral_convs.0/act/Sigmoid), PWN(/model/encoder/lateral_convs.0/act/Mul))"
,"/model/encoder/Resize"
,"/model/encoder/input_proj.1/conv/Conv"
,"/model/encoder/Resize_output_0 copy"
,"/model/encoder/fpn_blocks.0/conv1/conv/Conv + PWN(PWN(/model/encoder/fpn_blocks.0/conv1/act/Sigmoid), PWN(/model/encoder/fpn_blocks.0/conv1/act/Mul))"
,"/model/encoder/fpn_blocks.0/bottlenecks/bottlenecks.0/conv/Conv + PWN(PWN(/model/encoder/fpn_blocks.0/bottlenecks/bottlenecks.0/act/Sigmoid), PWN(/model/encoder/fpn_blocks.0/bottlenecks/bottlenecks.0/act/Mul))"
,"/model/encoder/fpn_blocks.0/bottlenecks/bottlenecks.1/conv/Conv + PWN(PWN(/model/encoder/fpn_blocks.0/bottlenecks/bottlenecks.1/act/Sigmoid), PWN(/model/encoder/fpn_blocks.0/bottlenecks/bottlenecks.1/act/Mul))"
,"/model/encoder/fpn_blocks.0/conv2/conv/Conv"
,"/model/encoder/fpn_blocks.0/bottlenecks/bottlenecks.2/conv/Conv + PWN(PWN(PWN(/model/encoder/fpn_blocks.0/bottlenecks/bottlenecks.2/act/Sigmoid), PWN(/model/encoder/fpn_blocks.0/bottlenecks/bottlenecks.2/act/Mul)), PWN(PWN(PWN(/model/encoder/fpn_blocks.0/conv2/act/Sigmoid), PWN(/model/encoder/fpn_blocks.0/conv2/act/Mul)), PWN(/model/encoder/fpn_blocks.0/Add)))"
,"/model/encoder/fpn_blocks.0/conv3/conv/Conv + PWN(PWN(/model/encoder/fpn_blocks.0/conv3/act/Sigmoid), PWN(/model/encoder/fpn_blocks.0/conv3/act/Mul))"
,"/model/encoder/lateral_convs.1/conv/Conv + PWN(PWN(/model/encoder/lateral_convs.1/act/Sigmoid), PWN(/model/encoder/lateral_convs.1/act/Mul))"
,"/model/encoder/Resize_1"
,"/model/encoder/input_proj.0/conv/Conv"
,"/model/encoder/Resize_1_output_0 copy"
,"/model/encoder/fpn_blocks.1/conv1/conv/Conv + PWN(PWN(/model/encoder/fpn_blocks.1/conv1/act/Sigmoid), PWN(/model/encoder/fpn_blocks.1/conv1/act/Mul))"
,"/model/encoder/fpn_blocks.1/bottlenecks/bottlenecks.0/conv/Conv + PWN(PWN(/model/encoder/fpn_blocks.1/bottlenecks/bottlenecks.0/act/Sigmoid), PWN(/model/encoder/fpn_blocks.1/bottlenecks/bottlenecks.0/act/Mul))"
,"/model/encoder/fpn_blocks.1/bottlenecks/bottlenecks.1/conv/Conv + PWN(PWN(/model/encoder/fpn_blocks.1/bottlenecks/bottlenecks.1/act/Sigmoid), PWN(/model/encoder/fpn_blocks.1/bottlenecks/bottlenecks.1/act/Mul))"
,"/model/encoder/fpn_blocks.1/conv2/conv/Conv"
,"/model/encoder/fpn_blocks.1/bottlenecks/bottlenecks.2/conv/Conv + PWN(PWN(PWN(/model/encoder/fpn_blocks.1/bottlenecks/bottlenecks.2/act/Sigmoid), PWN(/model/encoder/fpn_blocks.1/bottlenecks/bottlenecks.2/act/Mul)), PWN(PWN(PWN(/model/encoder/fpn_blocks.1/conv2/act/Sigmoid), PWN(/model/encoder/fpn_blocks.1/conv2/act/Mul)), PWN(/model/encoder/fpn_blocks.1/Add)))"
,"/model/encoder/fpn_blocks.1/conv3/conv/Conv + PWN(PWN(/model/encoder/fpn_blocks.1/conv3/act/Sigmoid), PWN(/model/encoder/fpn_blocks.1/conv3/act/Mul))"
,"/model/encoder/downsample_convs.0/conv/Conv + PWN(PWN(/model/encoder/downsample_convs.0/act/Sigmoid), PWN(/model/encoder/downsample_convs.0/act/Mul))"
,"/model/encoder/lateral_convs.1/act/Mul_output_0 copy"
,"/model/encoder/pan_blocks.0/conv1/conv/Conv + PWN(PWN(/model/encoder/pan_blocks.0/conv1/act/Sigmoid), PWN(/model/encoder/pan_blocks.0/conv1/act/Mul))"
,"/model/encoder/pan_blocks.0/bottlenecks/bottlenecks.0/conv/Conv + PWN(PWN(/model/encoder/pan_blocks.0/bottlenecks/bottlenecks.0/act/Sigmoid), PWN(/model/encoder/pan_blocks.0/bottlenecks/bottlenecks.0/act/Mul))"
,"/model/encoder/pan_blocks.0/bottlenecks/bottlenecks.1/conv/Conv + PWN(PWN(/model/encoder/pan_blocks.0/bottlenecks/bottlenecks.1/act/Sigmoid), PWN(/model/encoder/pan_blocks.0/bottlenecks/bottlenecks.1/act/Mul))"
,"/model/encoder/pan_blocks.0/conv2/conv/Conv"
,"/model/encoder/pan_blocks.0/bottlenecks/bottlenecks.2/conv/Conv + PWN(PWN(PWN(/model/encoder/pan_blocks.0/bottlenecks/bottlenecks.2/act/Sigmoid), PWN(/model/encoder/pan_blocks.0/bottlenecks/bottlenecks.2/act/Mul)), PWN(PWN(PWN(/model/encoder/pan_blocks.0/conv2/act/Sigmoid), PWN(/model/encoder/pan_blocks.0/conv2/act/Mul)), PWN(/model/encoder/pan_blocks.0/Add)))"
,"/model/encoder/pan_blocks.0/conv3/conv/Conv + PWN(PWN(/model/encoder/pan_blocks.0/conv3/act/Sigmoid), PWN(/model/encoder/pan_blocks.0/conv3/act/Mul))"
,"/model/encoder/downsample_convs.1/conv/Conv + PWN(PWN(/model/encoder/downsample_convs.1/act/Sigmoid), PWN(/model/encoder/downsample_convs.1/act/Mul))"
,"/model/encoder/lateral_convs.0/act/Mul_output_0 copy"
,"/model/encoder/pan_blocks.1/conv1/conv/Conv + PWN(PWN(/model/encoder/pan_blocks.1/conv1/act/Sigmoid), PWN(/model/encoder/pan_blocks.1/conv1/act/Mul))"
,"/model/encoder/pan_blocks.1/bottlenecks/bottlenecks.0/conv/Conv + PWN(PWN(/model/encoder/pan_blocks.1/bottlenecks/bottlenecks.0/act/Sigmoid), PWN(/model/encoder/pan_blocks.1/bottlenecks/bottlenecks.0/act/Mul))"
,"/model/encoder/pan_blocks.1/bottlenecks/bottlenecks.1/conv/Conv + PWN(PWN(/model/encoder/pan_blocks.1/bottlenecks/bottlenecks.1/act/Sigmoid), PWN(/model/encoder/pan_blocks.1/bottlenecks/bottlenecks.1/act/Mul))"
,"/model/encoder/pan_blocks.1/conv2/conv/Conv"
,"/model/encoder/pan_blocks.1/bottlenecks/bottlenecks.2/conv/Conv + PWN(PWN(PWN(/model/encoder/pan_blocks.1/bottlenecks/bottlenecks.2/act/Sigmoid), PWN(/model/encoder/pan_blocks.1/bottlenecks/bottlenecks.2/act/Mul)), PWN(PWN(PWN(/model/encoder/pan_blocks.1/conv2/act/Sigmoid), PWN(/model/encoder/pan_blocks.1/conv2/act/Mul)), PWN(/model/encoder/pan_blocks.1/Add)))"
,"/model/encoder/pan_blocks.1/conv3/conv/Conv + PWN(PWN(/model/encoder/pan_blocks.1/conv3/act/Sigmoid), PWN(/model/encoder/pan_blocks.1/conv3/act/Mul))"
,"/model/decoder/input_proj.2/conv/Conv"
,"/model/decoder/input_proj.1/conv/Conv"
,"/model/decoder/input_proj.0/conv/Conv"
,"entry^bb^signal^2_myl71_0"
,"entry^bb^wait^4_myl71_1"
,"entry^bb^wait^2_myl71_2"
,"__myl_Tra_myl71_3"
,"__myl_Tra_myl71_4"
,"__myl_Tra_myl71_5"
,"__myl_Mul_myl71_6"
,"fc_/model/decoder/enc_output/proj/MatMul_myl71_7"
,"__myl_AddMeaSubMulMeaAddSqrDivMulMulAdd_myl71_8"
,"fc_/model/decoder/enc_bbox_head/layers_0/MatMul+fc_/model/decoder/enc_score_head/MatMul_myl71_9"
,"__mye39289_myl71_10"
,"__mye39291_myl71_11"
,"__myl_MovAddMax_myl71_12"
,"__myl_Top_myl71_13"
,"__mye39293_myl71_14"
,"__myl_MovAddRel_myl71_15"
,"fc_/model/decoder/enc_bbox_head/layers_1/MatMul_myl71_16"
,"__myl_AddRel_myl71_17"
,"fc_/model/decoder/enc_bbox_head/layers_2/MatMul_myl71_18"
,"__mye39295_myl71_19"
,"__myl_CasResCasRepAddAddGatResNegExpAddDivRes_myl71_20"
,"fc_/model/decoder/decoder/query_pos_head/layers_0/MatMul_myl71_21"
,"__myl_AddRel_myl71_22"
,"fc_/model/decoder/decoder/query_pos_head/layers_1/MatMul_myl71_23"
,"__myl_RepGatAddResResAdd_myl71_24"
,"__mye39297_myl71_25"
,"__mye39299_myl71_26"
,"fc_/model/decoder/decoder/layers_0/self_attn/MatMul_2_myl71_27"
,"__mye39301_myl71_28"
,"__mye39303_myl71_29"
,"__myl_Add_myl71_30"
,"fc_/model/decoder/decoder/layers_0/self_attn/MatMul_1+fc_/model/decoder/decoder/layers_0/self_attn/MatMul_myl71_31"
,"__mye39305_myl71_32"
,"__myl_MovAdd_myl71_33"
,"__mye39307_myl71_34"
,"__myl_MovAddResTraMul_myl71_35"
,"__mye39309_myl71_36"
,"__mye39311_myl71_37"
,"fc_/model/decoder/decoder/layers_0/self_attn/MatMul_3_myl71_38"
,"__myl_MaxSubExpSumDivMul_myl71_39"
,"fc_/model/decoder/decoder/layers_0/self_attn/MatMul_4_myl71_40"
,"__myl_Tra_myl71_41"
,"fc_/model/decoder/decoder/layers_0/self_attn/Gemm_myl71_42"
,"__myl_AddResAddResMeaSubMulMeaAddSqrDivMulMulAddAdd_myl71_43"
,"fc_/model/decoder/decoder/layers_0/cross_attn/attention_weights/MatMul+fc_/model/decoder/decoder/layers_0/cross_attn/sampling_offsets/MatMul_myl71_44"
,"__mye39313_myl71_45"
,"__mye39315_myl71_46"
,"__myl_MovAddResMaxSubExpSum_myl71_47"
,"fc_/model/decoder/decoder/layers_0/cross_attn/value_proj/MatMul_myl71_48"
,"__mye39317_myl71_49"
,"__mye39319_myl71_50"
,"__myl_AddRes_myl71_51"
,"__myl_TraRes_myl71_52"
,"__mye39321_myl71_53"
,"__mye39323_myl71_54"
,"__myl_SliRes_myl71_55"
,"__myl_Tra_myl71_56"
,"__mye39327_myl71_57"
,"__mye39325_myl71_58"
,"__myl_SliRes_myl71_59"
,"__myl_Tra_myl71_60"
,"__mye39331_myl71_61"
,"__myl_SliRes_myl71_62"
,"__myl_Tra_myl71_63"
,"__mye39335_myl71_64"
,"__mye39329_myl71_65"
,"__mye39333_myl71_66"
,"__mye39337_myl71_67"
,"__myl_ResSliSliMovAddResMulMulMulAddMulAddTraResSliSliSliRevAddMulAddMulFloCasSubSubAddMaxMinGatEtc_myl71_68"
,"__mye39339_myl71_69"
,"__myl_Tra_myl71_70"
,"__mye39341_myl71_71"
,"__myl_Tra_myl71_72"
,"__mye39349_myl71_73"
,"__mye39343_myl71_74"
,"__myl_Tra_myl71_75"
,"__mye39345_myl71_76"
,"__mye39347_myl71_77"
,"__mye39351_myl71_78"
,"__myl_DivMulTraResConMulSum_myl71_79"
,"fc_/model/decoder/decoder/layers_0/cross_attn/output_proj/MatMul_myl71_80"
,"__myl_ResAddAddResMeaSubMulMeaAddSqrDivMulMulAdd_myl71_81"
,"fc_/model/decoder/decoder/layers_0/linear1/MatMul_myl71_82"
,"__myl_AddRel_myl71_83"
,"fc_/model/decoder/decoder/layers_0/linear2/MatMul_myl71_84"
,"__myl_ResAddAddResMeaSubMulMeaAddSqrDivMulMulAddRes_myl71_85"
,"fc_/model/decoder/decoder/dec_bbox_head_0/layers_0/MatMul_myl71_86"
,"__myl_AddRel_myl71_87"
,"fc_/model/decoder/decoder/dec_bbox_head_0/layers_1/MatMul_myl71_88"
,"__myl_AddRel_myl71_89"
,"fc_/model/decoder/decoder/dec_bbox_head_0/layers_2/MatMul_myl71_90"
,"PWN(PWN(/model/decoder/decoder/Clip), PWN(PWN(PWN(PWN(PWN(PWN(PWN(/model/decoder/decoder/Constant_4_output_0 + ONNXTRT_Broadcast_819, PWN(/model/decoder/decoder/Sub)), PWN(/model/decoder/decoder/Constant_5_output_0 + ONNXTRT_Broadcast_822, PWN((Unnamed Layer* 1287) [ElementWise]))), PWN((Unnamed Layer* 1284) [Constant] + ONNXTRT_Broadcast_824, PWN((Unnamed Layer* 1288) [ElementWise]))), PWN(PWN(PWN(/model/decoder/decoder/Constant_3_output_0 + ONNXTRT_Broadcast_815, PWN((Unnamed Layer* 1278) [ElementWise])), PWN((Unnamed Layer* 1275) [Constant] + ONNXTRT_Broadcast_817, PWN((Unnamed Layer* 1279) [ElementWise]))), PWN(/model/decoder/decoder/Div))), PWN(/model/decoder/decoder/Log)), PWN(/model/decoder/decoder/Add)), PWN(/model/decoder/decoder/Sigmoid_1)))"
,"Reformatting CopyNode for Output Tensor 0 to PWN(PWN(/model/decoder/decoder/Clip), PWN(PWN(PWN(PWN(PWN(PWN(PWN(/model/decoder/decoder/Constant_4_output_0 + ONNXTRT_Broadcast_819, PWN(/model/decoder/decoder/Sub)), PWN(/model/decoder/decoder/Constant_5_output_0 + ONNXTRT_Broadcast_822, PWN((Unnamed Layer* 1287) [ElementWise]))), PWN((Unnamed Layer* 1284) [Constant] + ONNXTRT_Broadcast_824, PWN((Unnamed Layer* 1288) [ElementWise]))), PWN(PWN(PWN(/model/decoder/decoder/Constant_3_output_0 + ONNXTRT_Broadcast_815, PWN((Unnamed Layer* 1278) [ElementWise])), PWN((Unnamed Layer* 1275) [Constant] + ONNXTRT_Broadcast_817, PWN((Unnamed Layer* 1279) [ElementWise]))), PWN(/model/decoder/decoder/Div))), PWN(/model/decoder/decoder/Log)), PWN(/model/decoder/decoder/Add)), PWN(/model/decoder/decoder/Sigmoid_1)))"
,"PWN(/model/decoder/decoder/Clip_3)"
,"Reformatting CopyNode for Input Tensor 0 to {ForeignNode[/model/decoder/decoder/layers.1/self_attn/Transpose_1.../model/decoder/decoder/Sigmoid_2]}"
,"Reformatting CopyNode for Input Tensor 2 to {ForeignNode[/model/decoder/decoder/layers.1/self_attn/Transpose_1.../model/decoder/decoder/Sigmoid_2]}"
,"entry^bb^signal^1_myl77_0"
,"entry^bb^wait^2_myl77_1"
,"entry^bb^wait^1_myl77_2"
,"fc_/model/decoder/decoder/layers_1/self_attn/MatMul_2_myl77_3"
,"__mye26173_myl77_4"
,"fc_/model/decoder/decoder/query_pos_head/layers_0_1/MatMul_myl77_5"
,"__myl_Rel_myl77_6"
,"fc_/model/decoder/decoder/query_pos_head/layers_1_1/MatMul_myl77_7"
,"__myl_Add_myl77_8"
,"fc_/model/decoder/decoder/layers_1/self_attn/MatMul_1+fc_/model/decoder/decoder/layers_1/self_attn/MatMul_myl77_9"
,"__myl_TraMul_myl77_10"
,"__mye26175_myl77_11"
,"_gemm_mha_v2_myl77_12"
,"__myl_TraRes_myl77_13"
,"fc_/model/decoder/decoder/layers_1/self_attn/Gemm_myl77_14"
,"__myl_AddResMeaSubMulMeaAddSqrDivMulMulAdd_myl77_15"
,"__myl_Add_myl77_16"
,"fc_/model/decoder/decoder/layers_1/cross_attn/attention_weights/MatMul+fc_/model/decoder/decoder/layers_1/cross_attn/sampling_offsets/MatMul_myl77_17"
,"__mye26177_myl77_18"
,"fc_/model/decoder/decoder/layers_1/cross_attn/value_proj/MatMul_myl77_19"
,"__mye26179_myl77_20"
,"__myl_MovResMaxSubExpSum_myl77_21"
,"__myl_TraRes_myl77_22"
,"__mye26181_myl77_23"
,"__mye26183_myl77_24"
,"__myl_SliRes_myl77_25"
,"__myl_TraCas_myl77_26"
,"__mye26185_myl77_27"
,"__myl_SliRes_myl77_28"
,"__myl_TraCas_myl77_29"
,"__mye26187_myl77_30"
,"__myl_MovResMulSliMulMulSliAddMulAddTraResSliSliSliSliResCasRevTraCasAddMulAddMulFloCasSubSubAddEtc_myl77_31"
,"__mye26189_myl77_32"
,"__myl_CasTra_myl77_33"
,"__mye26191_myl77_34"
,"__myl_CasTra_myl77_35"
,"__mye26199_myl77_36"
,"__mye26193_myl77_37"
,"__myl_CasTra_myl77_38"
,"__mye26195_myl77_39"
,"__mye26197_myl77_40"
,"__mye26201_myl77_41"
,"__myl_DivMulTraResConMulSum_myl77_42"
,"fc_/model/decoder/decoder/layers_1/cross_attn/output_proj/MatMul_myl77_43"
,"__myl_ResMeaSubMulMeaAddSqrDivMulMulAdd_myl77_44"
,"fc_/model/decoder/decoder/layers_1/linear1/MatMul_myl77_45"
,"fc_/model/decoder/decoder/layers_1/linear2/MatMul_myl77_46"
,"__myl_ResAddResMeaSubMulMeaAddSqrDivMulMulAddRes_myl77_47"
,"fc_/model/decoder/decoder/dec_bbox_head_1/layers_0/MatMul_myl77_48"
,"fc_/model/decoder/decoder/dec_bbox_head_1/layers_1/MatMul_myl77_49"
,"fc_/model/decoder/decoder/dec_bbox_head_1/layers_2/MatMul_myl77_50"
,"__myl_MaxMinSubMaxMinDivLogResAddNegExpAddDiv_myl77_51"
,"Reformatting CopyNode for Output Tensor 1 to {ForeignNode[/model/decoder/decoder/layers.1/self_attn/Transpose_1.../model/decoder/decoder/Sigmoid_2]}"
,"Reformatting CopyNode for Input Tensor 0 to PWN(/model/decoder/decoder/Clip_6)"
,"PWN(/model/decoder/decoder/Clip_6)"
,"Reformatting CopyNode for Input Tensor 1 to {ForeignNode[/postprocessor/Expand.../postprocessor/GatherElements]}"
,"Reformatting CopyNode for Input Tensor 2 to {ForeignNode[/postprocessor/Expand.../postprocessor/GatherElements]}"
,"entry^bb^signal^1_myl83_0"
,"entry^bb^wait^2_myl83_1"
,"entry^bb^wait^1_myl83_2"
,"fc_/model/decoder/decoder/layers_2/self_attn/MatMul_2_myl83_3"
,"__mye44692_myl83_4"
,"__mye44694_myl83_5"
,"__myl_Add_myl83_6"
,"fc_/model/decoder/decoder/query_pos_head/layers_0_2/MatMul_myl83_7"
,"__myl_AddRel_myl83_8"
,"fc_/model/decoder/decoder/query_pos_head/layers_1_2/MatMul_myl83_9"
,"__myl_AddResAdd_myl83_10"
,"fc_/model/decoder/decoder/layers_2/self_attn/MatMul_1+fc_/model/decoder/decoder/layers_2/self_attn/MatMul_myl83_11"
,"__mye44696_myl83_12"
,"__myl_MovAdd_myl83_13"
,"__mye44698_myl83_14"
,"__myl_MovAddResTraMul_myl83_15"
,"__mye44700_myl83_16"
,"__mye44702_myl83_17"
,"fc_/model/decoder/decoder/layers_2/self_attn/MatMul_3_myl83_18"
,"__myl_MaxSubExpSumDivMul_myl83_19"
,"__mye44704_myl83_20"
,"__mye44706_myl83_21"
,"fc_/model/decoder/decoder/layers_2/self_attn/MatMul_4_myl83_22"
,"__myl_Tra_myl83_23"
,"fc_/model/decoder/decoder/layers_2/self_attn/Gemm_myl83_24"
,"__myl_AddResAddResMeaSubMulMeaAddSqrDivMulMulAddAdd_myl83_25"
,"fc_/model/decoder/decoder/layers_2/cross_attn/attention_weights/MatMul+fc_/model/decoder/decoder/layers_2/cross_attn/sampling_offsets/MatMul_myl83_26"
,"__mye44708_myl83_27"
,"__mye44710_myl83_28"
,"__myl_MovAddResMaxSubExpSum_myl83_29"
,"fc_/model/decoder/decoder/layers_2/cross_attn/value_proj/MatMul_myl83_30"
,"__mye44712_myl83_31"
,"__mye44714_myl83_32"
,"__myl_AddRes_myl83_33"
,"__myl_TraRes_myl83_34"
,"__mye44716_myl83_35"
,"__mye44718_myl83_36"
,"__myl_SliRes_myl83_37"
,"__myl_Tra_myl83_38"
,"__mye44730_myl83_39"
,"__mye44720_myl83_40"
,"__myl_SliRes_myl83_41"
,"__myl_Tra_myl83_42"
,"__mye44726_myl83_43"
,"__myl_SliRes_myl83_44"
,"__myl_Tra_myl83_45"
,"__mye44722_myl83_46"
,"__mye44724_myl83_47"
,"__mye44728_myl83_48"
,"__mye44732_myl83_49"
,"__myl_SliSliMovAddResMulMulMulAddMulAddTraResSliSliSliRevAddMulAddMulFloCasSubSubAddMaxMinGatLteEtc_myl83_50"
,"__mye44734_myl83_51"
,"__myl_Tra_myl83_52"
,"__mye44736_myl83_53"
,"__myl_Tra_myl83_54"
,"__mye44744_myl83_55"
,"__mye44738_myl83_56"
,"__myl_Tra_myl83_57"
,"__mye44740_myl83_58"
,"__mye44742_myl83_59"
,"__mye44746_myl83_60"
,"__myl_DivMulTraResConMulSum_myl83_61"
,"fc_/model/decoder/decoder/layers_2/cross_attn/output_proj/MatMul_myl83_62"
,"__myl_ResAddAddResMeaSubMulMeaAddSqrDivMulMulAdd_myl83_63"
,"fc_/model/decoder/decoder/layers_2/linear1/MatMul_myl83_64"
,"__myl_AddRel_myl83_65"
,"fc_/model/decoder/decoder/layers_2/linear2/MatMul_myl83_66"
,"__myl_ResAddAddResMeaSubMulMeaAddSqrDivMulMulAdd_myl83_67"
,"fc_/model/decoder/decoder/dec_score_head_2/MatMul+fc_/model/decoder/decoder/dec_bbox_head_2/layers_0/MatMul_myl83_68"
,"__mye44748_myl83_69"
,"__mye44750_myl83_70"
,"__myl_MovAddResGatResNegExpAdd_myl83_71"
,"__myl_DivResTop_myl83_72"
,"__mye44752_myl83_73"
,"__myl_MovAddRel_myl83_74"
,"fc_/model/decoder/decoder/dec_bbox_head_2/layers_1/MatMul_myl83_75"
,"__myl_AddRel_myl83_76"
,"fc_/model/decoder/decoder/dec_bbox_head_2/layers_2/MatMul_myl83_77"
,"__mye44754_myl83_78"
,"__myl_RepResCasCasDivResCasRepMulSubMaxMinSubMaxMinDivLogResAddNegExpAddDivResGatSliResSliResSliEtc_myl83_79"
],
"Bindings": ["images"
,"orig_target_sizes"
,"labels"
,"boxes"
,"scores"
]}

When I run the C++ code with the rtdetr_f16_i8.engine weights, the result is as follows:

 tensor mode : input , tensor name : images , tensor dim : 1 X 3 X 640 X 640 , tensor btypes :  4915200 , getTensorDataType output type : kFLOAT , getTensorFormatDesc output description : Row major linear FP32 format (kLINEAR)
 tensor mode : input , tensor name : orig_target_sizes , tensor dim : 1 X 2 X 0 X 0 , tensor btypes :  16 , getTensorDataType output type : kINT64 , getTensorFormatDesc output description : Row major linear INT8 format (kLINEAR)
 tensor mode : output , tensor name : labels , tensor dim : 1 X 300 X 0 X 0 , tensor btypes :  2400 , getTensorDataType output type : kINT64 , getTensorFormatDesc output description : Row major linear INT8 format (kLINEAR)
 tensor mode : output , tensor name : boxes , tensor dim : 1 X 300 X 4 X 0 , tensor btypes :  4800 , getTensorDataType output type : kFLOAT , getTensorFormatDesc output description : Row major linear FP32 format (kLINEAR)
 tensor mode : output , tensor name : scores , tensor dim : 1 X 300 X 0 X 0 , tensor btypes :  1200 , getTensorDataType output type : kFLOAT , getTensorFormatDesc output description : Row major linear FP32 format (kLINEAR)

Look at the tensor orig_target_sizes: the output of getTensorFormatDesc is INT8, while the output of getTensorDataType is kINT64. Below I show my original code. The inference class, TRTInference.hpp:

#ifndef RTDETRinferENCE_HPP
#define RTDETRinferENCE_HPP
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <iostream>
#include <fstream>
#include <vector>
#include <unordered_map>
#include <NvInferPlugin.h>
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity != Severity::kINFO) {
            std::cout << msg << std::endl;
        }
    }
};
void* safeCudaMalloc(size_t memSize) {
    void* deviceMem;
    cudaError_t status = cudaMalloc(&deviceMem, memSize);
    if (status != cudaSuccess) {
        std::cerr << "cudaMalloc failed: " << cudaGetErrorString(status) << std::endl;
        exit(1);
    }
    return deviceMem;
}
void checkCudaError(cudaError_t status) {
    if (status != cudaSuccess) {
        std::cerr << "CUDA error: " << cudaGetErrorString(status) << std::endl;
        exit(1);
    }
}
class RTDETRinfer {
    // init -> load engine -> allocate cuda memory  -> inference
public:
    RTDETRinfer(const std::string& engine_path) {
        load_engine(engine_path);
        allocator();
    }
    bool infer(void* const* binds, std::vector<std::pair<int, int>> input_size) {
        // check the input sizes (H x W)
        for (size_t i = 0; i < input_size.size(); i++)
            if (input_size[i] != tensor_size[i])
            {
                std::cout << "size error , please check the input size" << std::endl;
                return false;
            }
        // copy the inputs from host memory to CUDA memory
        for (size_t i = 0; i < input_size.size(); i++) {
            cudaMemcpy(name_ptr[tensor_name[i]], binds[i], tensor_bytes[i], cudaMemcpyHostToDevice);
        }
        // gather the device pointers of all bindings, in engine order
        std::vector<void*> bindings;
        for (auto& name : tensor_name) {
            bindings.emplace_back(
                name_ptr[name]
            );
        }
        // run inference
        bool ok = this->context->executeV2(bindings.data());

        // copy the outputs from CUDA memory back to host memory: to do
        // cudaMemcpy()
        return ok;
    }
    ~RTDETRinfer() {
        // free the CUDA memory
        for (auto& map : name_ptr) {
            cudaFree(map.second);
        }
    }
private:
    // @brief Allocate CUDA memory for all inputs and outputs. Once that is done,
    // we only need to copy host inputs to the device and device outputs back to the host.
    void allocator() {
        TensorNum = engine->getNbIOTensors();
        // allocate memory for every tensor, input and output alike
        for (int i = 0; i < TensorNum; i++) {
            // query the tensor's metadata
            const char* name = engine->getIOTensorName(i);
            nvinfer1::Dims dims = engine->getTensorShape(name);
            nvinfer1::DataType type = engine->getTensorDataType(name);
            // input or output
            const char* mode_name = engine->getTensorIOMode(name) == nvinfer1::TensorIOMode::kINPUT ? "input" : "output";
            // tensor name
            tensor_name.emplace_back(name);
            // image size (H x W)
            tensor_size.emplace_back(std::make_pair(dims.d[2], dims.d[3]));
            // compute the tensor's size in bytes
            int64_t nbytes = perbytes[int(type)];
            for (int d = 0; d < dims.nbDims; d++)
                nbytes = nbytes * dims.d[d];
            tensor_bytes.emplace_back(nbytes);
            // allocate CUDA memory and record it in the map: name -> device pointer
            name_ptr.insert(std::make_pair(name, safeCudaMalloc(nbytes)));
            std::cout
                << " tensor mode : " << mode_name
                << " , tensor name : " << name
                << " , tensor dim : " << dims.d[0] << " X " << dims.d[1] << " X " << dims.d[2] << " X " << dims.d[3]
                << " , tensor btypes :  " << nbytes
                << " , getTensorDataType output type : " << type_name[int(type)]
                << " , getTensorFormatDesc output description : " << engine->getTensorFormatDesc(name)
                << std::endl;
        }
    }

    // load the engine file and initialize the runtime, engine, and context
    void load_engine(const std::string& engine_path) {
        std::ifstream file(engine_path, std::ios::binary);
        if (!file.good()) {
            std::cerr << "Error reading engine file" << std::endl;
            exit(1);
        }
        file.seekg(0, file.end);
        const size_t fsize = file.tellg();
        file.seekg(0, file.beg);
        std::vector<char> engineData(fsize);
        file.read(engineData.data(), fsize);
        file.close();
        // runtime
        runtime.reset(nvinfer1::createInferRuntime(logger));
        if (!runtime) {
            std::cerr << "Failed to create runtime" << std::endl;
            exit(1);
        }
        // register all known TensorRT plugins
        initLibNvInferPlugins(&logger, "");
        // engine
        engine.reset(runtime->deserializeCudaEngine(engineData.data(), fsize));
        if (!engine) {
            std::cerr << "Failed to create engine" << std::endl;
            exit(1);
        } 
        // execution context
        context.reset(engine->createExecutionContext());
    }
private:
    // TensorRT objects
    std::unique_ptr<nvinfer1::IRuntime> runtime;
    std::unique_ptr<nvinfer1::ICudaEngine> engine;
    std::unique_ptr<nvinfer1::IExecutionContext> context;
    Logger logger;
    // bytes per element for each nvinfer1::DataType value
    // (kINT4 is 0.5 because it is a sub-byte type)
    double perbytes[10] = { 4, 2, 1, 4, 1, 1, 1, 2, 8, 0.5 };
    const char* type_name[10] = {
        "kFLOAT","kHALF","kINT8","kINT32","kBOOL",
        "kUINT8","kFP8","kBF16","kINT64","kINT4"
    };
    // total number of I/O tensors
    int TensorNum = 0;
    // per-tensor names, (H, W) sizes, and byte counts
    std::vector<const char*> tensor_name;
    std::vector<std::pair<int, int>> tensor_size;
    std::vector<int64_t> tensor_bytes;
    // map from tensor name to device memory pointer
    std::unordered_map<const char*, void*> name_ptr;
};

#endif // !RTDETRinferENCE_HPP

The main code, main.cpp:

#include "TRTInference.hpp"
#include<iostream>
int main() {
    RTDETRinfer("./rtdetr_f16_i8.engine");
    return 0;
}
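
For reference, a hypothetical call into infer would look something like this (a sketch with dummy data; the int64_t host buffer for orig_target_sizes reflects the kINT64 type at issue, and the (0, 0) entry mirrors what allocator() records in dims.d[2] and dims.d[3] for a 2-D tensor):

    #include "TRTInference.hpp"
    #include <cstdint>
    #include <utility>
    #include <vector>

    int main() {
        RTDETRinfer model("./rtdetr_f16_i8.engine");
        // dummy host inputs: a 1 x 3 x 640 x 640 float image
        // and a 1 x 2 int64 tensor holding the original image size
        std::vector<float> image(1 * 3 * 640 * 640, 0.0f);
        std::vector<int64_t> orig_target_sizes = { 640, 640 };
        void* binds[] = { image.data(), orig_target_sizes.data() };
        // (H, W) of each input, matching what allocator() stored per tensor
        std::vector<std::pair<int, int>> input_size = { {640, 640}, {0, 0} };
        model.infer(binds, input_size);
        return 0;
    }
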
lix19937 commented 1 month ago

Run the following cmd, then upload the JSON file.

   trtexec.exe --onnx=model.onnx --saveEngine=rtdetr_f16_i8.engine  --fp16 --int8 \
     --dumpProfile  --profilingVerbosity=detailed \
     --noDataTransfers --useCudaGraph --useSpinWait  --separateProfileRun \
    --dumpLayerInfo  --exportLayerInfo=li.json
5p6 commented 1 month ago

Run the following cmd, then upload the JSON file.

   trtexec.exe --onnx=model.onnx --saveEngine=rtdetr_f16_i8.engine  --fp16 --int8 \
     --dumpProfile  --profilingVerbosity=detailed \
     --noDataTransfers --useCudaGraph --useSpinWait  --separateProfileRun \
    --dumpLayerInfo  --exportLayerInfo=li.json

I have rerun the above command and obtained the log and the model information; the log and JSON files are attached to this answer: layerinfo.json, result.log. But when I ran the previous C++ code again with the newly generated engine file, the output still had the same problem. The terminal output is as follows:

tensor mode : input , tensor name : images , tensor dim : 1 X 3 X 640 X 640 , tensor btypes :  4915200 , getTensorDataType output type : kFLOAT , getTensorFormatDesc output description : Row major linear FP32 format (kLINEAR)
tensor mode : input , tensor name : orig_target_sizes , tensor dim : 1 X 2 X 0 X 0 , tensor btypes :  16 , getTensorDataType output type : kINT64 , getTensorFormatDesc output description : Row major linear INT8 format (kLINEAR)
tensor mode : output , tensor name : labels , tensor dim : 1 X 300 X 0 X 0 , tensor btypes :  2400 , getTensorDataType output type : kINT64 , getTensorFormatDesc output description : Row major linear INT8 format (kLINEAR)
tensor mode : output , tensor name : boxes , tensor dim : 1 X 300 X 4 X 0 , tensor btypes :  4800 , getTensorDataType output type : kFLOAT , getTensorFormatDesc output description : Row major linear FP32 format (kLINEAR)
tensor mode : output , tensor name : scores , tensor dim : 1 X 300 X 0 X 0 , tensor btypes :  1200 , getTensorDataType output type : kFLOAT , getTensorFormatDesc output description : Row major linear FP32 format (kLINEAR)
lix19937 commented 1 month ago

getTensorFormatDesc: Return the human readable description of the tensor format, or empty string if the provided name does not map to an input or output tensor.

If you want to allocate memory automatically, use getTensorDataType to get the data type.

ttyio commented 1 month ago

Created an internal issue to track this; getTensorFormatDesc is not up to date with all the supported data types. And @lix19937 is right, let's use getTensorDataType when we allocate the memory.
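
A minimal illustration of that split (my own sketch, not official guidance): derive allocation sizes from getTensorDataType and treat getTensorFormatDesc purely as a display string.

    #include <NvInfer.h>
    #include <iostream>

    // Sketch: getTensorDataType is the source of truth for sizing; the format
    // description is display-only, since it can lag behind newer data types
    // such as kINT64 on this TensorRT version.
    void dumpIOTensors(const nvinfer1::ICudaEngine& engine) {
        for (int i = 0; i < engine.getNbIOTensors(); i++) {
            const char* name = engine.getIOTensorName(i);
            nvinfer1::DataType type = engine.getTensorDataType(name); // use this for sizing
            std::cout << name
                      << " : dtype enum " << static_cast<int>(type)
                      << " , desc (display only) : " << engine.getTensorFormatDesc(name)
                      << std::endl;
        }
    }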

5p6 commented 1 month ago

Created an internal issue to track this; getTensorFormatDesc is not up to date with all the supported data types. And @lix19937 is right, let's use getTensorDataType when we allocate the memory.

OK, thanks. I will use getTensorDataType when allocating the memory.

5p6 commented 1 month ago

getTensorFormatDesc: Return the human readable description of the tensor format, or empty string if the provided name does not map to an input or output tensor.

If you want to allocate memory automatically, use getTensorDataType to get the data type.

OK, thanks. I will use getTensorDataType when I want to allocate memory automatically.