NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

ITopKLayer doesn't work as expected with TensorRT 8 #2680

Closed perseusdg closed 1 year ago

perseusdg commented 1 year ago

Description

When I use the TopK layer with a semantic segmentation network (PIDNet) converted from PyTorch to TensorRT using tkDNN, it works properly with TensorRT 7, but the same code with TensorRT 8 produces a constant 0 output. Is this a bug in TensorRT, or was the TopK layer changed significantly from TensorRT 7 to 8, requiring me to change the code for TensorRT 8?

Environment

TensorRT Version: 7.2.3 / 8.5
NVIDIA GPU: RTX 3060 Laptop
NVIDIA Driver Version: 525.78.01
CUDA Version: 11.1 / 11.4
CUDNN Version: 8.1 / 8.6
Operating System: Ubuntu 20.04
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):

Relevant Files

Steps To Reproduce

To reproduce the issue, clone my fork of the tkDNN repo at commit 91a4a3feb756c77480a0a473745e42216c8ec576 and build test_pidnet_s. The weights for building the network can be found here.

zerollzeng commented 1 year ago

Is the final engine built from ONNX? If yes, can you share the ONNX here?

perseusdg commented 1 year ago

No, the engine isn't built from ONNX; it's built using a custom parser (tkDNN). I linked the repo in the issue and also mentioned the commit SHA. To build the engine, download the weights through the link provided, rename the folder to pidnet_s, and place it in the build directory, then build test_pidnet_s using CMake and run it. To visualize the output, build seg_demo and run it with: ./seg_demo pid_640_fp32.rt

zerollzeng commented 1 year ago

I would highly suspect this is due to a usage issue. Perhaps you can take a look at our ONNX parser and compare it with yours; see https://github.com/onnx/onnx-tensorrt/blob/fd119fec8565264add819f8edc801066116a32dd/builtin_op_importers.cpp#L4892

perseusdg commented 1 year ago

Okay, I will take a look at it and check it against tkDNN's parser.

perseusdg commented 1 year ago

I compared both parsers and, based on my understanding, there isn't a significant difference between them. tkDNN supports TopK only across axis 1 and for k = 1. I have attached the parser code for your reference. The attached parser works well with TRT 7 but not with TRT 8.

ILayer* NetworkRT::convert_layer(ITensor *input, TopK *l) {
    // TopK with k = 1 over the channel axis (reduce-axes bitmask 1)
    ILayer *lRT = networkRT->addTopK(*input, TopKOperation::kMAX, 1, 1);
    checkNULL(lRT);
    // output 1 of ITopKLayer holds the indices, which are INT32 by default
    ITensor *indices = lRT->getOutput(1);
    indices->setType(nvinfer1::DataType::kFLOAT);
    ILayer *iLRT = networkRT->addIdentity(*indices);
    // iLRT->setOutputType(0, nvinfer1::DataType::kFLOAT);
    checkNULL(iLRT);
    return iLRT;
}
zerollzeng commented 1 year ago

Could you provide a reproduction sample for this? Maybe just extract the TopK conversion from tkDNN and add a standalone test for it.
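Something like the following minimal sketch would do (hypothetical input shape; explicit-batch TRT 8 C++ API, so the channel axis is bit 1 of the reduce-axes mask; a sketch, not tested code):

#include <NvInfer.h>
#include <cstdio>

using namespace nvinfer1;

// minimal logger required by the builder
class TestLogger : public ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::printf("%s\n", msg);
    }
} gTestLogger;

int main() {
    IBuilder* builder = createInferBuilder(gTestLogger);
    INetworkDefinition* network = builder->createNetworkV2(
        1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));

    // hypothetical segmentation-head shape: N x C x H x W
    ITensor* input = network->addInput("input", DataType::kFLOAT, Dims4{1, 19, 480, 640});

    // argmax over the channel axis: k = 1, reduce-axes bitmask selects axis 1
    ITopKLayer* topk = network->addTopK(*input, TopKOperation::kMAX, 1, 1U << 1);

    // mark the INT32 indices (output 1) as the network output
    network->markOutput(*topk->getOutput(1));

    IBuilderConfig* config = builder->createBuilderConfig();
    IHostMemory* plan = builder->buildSerializedNetwork(*network, *config);
    std::printf("plan size: %zu bytes\n", plan ? plan->size() : 0);

    // next: deserialize, run with known input data, and compare the indices
    // against a CPU argmax (omitted for brevity)
    delete plan;
    delete config;
    delete network;
    delete builder;
    return 0;
}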

perseusdg commented 1 year ago

Yeah sure, I will add the topk_test and link the commit here in a couple of days.

perseusdg commented 1 year ago

The commit is available here: https://github.com/perseusdg/tkDNN/tree/dev_bug, and the executable is called test_topk; the weights for the TopK test are here.

We found this same issue occurring in IElementWiseLayer, especially when we use ops like equal, less, greater, or, and and, or in ISelectLayer, wherever there is some form of conversion from float to bool/int32 or vice versa: it gives wrong results on TensorRT 8, but with TensorRT 7 the results are correct. I have also attached an example of how we convert it using TensorRT's IIdentityLayer:

if (l->op_mode == tk::dnn::OP_AND || l->op_mode == tk::dnn::OP_OR) {
    // cast both inputs to bool before the logical elementwise op
    IIdentityLayer *l1RT = networkRT->addIdentity(*l1);
    l1RT->setOutputType(0, DataType::kBOOL);
    IIdentityLayer *l2RT = networkRT->addIdentity(*l2);
    l2RT->setOutputType(0, DataType::kBOOL);
    IElementWiseLayer *lRT = networkRT->addElementWise(*l1RT->getOutput(0), *l2RT->getOutput(0),
                                                       static_cast<ElementWiseOperation>(l->op_mode));
    // cast the bool result back to float for the rest of the pipeline
    IIdentityLayer *olRT = networkRT->addIdentity(*lRT->getOutput(0));
    olRT->setOutputType(0, DataType::kFLOAT);
    checkNULL(olRT);
    return olRT;
}
perseusdg commented 1 year ago

Going through the build logs, I noticed that despite setting the indices type of the TopK layer to FP32, the output was still INT32 and the identity layer did nothing. If I try setting the output of the identity layer to float, I get the following error:

TENSORRT LOG: 4: Output tensor out of type Int32 produced from output of incompatible type Float
TENSORRT LOG: 4: Output tensor out of type Int32 produced from output of incompatible type Float

Does this mean INT32 -> FP32 isn't supported by TensorRT 8? The documentation mentions that such a conversion should be possible; I have attached a screenshot from the website for reference.

[screenshot of the TensorRT documentation]

zerollzeng commented 1 year ago

Why do you add an identity layer after the TopK? It acts like a cast in this case.

perseusdg commented 1 year ago

The rest of our pipeline is in float, so we figured it might be easier to do the cast as part of the TensorRT engine itself using the identity layer. Additionally, some of our other networks feed the index output of TopK into further layers in the network.

zerollzeng commented 1 year ago

I don't think adding an identity layer is correct here; at least, we don't add one in our ONNX parser. IIUC, the precision setting should be done via IBuilderConfig.
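For example, a minimal sketch of what I mean (assuming the standard TRT 8 IBuilderConfig API; someLayer is a hypothetical layer handle):

IBuilderConfig* config = builder->createBuilderConfig();
// allow reduced-precision kernels network-wide
config->setFlag(BuilderFlag::kFP16);
// if a specific layer must run in a given type, set a per-layer constraint
// and tell the builder to obey it
config->setFlag(BuilderFlag::kOBEY_PRECISION_CONSTRAINTS);
someLayer->setPrecision(DataType::kHALF);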

perseusdg commented 1 year ago

We do set the precision in the builder config. It's just that the index output (output 1) of the TopK layer is of type INT32, while the remaining parts of our network are designed to take either FP32 or FP16 as input. Instead of writing a plugin to do the INT32 -> FP32/FP16 conversion, we figured it might be better to use a TensorRT layer that can do the casting, like the identity layer.

zerollzeng commented 1 year ago

@ttyio do you have any suggestions?

ttyio commented 1 year ago

I compared both parsers and, based on my understanding, there isn't a significant difference between them. tkDNN supports TopK only across axis 1 and for k = 1. I have attached the parser code for your reference. The attached parser works well with TRT 7 but not with TRT 8.

ILayer* NetworkRT::convert_layer(ITensor *input, TopK *l) {
    ILayer *lRT = networkRT->addTopK(*input, TopKOperation::kMAX, 1, 1);
    checkNULL(lRT);
    ITensor *indices = lRT->getOutput(1);
    indices->setType(nvinfer1::DataType::kFLOAT);
    ILayer *iLRT = networkRT->addIdentity(*indices);
    // iLRT->setOutputType(0, nvinfer1::DataType::kFLOAT);
    checkNULL(iLRT);
    return iLRT;
}

@perseusdg, have you tried removing the setType(kFLOAT) on output 1? The second output of ITopKLayer should be INT32. Then use IIdentityLayer to convert the INT32 to float. The pseudocode should look like this:

ILayer* NetworkRT::convert_layer(ITensor *input, TopK *l) {
    ILayer *lRT = networkRT->addTopK(*input, TopKOperation::kMAX, 1, 1);
    checkNULL(lRT);
    // leave the indices as INT32 and request the cast on the identity layer's output
    ITensor *indices = lRT->getOutput(1);
    ILayer *iLRT = networkRT->addIdentity(*indices);
    iLRT->setOutputType(0, nvinfer1::DataType::kFLOAT);
    checkNULL(iLRT);
    return iLRT;
}
perseusdg commented 1 year ago

I did try something like that too (TensorRT 8.4/8.5), and the end result was more or less still wrong.

ttyio commented 1 year ago

@perseusdg what do you mean by "more or less still wrong"? Is the result still a constant 0?

perseusdg commented 1 year ago

I said "more or less" because the end values were very low, but not exactly zero. The issue seems to have gone away with 8.6 using ICastLayer.

Edit: to clarify what I mean by a really low value, I mean something around +/- 1e-30.
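For anyone who hits this later, a minimal sketch of the cast-based converter (assuming INetworkDefinition::addCast from 8.6; simplified from our tkDNN code):

ILayer* NetworkRT::convert_layer(ITensor *input, TopK *l) {
    ILayer *lRT = networkRT->addTopK(*input, TopKOperation::kMAX, 1, 1);
    checkNULL(lRT);
    // ICastLayer converts the INT32 indices to FP32 explicitly
    ICastLayer *castRT = networkRT->addCast(*lRT->getOutput(1), nvinfer1::DataType::kFLOAT);
    checkNULL(castRT);
    return castRT;
}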

ttyio commented 1 year ago

Thanks @perseusdg for confirming that ICastLayer in 8.6 fixes this issue. Closing.