Is the final engine built from ONNX? If yes, can you share the ONNX here?
No, the engine isn't built using ONNX; it's built using a custom parser (tkDNN). I linked the repo in the issue and also mentioned the commit SHA. To build the engine, download the weights through the link provided, rename the folder to pidnet_s, and place it in the build directory. Then build test_pidnet_s using CMake and run it. To visualize the output, build seg_demo and run it with this command: ./seg_demo pid_640_fp32.rt
I would strongly suspect that this is due to a usage issue; perhaps you can take a look at our ONNX parser and compare it with yours. See https://github.com/onnx/onnx-tensorrt/blob/fd119fec8565264add819f8edc801066116a32dd/builtin_op_importers.cpp#L4892
Okay, I will take a look at it and check it against tkDNN's parser.
I compared both parsers and, based on my understanding, there isn't a significant difference between them. tkDNN supports TopK only across axis 1 and only for k=1. I have attached the parser for your reference. The attached parser works well with TRT7 but not with TRT8.
ILayer* NetworkRT::convert_layer(ITensor *input, TopK *l) {
    // TopK with k = 1 over axis 1
    ILayer *lRT = networkRT->addTopK(*input, TopKOperation::kMAX, 1, 1);
    checkNULL(lRT);

    // Output 1 of ITopKLayer holds the indices (int32)
    ITensor *indices = lRT->getOutput(1);
    indices->setType(nvinfer1::DataType::kFLOAT);

    // Identity layer intended to materialize the int32 -> float cast
    ILayer *iLRT = networkRT->addIdentity(*indices);
    // iLRT->setOutputType(0, nvinfer1::DataType::kFLOAT);
    checkNULL(iLRT);
    return iLRT;
}
Could you provide a reproduction sample for this? Maybe just extract the TopK conversion from tkDNN and add a standalone test for it.
Yeah, sure. I will add the topk_test and link the commit here in a couple of days.
The commit is available here: https://github.com/perseusdg/tkDNN/tree/dev_bug. The executable is called test_topk, and the weights for the topk test are here.
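For context, a minimal standalone sketch of the kind of test test_topk performs might look like the following. This is illustrative only, not the actual tkDNN test; the input shape, logger, and build flow are assumptions. It builds a tiny network consisting of just TopK plus the identity cast discussed above:

#include <iostream>
#include <NvInfer.h>

using namespace nvinfer1;

// Minimal logger required by the TensorRT builder
class Logger : public ILogger {
    void log(Severity severity, const char *msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
    }
} gLogger;

int main() {
    IBuilder *builder = createInferBuilder(gLogger);
    INetworkDefinition *network = builder->createNetworkV2(
        1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));

    // Assumed input shape: 1x19x64x64, e.g. class scores from a segmentation head
    ITensor *input = network->addInput("input", DataType::kFLOAT, Dims4{1, 19, 64, 64});

    // TopK with k = 1 over the channel axis (bit 1 of the reduceAxes mask
    // in explicit-batch mode)
    ITopKLayer *topk = network->addTopK(*input, TopKOperation::kMAX, 1, 1U << 1);

    // Indices (output 1) are int32; attempt the cast to float via identity
    IIdentityLayer *cast = network->addIdentity(*topk->getOutput(1));
    cast->setOutputType(0, DataType::kFLOAT);

    network->markOutput(*cast->getOutput(0));

    IBuilderConfig *config = builder->createBuilderConfig();
    IHostMemory *plan = builder->buildSerializedNetwork(*network, *config);
    std::cout << (plan ? "engine built" : "build failed") << std::endl;
    return 0;
}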
We found the same issue occurring in IElementWiseLayer, especially when we use ops like equal, less, greater, or, and, as well as in ISelectLayer: wherever there is some form of conversion from float to bool/int32 or vice versa, it gives wrong results on TensorRT 8, while with TensorRT 7 the results are correct. I have also attached an example of how we convert it using TensorRT's IIdentityLayer:
if (l->op_mode == tk::dnn::OP_AND || l->op_mode == tk::dnn::OP_OR) {
    // Logical ops require bool operands, so cast both inputs via identity
    IIdentityLayer *l1RT = networkRT->addIdentity(*l1);
    l1RT->setOutputType(0, DataType::kBOOL);
    IIdentityLayer *l2RT = networkRT->addIdentity(*l2);
    l2RT->setOutputType(0, DataType::kBOOL);

    IElementWiseLayer *lRT = networkRT->addElementWise(
        *l1RT->getOutput(0), *l2RT->getOutput(0),
        static_cast<ElementWiseOperation>(l->op_mode));

    // Cast the bool result back to float for the rest of the pipeline
    IIdentityLayer *olRT = networkRT->addIdentity(*lRT->getOutput(0));
    olRT->setOutputType(0, DataType::kFLOAT);
    checkNULL(olRT);
    return olRT;
}
Going through the build logs, I noticed that despite setting the indices type of the TopK layer to fp32, the output was still int32 and the identity layer did nothing. If I instead try setting the output of the identity layer to float, I get the following error:
TENSORRT LOG: 4: Output tensor out of type Int32 produced from output of incompatible type Float
Does this mean int32 -> fp32 conversion isn't supported by TensorRT 8? The documentation mentions that such a conversion should be possible; I have attached a screenshot from the website for reference.
Why do you add an identity layer after the TopK? It acts like a cast in this case.
The rest of our pipeline is in float, so we figured it might be easier to do the cast as part of the TensorRT engine itself using the identity layer. Additionally, some of our other networks take the index output of TopK as input to further layers in the network.
I don't think adding an identity layer is correct here; at least, we don't add it in our ONNX parser. IIUC, the precision setting should be done via IBuilderConfig.
We do set the precision in the builder config. It's just that the index (output 1) of the TopK layer is of type int32, while the remaining parts of our network are designed to take either fp32 or fp16 as input. Instead of writing a plugin to do the int32 -> fp32/fp16 conversion, we figured it might be better to use a TensorRT layer that can do the cast, like the identity layer.
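For reference, precision in the builder config is typically requested along these lines. This is a minimal sketch: the variable names (builderRT, configRT) are hypothetical, and the exact flags tkDNN sets may differ:

// Request reduced-precision kernels through the builder config
IBuilderConfig *configRT = builderRT->createBuilderConfig();
configRT->setFlag(BuilderFlag::kFP16);  // allow fp16 kernel selection

// Note: these flags steer kernel precision only; they do not change the
// declared type of a tensor such as TopK's int32 indices output, which is
// why an explicit cast in the network is still needed.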
@ttyio do you have any suggestions?
@perseusdg, have you tried removing the setType(kFLOAT) on output 1? The second output of ITopKLayer should be int32; then use IIdentityLayer to convert the int32 to float. The pseudocode should look like this:
ILayer* NetworkRT::convert_layer(ITensor *input, TopK *l) {
    ILayer *lRT = networkRT->addTopK(*input, TopKOperation::kMAX, 1, 1);
    checkNULL(lRT);

    // Leave the indices as int32 and set the cast on the identity layer instead
    ITensor *indices = lRT->getOutput(1);
    ILayer *iLRT = networkRT->addIdentity(*indices);
    iLRT->setOutputType(0, nvinfer1::DataType::kFLOAT);
    checkNULL(iLRT);
    return iLRT;
}
I did try something like that too (TensorRT 8.4/8.5), and the end result was more or less still wrong.
@perseusdg, what do you mean by "more or less still wrong"? Is the result still constant 0?
I said "more or less" because the end values were very low but not exactly zero. The issue seems to have gone away in 8.6 using ICastLayer.
Edit: to clarify what I mean by a really low value, I mean something around +/- 1e-30.
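For completeness, a minimal sketch of how the TopK conversion might look with ICastLayer on TensorRT 8.6+. This is adapted from the snippets above, not taken verbatim from tkDNN:

ILayer* NetworkRT::convert_layer(ITensor *input, TopK *l) {
    // Same TopK call as before: k = 1 over axis 1
    ILayer *lRT = networkRT->addTopK(*input, TopKOperation::kMAX, 1, 1);
    checkNULL(lRT);

    // Output 1 holds the indices and is always int32
    ITensor *indices = lRT->getOutput(1);

    // ICastLayer (TensorRT 8.6+) performs the int32 -> float cast explicitly,
    // instead of relying on IIdentityLayer type conversion
    ICastLayer *castRT = networkRT->addCast(*indices, nvinfer1::DataType::kFLOAT);
    checkNULL(castRT);
    return castRT;
}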
Thanks @perseusdg for confirming that ICastLayer in 8.6 fixes this issue, closing.
Description
When I use the TopK layer with a semantic segmentation network (PIDNet) converted from PyTorch to TensorRT using tkDNN, it works properly with TensorRT 7, but the same code produces a constant 0 output with TensorRT 8. Is this a bug in TensorRT, or was the TopK layer updated significantly between TensorRT 7 and 8, requiring me to change the code for TensorRT 8?
Environment
TensorRT Version: 7.2.3/8.5
NVIDIA GPU: RTX 3060 Laptop
NVIDIA Driver Version: 525.78.01
CUDA Version: 11.1/11.4
CUDNN Version: 8.1/8.6
Operating System: Ubuntu 20.04
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Steps To Reproduce
To reproduce the results, clone my fork of the tkDNN repo at commit 91a4a3feb756c77480a0a473745e42216c8ec576 and build test_pidnet_s. The weights for building the network can be found here.