Based on my experience, you may need to set all Softmax layers to run in fp32. Python (a sketch of one way to do this is shown below):
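A minimal sketch with the TensorRT Python API, assuming you build the network yourself from the parsed ONNX (the helper name is mine, not from the original comment):

```python
import tensorrt as trt

def pin_softmax_to_fp32(network: trt.INetworkDefinition,
                        config: trt.IBuilderConfig) -> None:
    # Walk the parsed network and pin every Softmax layer to fp32,
    # even when the rest of the engine is built in fp16.
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        if layer.type == trt.LayerType.SOFTMAX:
            layer.precision = trt.DataType.FLOAT          # compute in fp32
            layer.set_output_type(0, trt.DataType.FLOAT)  # keep fp32 output
    # Tell the builder to honor the per-layer precisions set above.
    config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
```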
C++: see the TensorrtTensor section of the torchpipe backend reference:
https://torchpipe.github.io/docs/backend-reference/torch?_highlight=tensorrtt#tensorrttensor
Was transformer_op17.engine built in fp16 or fp32?
@tp-nan For now I always use fp32, because fp16 returns NaNs.
I ask because all the models run inside predict_cls_bbreg_filters_parallel seem to give good accuracy individually; maybe there is some specific function that TensorRT doesn't support very well?
UPDATE: EVERYTHING is fixed. All accuracy issues went away once I transferred the tensors to CPU before running inference on them with Polygraphy.
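A sketch of what that fix looks like, assuming the inputs start out as CUDA tensors (the model path and input name here are placeholders, not the actual ones from this issue):

```python
import torch
from polygraphy.backend.trt import EngineFromNetwork, NetworkFromOnnxPath, TrtRunner

# Placeholder input: in the real code these come out of the tracker on the GPU.
torch_inputs = {"input": torch.randn(1, 3, 288, 288, device="cuda")}

build_engine = EngineFromNetwork(NetworkFromOnnxPath("filter_predictor.onnx"))
with TrtRunner(build_engine) as runner:
    # Move every tensor to CPU (as a plain numpy array) before handing it over.
    feed = {name: t.detach().cpu().numpy() for name, t in torch_inputs.items()}
    outputs = runner.infer(feed_dict=feed)
```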
Just an afterthought: onnxsim can be used to simplify the model. For fp16, you may need to make the Softmax layers run in fp32.
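For reference, a minimal onnxsim invocation (file names are illustrative):

```python
import onnx
from onnxsim import simplify

# Simplify the exported graph; this often folds away constructs that
# TensorRT handles poorly.
model = onnx.load("transformer_op17.onnx")
model_sim, ok = simplify(model)
assert ok, "simplified model failed the onnxsim validation check"
onnx.save(model_sim, "transformer_op17_sim.onnx")
```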
Description
Hello, I've been working on a project to convert a tracking algorithm to TensorRT. By converting one part of the code to TensorRT and running inference, then another part of the code, and so on, I've tracked the accuracy issues down to this code:
```python
import atexit
import tensorrt as trt
import torch.nn as nn
from polygraphy.backend.trt import EngineFromNetwork, NetworkFromOnnxPath, TrtRunner

superfinal_polygraphy = True
pol1 = True
pol2 = True
pol3 = True

def MLP(channels, do_bn=True):
    ...

class FilterPredictor(nn.Module):
    ...
```
As you can see, I converted all the submodules that predict_cls_bbreg_filters_parallel uses to TensorRT. When the flags at the top of the code are set to True, all modules run in TensorRT, and EVERYTHING IS OK: the accuracy is as it should be.
BUT if I convert the entire FilterPredictor module to TensorRT, I get terrible accuracy issues!
I debugged the model with Polygraphy's debug reduce in bisect mode and got the ONNX model reduced to the part where the accuracy fails.
I then reduced the model further in linear mode. The commands I used are sketched below.
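A sketch of the reduction workflow, with illustrative file names and a `--check` command that compares TensorRT against ONNX Runtime:

```
polygraphy surgeon sanitize model.onnx --fold-constants -o sanitized.onnx
polygraphy debug reduce sanitized.onnx -o reduced_bisect.onnx --mode=bisect \
    --check polygraphy run polygraphy_debug.onnx --trt --onnxrt
polygraphy debug reduce reduced_bisect.onnx -o reduced_linear.onnx --mode=linear \
    --check polygraphy run polygraphy_debug.onnx --trt --onnxrt
```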
Polygraphy reduced the model down to a part inside the transformer: specifically, torch.nn.MultiheadAttention (multihead_attn in the ONNX graph) is where the accuracy fails. This module is defined within PyTorch itself, not some random custom module somebody wrote, so it is strange that the accuracy would fail here.
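For anyone who wants to poke at this in isolation, here is a minimal standalone export of just that module (dimensions and file name are illustrative, not the ones from my model):

```python
import torch
import torch.nn as nn

# Export a bare nn.MultiheadAttention at opset 17 and compare backends.
mha = nn.MultiheadAttention(embed_dim=256, num_heads=8).eval()
q = k = v = torch.randn(100, 1, 256)  # (seq_len, batch, embed_dim)

torch.onnx.export(
    mha, (q, k, v), "mha_op17.onnx",
    opset_version=17,
    input_names=["q", "k", "v"],
    output_names=["attn_out", "attn_weights"],
)
# Then: polygraphy run mha_op17.onnx --trt --onnxrt
```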
CAN ANYBODY EXPLAIN WHAT IS HAPPENING?...
If I convert ONLY the submodules to TensorRT, the accuracy is OK, but if I convert the ENTIRE model, the accuracy is BAD...
Perhaps it is the tensor manipulations within predict_cls_bbreg_filters_parallel that cause the accuracy errors?
Thank you for your help
Environment
TensorRT Version: 8.6.1
NVIDIA GPU: GTX 1660 Ti
NVIDIA Driver Version: 546.01
CUDA Version: 12.1 update 1
CUDNN Version: 8.9.7
Operating System: Windows 10
Python Version (if applicable): 3.10.13
PyTorch Version (if applicable): 2.1.2+cu121
Baremetal or Container (if so, version): Baremetal, running directly on Windows 10
Relevant Files
Link to ONNX model: https://drive.google.com/file/d/1-7r0AE_33kJK0KfzkZH-5_l8B8RL-uem/view?usp=sharing
Link to polygraphy surgeon sanitized ONNX model: https://drive.google.com/file/d/1KbRR8dMXrLt_-durnIuS6vuByZHIdFAb/view?usp=sharing
Link to polygraphy debug reduced model: https://drive.google.com/file/d/16h5zNU6lqBKUYb85Y2bE5-gMQXhNsJ07/view?usp=sharing