Hello @FengYue95, thanks for reporting. Since it works in native TRT, I am not sure what the limitation is here in TRTIS. Could you create an issue at https://github.com/triton-inference-server/server/issues? Thanks!
Thank you! In fact, I have already reported this in the triton-server repo, and they suggested using the 20.12 NGC TensorRT container. I will try that later. What I want to know now is: which version of TensorRT is in the NGC TensorRT container 20.12? Is it different from the version I have been using (TensorRT 7.2.2)? And if I want to stay on TensorRT 7.2.2, is there any version of triton-server you have tried that matches it?
Hello @FengYue95, we have TensorRT 7.2.2 in 20.12; see https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/rel_20-12.html
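In case it helps, the TensorRT build inside a container can be confirmed directly from Python (a minimal sketch; `getInferLibVersion` is the exported C symbol in libnvinfer and reports the version encoded as major*1000 + minor*100 + patch, e.g. 7202 for 7.2.2):

```python
# Minimal sketch to confirm which TensorRT build an environment actually uses.
import ctypes

import tensorrt as trt

print("Python binding version:", trt.__version__)

# The native library reports its version as an integer, e.g. 7202 for 7.2.2.
nvinfer = ctypes.CDLL("libnvinfer.so.7")
nvinfer.getInferLibVersion.restype = ctypes.c_int
print("libnvinfer version:", nvinfer.getInferLibVersion())
```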
As for triton-server, it is a separate project, and currently we have no support matrix on the TRT side. Could you ask in the Triton repo? Thanks.
@ttyio Hello! I finally found where the problem is!
As described, I use TensorRT 7.2.2.3 downloaded from the NVIDIA Developer Zone and it works. However, when I try to run inference locally with the NGC TensorRT container 20.12, it fails too. I compared all the environment paths and library files and found that the version used by both the NGC TensorRT container and the Triton inference server is TensorRT 7.2.2.1.
The two versions (7.2.2.3 vs 7.2.2.1) differ in the file libnvinfer_plugin.so.7.2.2, which is where the core dump points. So I replaced libnvinfer_plugin.so.7.2.2 with the one from TensorRT 7.2.2.3, and the problem was gone.
Therefore, I think the problem is in the libnvinfer_plugin.so.7.2.2 file of TensorRT 7.2.2.1, where bert::embSkipLayerNorm is defined. What is the difference between TensorRT 7.2.2.1 and 7.2.2.3? Why does this problem occur?
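For reference, a way to double-check which plugin library file a process actually resolves at runtime (a minimal, Linux-only sketch; the soname libnvinfer_plugin.so.7 is assumed for TRT 7.x):

```python
# Minimal sketch: load the plugin library and inspect /proc/self/maps to see
# which file on disk the dynamic linker actually picked up.
import ctypes

ctypes.CDLL("libnvinfer_plugin.so.7")  # maps the library into this process

with open("/proc/self/maps") as maps:
    plugin_paths = {line.split()[-1] for line in maps if "libnvinfer_plugin" in line}

for path in sorted(plugin_paths):
    print(path)
```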
Hello @FengYue95, I have checked the log. Between 7.2.2.0 and 7.2.2.1 we added support for dynamic shape on the sequence dimension; there are no changes to the BERT plugin between 7.2.2.1 and 7.2.2.3.
@ttyio Hello~ I compared the md5 of the two different libnvinfer_plugin.so.7.2.2 files:
the one in TensorRT-7.2.2.3.Ubuntu-18.04.x86_64-gnu.cuda-11.1.cudnn8.0.tar.gz:
the one in nvcr.io/nvidia/tensorrt:20.12-py3 and nvcr.io/nvidia/tritonserver:20.12-py3:
Only with the one from TensorRT-7.2.2.3 can I build the BERT engine successfully~
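For reference, such a checksum comparison can be reproduced with a short script (a minimal sketch; the two paths are placeholders for wherever the libraries are extracted):

```python
# Minimal sketch: hash the two copies of libnvinfer_plugin.so.7.2.2 to compare them.
import hashlib

def md5sum(path, chunk_size=1 << 20):
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

for path in (
    "TensorRT-7.2.2.3/lib/libnvinfer_plugin.so.7.2.2",  # placeholder: from the tarball
    "container/libnvinfer_plugin.so.7.2.2",             # placeholder: copied from the 20.12 image
):
    print(md5sum(path), path)
```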
@FengYue95
Hmmm, the plugin in nvcr.io/nvidia/tensorrt:20.12-py3 is built from open source. Could you try the 7.2.2 download from https://developer.nvidia.com/nvidia-tensorrt-download? The plugin in that package is built from the internal repo.
We can ask @rajeevsrao for help if the internal 7.2.2 build works but the open-source one fails. Thanks!
Hello @ttyio @rajeevsrao, the package (TensorRT-7.2.2.3.Ubuntu-18.04.x86_64-gnu.cuda-11.1.cudnn8.0.tar.gz) I have been using was downloaded exactly from https://developer.nvidia.com/nvidia-tensorrt-download. It works, but the open-source one fails.
@FengYue95 Could you try TRT 8.2/8.4 and see if the issue still exists? If it does, we will debug it. Thanks
Closing due to >14 days without activity. Please feel free to reopen if the issue still exists in TRT 8.4. Thanks
Description
I built a TensorRT engine with the BERT demo from the TensorRT project (master), modified by myself, and ran inference.py in the bert-tensorrt docker successfully. However, when I try to use Triton Inference Server to serve this engine, the server exits with a core dump. So I ran gdb on the core file and got the messages below. Is there any bug in my engine-building code? How can I fix this problem?
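For context, a minimal sketch of how such an engine can be deserialized locally with the TensorRT Python API (the engine filename is a placeholder; the BERT plugins live in libnvinfer_plugin, so the plugin registry is initialized first, as the demo's inference.py also does):

```python
# Minimal sketch: register the built-in plugins and deserialize a BERT engine.
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)
trt.init_libnvinfer_plugins(TRT_LOGGER, "")  # registers plugins from libnvinfer_plugin

with open("bert_base_384.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

assert engine is not None, "engine deserialization failed"
print("bindings:", [engine.get_binding_name(i) for i in range(engine.num_bindings)])
```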
Environment
TensorRT Version: 7.2.2 (TensorRT-7.2.2.3.Ubuntu-18.04.x86_64-gnu.cuda-11.0.cudnn8.0.tar.gz)
Triton Inference Server Version: 20.12 (docker: nvcr.io/nvidia/tritonserver:20.12-py3)
GPU Type: Tesla P40
Nvidia Driver Version: 450.57
Relevant Files
builder.py used to build engine:
model configuration file: