Tencent / TPAT

TensorRT Plugin Autogen Tool
Apache License 2.0

Fail to run example test_onehot_dynamic_direct.py #27

Closed hengxinCheung closed 1 year ago

hengxinCheung commented 1 year ago

Description

I tried to run the example test_onehot_dynamic_direct.py, but got a segmentation fault. I found that the fault occurs in parser.parse(model.read()) (line 268). I would appreciate it if you could help me solve this problem.
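
For context, the crash happens inside the standard TensorRT ONNX-parser call; a minimal sketch of that kind of setup is shown below (illustrative only; the model path and variable names are assumptions, not the example's exact code):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)

# parser.parse() is where the segmentation fault is reported
with open("test_onehot_dynamic_direct.onnx", "rb") as model:  # hypothetical path
    if not parser.parse(model.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))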

Environment

docker-image==nvcr.io/nvidia/tensorflow:20.06-tf1-py3
nvidia-driver==470.82
cuda==11.3
TensorRT==8.2.3

onnx==1.10.0
onnxruntime==1.10.0
onnxruntime-gpu==1.10.0
onnx-graphsurgeon==0.3.26
tf2onnx==1.11.1

Log

[02/28/2023-11:46:17] [TRT] [V] Original shape: (_, 64), unsqueezing to: (_, _, _, _)
[02/28/2023-11:46:17] [TRT] [W] ShapedWeights.cpp:173: Weights dense/kernel/read:0 has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.
[02/28/2023-11:46:17] [TRT] [V] Registering layer: dense/MatMul for ONNX node: dense/MatMul
[02/28/2023-11:46:17] [TRT] [V] Original shape: (_, 256, 1, 1), squeezing to: (_, _)
[02/28/2023-11:46:17] [TRT] [V] Registering tensor: dense/MatMul:0 for ONNX tensor: dense/MatMul:0
[02/28/2023-11:46:17] [TRT] [V] dense/MatMul [MatMul] outputs: [dense/MatMul:0 -> (-1, 256)[FLOAT]], 
[02/28/2023-11:46:17] [TRT] [V] Parsing node: Min__6 [Min]
[02/28/2023-11:46:17] [TRT] [V] Searching for input: dense/MatMul:0
[02/28/2023-11:46:17] [TRT] [V] Searching for input: clip_by_value/Minimum/y:0
[02/28/2023-11:46:17] [TRT] [V] Min__6 [Min] inputs: [dense/MatMul:0 -> (-1, 256)[FLOAT]], [clip_by_value/Minimum/y:0 -> ()[FLOAT]], 
[02/28/2023-11:46:17] [TRT] [V] Registering layer: clip_by_value/Minimum/y:0 for ONNX node: clip_by_value/Minimum/y:0
[02/28/2023-11:46:17] [TRT] [V] Registering layer: Min__6 for ONNX node: Min__6
[02/28/2023-11:46:17] [TRT] [V] Registering tensor: Min__6:0 for ONNX tensor: Min__6:0
[02/28/2023-11:46:17] [TRT] [V] Min__6 [Min] outputs: [Min__6:0 -> (-1, 256)[FLOAT]], 
[02/28/2023-11:46:17] [TRT] [V] Parsing node: Max__9 [Max]
[02/28/2023-11:46:17] [TRT] [V] Searching for input: Min__6:0
[02/28/2023-11:46:17] [TRT] [V] Searching for input: clip_by_value/y:0
[02/28/2023-11:46:17] [TRT] [V] Max__9 [Max] inputs: [Min__6:0 -> (-1, 256)[FLOAT]], [clip_by_value/y:0 -> ()[FLOAT]], 
[02/28/2023-11:46:17] [TRT] [V] Registering layer: clip_by_value/y:0 for ONNX node: clip_by_value/y:0
[02/28/2023-11:46:17] [TRT] [V] Registering layer: Max__9 for ONNX node: Max__9
[02/28/2023-11:46:17] [TRT] [V] Registering tensor: Max__9:0 for ONNX tensor: Max__9:0
[02/28/2023-11:46:17] [TRT] [V] Max__9 [Max] outputs: [Max__9:0 -> (-1, 256)[FLOAT]], 
[02/28/2023-11:46:17] [TRT] [V] Parsing node: Cast [Cast]
[02/28/2023-11:46:17] [TRT] [V] Searching for input: Max__9:0
[02/28/2023-11:46:17] [TRT] [V] Cast [Cast] inputs: [Max__9:0 -> (-1, 256)[FLOAT]], 
[02/28/2023-11:46:17] [TRT] [V] Casting to type: int32
[02/28/2023-11:46:17] [TRT] [V] Registering layer: Cast for ONNX node: Cast
[02/28/2023-11:46:17] [TRT] [V] Registering tensor: Cast:0 for ONNX tensor: Cast:0
[02/28/2023-11:46:17] [TRT] [V] Cast [Cast] outputs: [Cast:0 -> (-1, 256)[INT32]], 
[02/28/2023-11:46:17] [TRT] [V] Parsing node: test_onehot [tpat_test_onehot]
[02/28/2023-11:46:17] [TRT] [V] Searching for input: Cast:0
[02/28/2023-11:46:17] [TRT] [V] Searching for input: const_fold_opt__17
[02/28/2023-11:46:17] [TRT] [V] Searching for input: const_fold_opt__19
[02/28/2023-11:46:17] [TRT] [V] test_onehot [tpat_test_onehot] inputs: [Cast:0 -> (-1, 256)[INT32]], [const_fold_opt__17 -> (1)[INT32]], [const_fold_opt__19 -> (2)[FLOAT]], 
[02/28/2023-11:46:17] [TRT] [I] No importer registered for op: tpat_test_onehot. Attempting to import as plugin.
[02/28/2023-11:46:17] [TRT] [I] Searching for plugin: tpat_test_onehot, plugin_version: 1, plugin_namespace: 
[02/28/2023-11:46:17] [TRT] [V] Registering layer: const_fold_opt__17 for ONNX node: const_fold_opt__17
[02/28/2023-11:46:17] [TRT] [V] Registering layer: const_fold_opt__19 for ONNX node: const_fold_opt__19
[02/28/2023-11:46:17] [TRT] [I] Successfully created plugin: tpat_test_onehot
[02/28/2023-11:46:17] [TRT] [V] Registering layer: test_onehot for ONNX node: test_onehot
Segmentation fault
buptqq commented 1 year ago

It seems that there is something wrong with the ONNX parser. Can you try it with the Dockerfile in TPAT? Or you can use 'trtexec --plugins' to test this plugin.
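
A possible trtexec invocation for that check (a sketch; the ONNX and plugin library paths are assumptions and depend on where the example writes its artifacts):

trtexec --onnx=test_onehot_dynamic_direct.onnx \
        --plugins=python/trt_plugin/lib/tpat_test_onehot.so \
        --verbose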

hengxinCheung commented 1 year ago

@buptqq Thanks for your reply, and:

  1. The docker image used is built from the TPAT Dockerfile; only CUDA and TensorRT have been changed;
  2. Using trtexec --plugins gives the same fault.

Could you give me some other helpful advice?

buptqq commented 1 year ago

Based on past experience, a segmentation fault is usually caused by mismatched TensorRT versions between building the plugin and using it. So please check TRT_LIB_PATH in python/trt_plugin/Makefile and make sure your plugin is compiled against TensorRT-8.2.3.
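
One way to check which TensorRT the compiled plugin actually links against (a sketch; the plugin library path is an assumption based on the Makefile layout):

# libnvinfer version the plugin resolves at load time
ldd python/trt_plugin/lib/tpat_test_onehot.so | grep nvinfer
# libraries available under TRT_LIB_PATH
ls -l ${TRT_LIB_PATH}/libnvinfer.so*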

hengxinCheung commented 1 year ago

@buptqq I did run the example step by step as described in the repo, so TRT_LIB_PATH is set to point at TensorRT-8.2.3. Could you provide a docker image with the above environment that can run the example successfully?

buptqq commented 1 year ago

Wait a minute, I will ask my colleague to provide a docker image with TensorRT-8.2.3. @wenqf11

wenqf11 commented 1 year ago

@hengxinCheung You can use nvcr.io/nvidia/tensorflow:22.03-tf1-py3 in the Dockerfile (which ships TensorRT 8.2.3; see https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tensorrt/tags) and build a docker image for TPAT. When you run the example in the new Docker container, change the following config in python/trt_plugin/Makefile:

CUDA_PATH   = /usr/local/cuda/  
TRT_LIB_PATH = /usr/lib/x86_64-linux-gnu
hengxinCheung commented 1 year ago

I rebuilt the docker image and ran the example following your advice, but got the following error (my device is a GeForce RTX 3090):

  4: tvm::build(tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target const&)
  3: tvm::codegen::Build(tvm::IRModule, tvm::Target)
  2: tvm::runtime::TypedPackedFunc<tvm::runtime::Module (tvm::IRModule, tvm::Target)>::AssignTypedLambda<tvm::runtime::Module (*)(tvm::IRModule, tvm::Target)>(tvm::runtime::Module (*)(tvm::IRModule, tvm::Target), std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*) const
  1: tvm::codegen::BuildCUDA(tvm::IRModule, tvm::Target)
  0: std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), TVMFuncCreateFromCFunc::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#2}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
  File "/workspace/TPAT/3rdparty/blazerml-tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 81, in cfun
    rv = local_pyfunc(*pyargs)
  File "/workspace/TPAT/3rdparty/blazerml-tvm/python/tvm/autotvm/measure/measure_methods.py", line 789, in tvm_callback_cuda_compile
    ptx = nvcc.compile_cuda(code, target=target, arch=AutotvmGlobalScope.current.cuda_target_arch)
  File "/workspace/TPAT/3rdparty/blazerml-tvm/python/tvm/contrib/nvcc.py", line 108, in compile_cuda
    raise RuntimeError(msg)
RuntimeError: 
#ifdef _WIN32
  using uint = unsigned int;
  using uchar = unsigned char;
  using ushort = unsigned short;
  using int64_t = long long;
  using uint64_t = unsigned long long;
#else
  #define uint unsigned int
  #define uchar unsigned char
  #define ushort unsigned short
  #define int64_t long long
  #define uint64_t unsigned long long
#endif
extern "C" __global__ void __launch_bounds__(1024) tvmgen_default_fused_one_hot_kernel0(float* __restrict__ T_one_hot, int* __restrict__ placeholder, float* __restrict__ placeholder1, float* __restrict__ placeholder2) {
  T_one_hot[(((((int)blockIdx.x) * 1024) + ((int)threadIdx.x)))] = ((placeholder[(((((int)blockIdx.x) * 4) + (((int)threadIdx.x) >> 8)))] == (((int)threadIdx.x) & 255)) ? placeholder1[(0)] : placeholder2[(0)]);
}

Compilation error:
nvcc fatal   : Value 'sm_86' is not defined for option 'gpu-architecture'

And I found that the version of TensorRT in the docker container may be 7, not 8:

# pip list | grep tensorrt
tensorrt                  7.0.0.11

# ls /usr/lib/x86_64-linux-gnu/ | grep nvinfer
libnvinfer.so
libnvinfer.so.7
libnvinfer.so.7.0.0
libnvinfer_plugin.so
libnvinfer_plugin.so.7
libnvinfer_plugin.so.7.0.0
wenqf11 commented 1 year ago

@hengxinCheung You can check the TensorRT version in nvcr.io/nvidia/tensorflow:22.03-tf1-py3; it should be TensorRT 8.2.3.

hengxinCheung commented 1 year ago

I double-checked the image, and I still think the version of TensorRT is 7. Maybe the docker image you suggested is nvcr.io/nvidia/tensorrt:xx, not nvcr.io/nvidia/tensorflow:xx. For the nvcc fatal error above, I tried adding the following code to python/cuda_kernel.py, but still could not build the TensorRT engine successfully:

from tvm.autotvm.measure.measure_methods import set_cuda_target_arch
# the value 'sm_75' appears in `python/trt_plugin/Makefile` (line 75)
set_cuda_target_arch('sm_75')
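
For a GeForce RTX 3090 the compute capability is 8.6, so the value would be sm_86, which also requires an nvcc from CUDA 11.1 or newer. A sketch of deriving the architecture from the local GPU instead of hard-coding it (assuming pycuda is installed in the container; this is not TPAT's own code):

import pycuda.driver as cuda
import pycuda.autoinit  # initializes the driver and creates a context on device 0

major, minor = cuda.Device(0).compute_capability()
set_cuda_target_arch("sm_%d%d" % (major, minor))  # e.g. sm_86 on a GeForce RTX 3090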

I will close this issue and prepare to implement the OneHot plugin myself. Thanks for all the replies and best wishes. I also found some possible mistakes in this example:

# inconsistent batch_size before and after
line 230:         input_model_file, output_model_file, node_names=node_names, dynamic_bs=dynamic, min_bs=1, max_bs=256, opt_bs=128
line 243: builder.max_batch_size = 1024
line 251:                 profile.set_shape(input.name, [1] + shape_without_batch, [256] + shape_without_batch, [256] + shape_without_batch )
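
If the intent is a dynamic batch between 1 and 256 with an optimum of 128, a consistent version might look like the following sketch (illustrative only, reusing the example's variable names):

builder.max_batch_size = 256  # match max_bs used for the ONNX export
profile.set_shape(
    input.name,
    [1] + shape_without_batch,    # min  (min_bs=1)
    [128] + shape_without_batch,  # opt  (opt_bs=128)
    [256] + shape_without_batch,  # max  (max_bs=256)
)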
wenqf11 commented 1 year ago

@hengxinCheung Please make sure you are using 22.03-tf1-py3, not 20.03-tf1-py3

docker pull nvcr.io/nvidia/tensorflow:22.03-tf1-py3
git clone https://github.com/Tencent/TPAT
cd TPAT/
vim Dockerfile (change 20.03 to 22.03)
docker build -f Dockerfile -t tensorflow-tpat:22.03-tf1-py3  .
nvidia-docker run -it --rm -v your_tpat_dir:tpat_dir_in_container --network=host tensorflow-tpat:22.03-tf1-py3 bash
hengxinCheung commented 1 year ago

@wenqf11 Thanks for your helpful advice. I did write the image tag wrong, and I ran the example successfully with the right image. It looks like the previous errors were all due to version mismatches (driver && cuda && tensorrt). But I get a big difference in results when I build the TensorRT engine for my model (a BERT model) with plugins generated by TPAT. I am trying to solve this problem.
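
One way to quantify such a difference is to run the same batch through ONNX Runtime as a reference and compare it with the TensorRT output; a minimal sketch (model.onnx, input_batch and trt_out are placeholders for the exported model, the test input and the engine's output):

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx")           # hypothetical path to the exported model
input_name = sess.get_inputs()[0].name
ref = sess.run(None, {input_name: input_batch})[0]  # reference output on CPU

abs_err = np.abs(ref - trt_out)                     # trt_out: output from the TensorRT engine
print("max abs err:", abs_err.max(), "mean abs err:", abs_err.mean())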

xwz-ol commented 5 months ago

@hengxinCheung Hello, were you able to run the test_onehot_dynamic_direct.py example just by changing the image? I used the solution above, but when running the example I still get Value 'sm_89' is not defined for option 'gpu-architecture'.