Tencent / TPAT

TensorRT Plugin Autogen Tool
Apache License 2.0

.so builds successfully, but TensorRT fails at runtime in the OneHot example #38

Open willdla opened 11 months ago

willdla commented 11 months ago

Running inside the Docker container built from the project Dockerfile, I get:

[TensorRT] ERROR: INVALID_ARGUMENT: getPluginCreator could not find plugin tpat_test_onehot version 1
In node -1 (importFallbackPluginImporter): UNSUPPORTED_NODE: Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
[TensorRT] ERROR: Network must have at least one output
[TensorRT] ERROR: Network validation failed.
[ERROR] engine is None

It seems the plugin is not loaded properly.

How can I fix this?

The full log is below:

root@0390133f0efa:~/examples# python test_onehot_dynamic_direct.py
2023-10-08 08:49:03.568139: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2023-10-08 08:49:05.214999: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3000000000 Hz
2023-10-08 08:49:05.215383: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x678b5d0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2023-10-08 08:49:05.215399: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2023-10-08 08:49:05.216549: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2023-10-08 08:49:05.273763: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:05.273974: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x678d2f0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-10-08 08:49:05.273991: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA GeForce RTX 2070, Compute Capability 7.5
2023-10-08 08:49:05.274107: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:05.274219: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
name: NVIDIA GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.62
pciBusID: 0000:01:00.0
2023-10-08 08:49:05.274242: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
2023-10-08 08:49:05.274250: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11
2023-10-08 08:49:05.274276: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2023-10-08 08:49:05.274284: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2023-10-08 08:49:05.276174: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2023-10-08 08:49:05.276622: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.11
2023-10-08 08:49:05.276637: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2023-10-08 08:49:05.276687: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:05.276832: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:05.276916: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2023-10-08 08:49:05.276937: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
2023-10-08 08:49:05.522333: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-10-08 08:49:05.522360: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0
2023-10-08 08:49:05.522366: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N
2023-10-08 08:49:05.522522: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:05.522698: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:05.522807: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7031 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
2023-10-08 08:49:05.554951: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11
2023-10-08 08:49:06.211052: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
/usr/lib/python3.6/runpy.py:125: RuntimeWarning: 'tf2onnx.convert' found in sys.modules after import of package 'tf2onnx', but prior to execution of 'tf2onnx.convert'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tf2onnx/verbose_logging.py:76: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

2023-10-08 08:49:07,330 - WARNING - tensorflow: From /usr/local/lib/python3.6/dist-packages/tf2onnx/verbose_logging.py:76: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

2023-10-08 08:49:07.331626: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2023-10-08 08:49:07.357287: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.357450: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
name: NVIDIA GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.62
pciBusID: 0000:01:00.0
2023-10-08 08:49:07.357467: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
2023-10-08 08:49:07.359028: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11
2023-10-08 08:49:07.359702: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2023-10-08 08:49:07.359943: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2023-10-08 08:49:07.361533: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2023-10-08 08:49:07.361962: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.11
2023-10-08 08:49:07.362148: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2023-10-08 08:49:07.362249: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.362408: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.362507: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2023-10-08 08:49:07.391000: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3000000000 Hz
2023-10-08 08:49:07.391315: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x47e4fe0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2023-10-08 08:49:07.391330: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2023-10-08 08:49:07.439144: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.439349: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x48208d0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-10-08 08:49:07.439364: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA GeForce RTX 2070, Compute Capability 7.5
2023-10-08 08:49:07.439512: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.439621: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
name: NVIDIA GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.62
pciBusID: 0000:01:00.0
2023-10-08 08:49:07.439641: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
2023-10-08 08:49:07.439660: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11
2023-10-08 08:49:07.439672: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2023-10-08 08:49:07.439683: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2023-10-08 08:49:07.439703: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2023-10-08 08:49:07.439715: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.11
2023-10-08 08:49:07.439726: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2023-10-08 08:49:07.439772: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.439892: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.439976: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2023-10-08 08:49:07.440000: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
2023-10-08 08:49:07.684408: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-10-08 08:49:07.684438: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0
2023-10-08 08:49:07.684444: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N
2023-10-08 08:49:07.684700: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.684903: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.685038: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6648 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tf2onnx/tf_loader.py:343: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

2023-10-08 08:49:07,685 - WARNING - tensorflow: From /usr/local/lib/python3.6/dist-packages/tf2onnx/tf_loader.py:343: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

INFO:tensorflow:Froze 0 variables.
2023-10-08 08:49:07,689 - INFO - tensorflow: Froze 0 variables.
INFO:tensorflow:Converted 0 variables to const ops.
2023-10-08 08:49:07,690 - INFO - tensorflow: Converted 0 variables to const ops.
2023-10-08 08:49:07.690924: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.691090: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
name: NVIDIA GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.62
pciBusID: 0000:01:00.0
2023-10-08 08:49:07.691111: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
2023-10-08 08:49:07.691127: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11
2023-10-08 08:49:07.691137: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2023-10-08 08:49:07.691147: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2023-10-08 08:49:07.691169: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2023-10-08 08:49:07.691179: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.11
2023-10-08 08:49:07.691190: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2023-10-08 08:49:07.691236: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.691356: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.691453: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2023-10-08 08:49:07.691471: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-10-08 08:49:07.691477: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0
2023-10-08 08:49:07.691482: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N
2023-10-08 08:49:07.691540: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.691668: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.691763: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6648 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
2023-10-08 08:49:07.692623: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.692728: I tensorflow/core/grappler/devices.cc:55] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 1
2023-10-08 08:49:07.692805: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2023-10-08 08:49:07.693102: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.693195: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
name: NVIDIA GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.62
pciBusID: 0000:01:00.0
2023-10-08 08:49:07.693209: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
2023-10-08 08:49:07.693221: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11
2023-10-08 08:49:07.693230: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2023-10-08 08:49:07.693240: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2023-10-08 08:49:07.693250: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2023-10-08 08:49:07.693260: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.11
2023-10-08 08:49:07.693276: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2023-10-08 08:49:07.693341: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.693477: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.693565: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2023-10-08 08:49:07.693579: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-10-08 08:49:07.693585: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0
2023-10-08 08:49:07.693590: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N
2023-10-08 08:49:07.693649: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.693774: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-08 08:49:07.693868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6648 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
2023-10-08 08:49:07.696080: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:822] Optimization results for grappler item: graph_to_optimize
2023-10-08 08:49:07.696093: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824]   constant_folding: Graph size after: 15 nodes (-2), 14 edges (-2), time = 0.933ms.
2023-10-08 08:49:07.696097: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824]   function_optimizer: function_optimizer did nothing. time = 0.009ms.
2023-10-08 08:49:07.696101: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824]   constant_folding: Graph size after: 15 nodes (0), 14 edges (0), time = 0.241ms.
2023-10-08 08:49:07.696104: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824]   function_optimizer: function_optimizer did nothing. time = 0.007ms.
2023-10-08 08:49:07,696 - INFO - tf2onnx: inputs: ['input:0']
2023-10-08 08:49:07,696 - INFO - tf2onnx: outputs: ['output:0']
2023-10-08 08:49:07,698 - INFO - tf2onnx.tfonnx: Using tensorflow=1.15.2, onnx=1.10.0, tf2onnx=1.11.1/1915fb
2023-10-08 08:49:07,699 - INFO - tf2onnx.tfonnx: Using opset <onnx, 11>
2023-10-08 08:49:07,708 - INFO - tf2onnx.tf_utils: Computed 0 values for constant folding
2023-10-08 08:49:07,717 - VERBOSE - tf2onnx.tfonnx: Mapping TF node to ONNX node(s)
2023-10-08 08:49:07,719 - VERBOSE - tf2onnx.tfonnx: Summay Stats:
    tensorflow ops: Counter({'Const': 7, 'Identity': 3, 'Placeholder': 1, 'MatMul': 1, 'Minimum': 1, 'Maximum': 1, 'Cast': 1, 'OneHot': 1})
    tensorflow attr: Counter({'dtype': 8, 'value': 7, 'shape': 1, 'transpose_a': 1, 'transpose_b': 1, 'Truncate': 1, 'to': 1, 'axis': 1})
    onnx mapped: Counter({'Const': 6, 'Identity': 2, 'Placeholder': 1, 'MatMul': 1, 'Minimum': 1, 'Maximum': 1, 'Cast': 1, 'OneHot': 1})
    onnx unmapped: Counter()
2023-10-08 08:49:07,719 - INFO - tf2onnx.optimizer: Optimizing ONNX model
2023-10-08 08:49:07,719 - VERBOSE - tf2onnx.optimizer: Apply optimize_transpose
2023-10-08 08:49:07,722 - VERBOSE - tf2onnx.optimizer.TransposeOptimizer: no change
2023-10-08 08:49:07,722 - VERBOSE - tf2onnx.optimizer: Apply remove_redundant_upsample
2023-10-08 08:49:07,724 - VERBOSE - tf2onnx.optimizer.UpsampleOptimizer: no change
2023-10-08 08:49:07,724 - VERBOSE - tf2onnx.optimizer: Apply fold_constants
2023-10-08 08:49:07,726 - VERBOSE - tf2onnx.optimizer.ConstFoldOptimizer: Concat -1 (1->0), Const -1 (6->5), Unsqueeze -3 (3->0)
2023-10-08 08:49:07,726 - VERBOSE - tf2onnx.optimizer: Apply const_dequantize_optimizer
2023-10-08 08:49:07,727 - VERBOSE - tf2onnx.optimizer.ConstDequantizeOptimizer: no change
2023-10-08 08:49:07,727 - VERBOSE - tf2onnx.optimizer: Apply loop_optimizer
2023-10-08 08:49:07,729 - VERBOSE - tf2onnx.optimizer.LoopOptimizer: no change
2023-10-08 08:49:07,729 - VERBOSE - tf2onnx.optimizer: Apply merge_duplication
2023-10-08 08:49:07,730 - VERBOSE - tf2onnx.optimizer.MergeDuplicatedNodesOptimizer: no change
2023-10-08 08:49:07,730 - VERBOSE - tf2onnx.optimizer: Apply reshape_optimizer
2023-10-08 08:49:07,731 - VERBOSE - tf2onnx.optimizer.ReshapeOptimizer: no change
2023-10-08 08:49:07,731 - VERBOSE - tf2onnx.optimizer: Apply global_pool_optimizer
2023-10-08 08:49:07,733 - VERBOSE - tf2onnx.optimizer.GlobalPoolOptimizer: no change
2023-10-08 08:49:07,733 - VERBOSE - tf2onnx.optimizer: Apply q_dq_optimizer
2023-10-08 08:49:07,734 - VERBOSE - tf2onnx.optimizer.QDQOptimizer: no change
2023-10-08 08:49:07,734 - VERBOSE - tf2onnx.optimizer: Apply remove_identity
2023-10-08 08:49:07,736 - VERBOSE - tf2onnx.optimizer.IdentityOptimizer: Identity -5 (5->0)
2023-10-08 08:49:07,736 - VERBOSE - tf2onnx.optimizer: Apply remove_back_to_back
2023-10-08 08:49:07,737 - VERBOSE - tf2onnx.optimizer.BackToBackOptimizer: no change
2023-10-08 08:49:07,737 - VERBOSE - tf2onnx.optimizer: Apply einsum_optimizer
2023-10-08 08:49:07,738 - VERBOSE - tf2onnx.optimizer.EinsumOptimizer: no change
2023-10-08 08:49:07,738 - VERBOSE - tf2onnx.optimizer: Apply optimize_transpose
2023-10-08 08:49:07,739 - VERBOSE - tf2onnx.optimizer.TransposeOptimizer: no change
2023-10-08 08:49:07,739 - VERBOSE - tf2onnx.optimizer: Apply remove_redundant_upsample
2023-10-08 08:49:07,740 - VERBOSE - tf2onnx.optimizer.UpsampleOptimizer: no change
2023-10-08 08:49:07,740 - VERBOSE - tf2onnx.optimizer: Apply fold_constants
2023-10-08 08:49:07,741 - VERBOSE - tf2onnx.optimizer.ConstFoldOptimizer: no change
2023-10-08 08:49:07,741 - VERBOSE - tf2onnx.optimizer: Apply const_dequantize_optimizer
2023-10-08 08:49:07,742 - VERBOSE - tf2onnx.optimizer.ConstDequantizeOptimizer: no change
2023-10-08 08:49:07,742 - VERBOSE - tf2onnx.optimizer: Apply loop_optimizer
2023-10-08 08:49:07,743 - VERBOSE - tf2onnx.optimizer.LoopOptimizer: no change
2023-10-08 08:49:07,743 - VERBOSE - tf2onnx.optimizer: Apply merge_duplication
2023-10-08 08:49:07,744 - VERBOSE - tf2onnx.optimizer.MergeDuplicatedNodesOptimizer: no change
2023-10-08 08:49:07,744 - VERBOSE - tf2onnx.optimizer: Apply reshape_optimizer
2023-10-08 08:49:07,745 - VERBOSE - tf2onnx.optimizer.ReshapeOptimizer: no change
2023-10-08 08:49:07,745 - VERBOSE - tf2onnx.optimizer: Apply global_pool_optimizer
2023-10-08 08:49:07,746 - VERBOSE - tf2onnx.optimizer.GlobalPoolOptimizer: no change
2023-10-08 08:49:07,746 - VERBOSE - tf2onnx.optimizer: Apply q_dq_optimizer
2023-10-08 08:49:07,747 - VERBOSE - tf2onnx.optimizer.QDQOptimizer: no change
2023-10-08 08:49:07,747 - VERBOSE - tf2onnx.optimizer: Apply remove_identity
2023-10-08 08:49:07,748 - VERBOSE - tf2onnx.optimizer.IdentityOptimizer: no change
2023-10-08 08:49:07,748 - VERBOSE - tf2onnx.optimizer: Apply remove_back_to_back
2023-10-08 08:49:07,749 - VERBOSE - tf2onnx.optimizer.BackToBackOptimizer: no change
2023-10-08 08:49:07,749 - VERBOSE - tf2onnx.optimizer: Apply einsum_optimizer
2023-10-08 08:49:07,750 - VERBOSE - tf2onnx.optimizer.EinsumOptimizer: no change
2023-10-08 08:49:07,751 - INFO - tf2onnx.optimizer: After optimization: Concat -1 (1->0), Const -1 (6->5), Identity -5 (5->0), Unsqueeze -3 (3->0)
2023-10-08 08:49:07,752 - INFO - tf2onnx:
2023-10-08 08:49:07,752 - INFO - tf2onnx: Successfully converted TensorFlow model model/test_op_test_onehot.pb to ONNX
2023-10-08 08:49:07,752 - INFO - tf2onnx: Model inputs: ['input:0']
2023-10-08 08:49:07,752 - INFO - tf2onnx: Model outputs: ['output:0']
2023-10-08 08:49:07,752 - INFO - tf2onnx: ONNX model is saved at model/test_op_plugin.onnx
const_input:  Constant (const_fold_opt__18): (shape=(1,), dtype=int32)
values:  [256]
const_input:  Constant (const_fold_opt__19): (shape=(2,), dtype=float32)
values:  [0. 1.]
[08:49:08] /workspace/TPAT/3rdparty/blazerml-tvm/src/tir/transforms/loop_partition.cc:590: Warning: Cannot prove: ((((floordiv(((any_dim*256) + 511), 512) - 1) - floordiv(any_dim, 2)) + 1) >= 0), when generating the post doubt loop
Compile...
/tmp/tuning.log does not exist!

Running...
Compile...
/tmp/tuning.log does not exist!

Running...
Compile...
/tmp/tuning.log does not exist!

Running...
rm -rf ./lib/tpat_test_onehot.so ./obj/*
if [ ! -d ./obj ]; then mkdir -p ./obj; fi
/usr/local/cuda-11.0//bin/nvcc -w -std=c++11 -M -MT tpat_test_onehot.o -I. -I/usr/local/cuda-11.0//samples/common/inc -I/usr/local/cuda-11.0//include -I/usr/include/x86_64-linux-gnu -I/usr/include/x86_64-linux-gnu -I/usr/include -o tpat_test_onehot.d src/tpat_test_onehot.cu
/usr/local/cuda-11.0//bin/nvcc -w -std=c++11 -I. -I/usr/local/cuda-11.0//samples/common/inc -I/usr/local/cuda-11.0//include -I/usr/include/x86_64-linux-gnu -I/usr/include/x86_64-linux-gnu -I/usr/include -Xcompiler -fPIC -arch=sm_75 -o tpat_test_onehot.o -c src/tpat_test_onehot.cu
# /usr/local/cuda-11.0//bin/nvcc -w -std=c++11 -I. -I/usr/local/cuda-11.0//samples/common/inc -I/usr/local/cuda-11.0//include -I/usr/include/x86_64-linux-gnu -I/usr/include/x86_64-linux-gnu -I/usr/include -Xcompiler -fPIC -arch=sm_75 -G -lineinfo -o tpat_test_onehot.o -c src/tpat_test_onehot.cu
g++ -w -std=c++11 -shared -o tpat_test_onehot.so tpat_test_onehot.o -L/usr/local/cuda-11.0//lib64 -L/usr/local/cuda-11.0//lib64 -L/workspace/TensorRT-8.0.3.4/lib  -lnvinfer -lcudart -lcuda -Wl,-rpath=/usr/local/cuda-11.0//lib64 -Wl,-rpath=/usr/local/cuda-11.0//lib64 -Wl,-rpath=/workspace/TensorRT-8.0.3.4/lib
if [ ! -d  ./lib ]; then mkdir -p ./lib; fi
mv *.o   ./obj/
mv *.d   ./obj/
mv *.so ./lib/
Onnx_name_mapping_trt_plugin: {'test_onehot': 'tpat_test_onehot'}
load ./trt_plugin/lib/tpat_test_onehot
[TensorRT] VERBOSE: Registered plugin creator - ::GridAnchor_TRT version 1
[TensorRT] VERBOSE: Registered plugin creator - ::NMS_TRT version 1
[TensorRT] VERBOSE: Registered plugin creator - ::Reorg_TRT version 1
[TensorRT] VERBOSE: Registered plugin creator - ::Region_TRT version 1
[TensorRT] VERBOSE: Registered plugin creator - ::Clip_TRT version 1
[TensorRT] VERBOSE: Registered plugin creator - ::LReLU_TRT version 1
[TensorRT] VERBOSE: Registered plugin creator - ::PriorBox_TRT version 1
[TensorRT] VERBOSE: Registered plugin creator - ::Normalize_TRT version 1
[TensorRT] VERBOSE: Registered plugin creator - ::RPROI_TRT version 1
[TensorRT] VERBOSE: Registered plugin creator - ::BatchedNMS_TRT version 1
[TensorRT] VERBOSE: Registered plugin creator - ::FlattenConcat_TRT version 1
[TensorRT] VERBOSE: Registered plugin creator - ::CropAndResize version 1
[TensorRT] VERBOSE: Registered plugin creator - ::DetectionLayer_TRT version 1
[TensorRT] VERBOSE: Registered plugin creator - ::Proposal version 1
[TensorRT] VERBOSE: Registered plugin creator - ::ProposalLayer_TRT version 1
[TensorRT] VERBOSE: Registered plugin creator - ::PyramidROIAlign_TRT version 1
[TensorRT] VERBOSE: Registered plugin creator - ::ResizeNearest_TRT version 1
[TensorRT] VERBOSE: Registered plugin creator - ::Split version 1
[TensorRT] VERBOSE: Registered plugin creator - ::SpecialSlice_TRT version 1
[TensorRT] VERBOSE: Registered plugin creator - ::InstanceNormalization_TRT version 1
[TensorRT] VERBOSE: ModelImporter.cpp:202: Adding network input: input:0 with dtype: float32, dimensions: (-1, 64)
[TensorRT] VERBOSE: ImporterContext.hpp:116: Registering tensor: input:0 for ONNX tensor: input:0
[TensorRT] VERBOSE: ModelImporter.cpp:90: Importing initializer: dense/kernel/read:0
[TensorRT] VERBOSE: ModelImporter.cpp:90: Importing initializer: clip_by_value/Minimum/y:0
[TensorRT] VERBOSE: ModelImporter.cpp:90: Importing initializer: clip_by_value/y:0
[TensorRT] VERBOSE: ModelImporter.cpp:90: Importing initializer: const_fold_opt__18
[TensorRT] VERBOSE: ModelImporter.cpp:90: Importing initializer: const_fold_opt__19
[TensorRT] VERBOSE: ModelImporter.cpp:103: Parsing node: dense/MatMul [MatMul]
[TensorRT] VERBOSE: ModelImporter.cpp:119: Searching for input: input:0
[TensorRT] VERBOSE: ModelImporter.cpp:119: Searching for input: dense/kernel/read:0
[TensorRT] VERBOSE: ModelImporter.cpp:125: dense/MatMul [MatMul] inputs: [input:0 -> (-1, 64)], [dense/kernel/read:0 -> (64, 256)],
[TensorRT] VERBOSE: builtin_op_importers.cpp:2053: GEMM: using FC layer instead of MM because all criteria were met.
[TensorRT] WARNING: onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[TensorRT] VERBOSE: onnx2trt_utils.cpp:1793: Original shape: (_, 64), unsqueezing to: (_, _, _, _)
[TensorRT] VERBOSE: ImporterContext.hpp:141: Registering layer: dense/MatMul for ONNX node: dense/MatMul
[TensorRT] VERBOSE: onnx2trt_utils.cpp:1641: Original shape: (_, 256, 1, 1), squeezing to: (_, _)
[TensorRT] VERBOSE: ImporterContext.hpp:116: Registering tensor: dense/MatMul:0 for ONNX tensor: dense/MatMul:0
[TensorRT] VERBOSE: ModelImporter.cpp:179: dense/MatMul [MatMul] outputs: [dense/MatMul:0 -> (-1, -1)],
[TensorRT] VERBOSE: ModelImporter.cpp:103: Parsing node: Min__6 [Min]
[TensorRT] VERBOSE: ModelImporter.cpp:119: Searching for input: dense/MatMul:0
[TensorRT] VERBOSE: ModelImporter.cpp:119: Searching for input: clip_by_value/Minimum/y:0
[TensorRT] VERBOSE: ModelImporter.cpp:125: Min__6 [Min] inputs: [dense/MatMul:0 -> (-1, -1)], [clip_by_value/Minimum/y:0 -> ()],
[TensorRT] VERBOSE: ImporterContext.hpp:141: Registering layer: Min__6 for ONNX node: Min__6
[TensorRT] VERBOSE: ImporterContext.hpp:116: Registering tensor: Min__6:0 for ONNX tensor: Min__6:0
[TensorRT] VERBOSE: ModelImporter.cpp:179: Min__6 [Min] outputs: [Min__6:0 -> (-1, -1)],
[TensorRT] VERBOSE: ModelImporter.cpp:103: Parsing node: Max__9 [Max]
[TensorRT] VERBOSE: ModelImporter.cpp:119: Searching for input: Min__6:0
[TensorRT] VERBOSE: ModelImporter.cpp:119: Searching for input: clip_by_value/y:0
[TensorRT] VERBOSE: ModelImporter.cpp:125: Max__9 [Max] inputs: [Min__6:0 -> (-1, -1)], [clip_by_value/y:0 -> ()],
[TensorRT] VERBOSE: ImporterContext.hpp:141: Registering layer: Max__9 for ONNX node: Max__9
[TensorRT] VERBOSE: ImporterContext.hpp:116: Registering tensor: Max__9:0 for ONNX tensor: Max__9:0
[TensorRT] VERBOSE: ModelImporter.cpp:179: Max__9 [Max] outputs: [Max__9:0 -> (-1, -1)],
[TensorRT] VERBOSE: ModelImporter.cpp:103: Parsing node: Cast [Cast]
[TensorRT] VERBOSE: ModelImporter.cpp:119: Searching for input: Max__9:0
[TensorRT] VERBOSE: ModelImporter.cpp:125: Cast [Cast] inputs: [Max__9:0 -> (-1, -1)],
[TensorRT] VERBOSE: builtin_op_importers.cpp:320: Casting to type: int32
[TensorRT] VERBOSE: ImporterContext.hpp:141: Registering layer: Cast for ONNX node: Cast
[TensorRT] VERBOSE: ImporterContext.hpp:116: Registering tensor: Cast:0 for ONNX tensor: Cast:0
[TensorRT] VERBOSE: ModelImporter.cpp:179: Cast [Cast] outputs: [Cast:0 -> (-1, -1)],
[TensorRT] VERBOSE: ModelImporter.cpp:103: Parsing node: test_onehot [tpat_test_onehot]
[TensorRT] VERBOSE: ModelImporter.cpp:119: Searching for input: Cast:0
[TensorRT] VERBOSE: ModelImporter.cpp:119: Searching for input: const_fold_opt__18
[TensorRT] VERBOSE: ModelImporter.cpp:119: Searching for input: const_fold_opt__19
[TensorRT] VERBOSE: ModelImporter.cpp:125: test_onehot [tpat_test_onehot] inputs: [Cast:0 -> (-1, -1)], [const_fold_opt__18 -> (1)], [const_fold_opt__19 -> (2)],
[TensorRT] INFO: ModelImporter.cpp:135: No importer registered for op: tpat_test_onehot. Attempting to import as plugin.
[TensorRT] INFO: builtin_op_importers.cpp:3659: Searching for plugin: tpat_test_onehot, plugin_version: 1, plugin_namespace:
[TensorRT] ERROR: INVALID_ARGUMENT: getPluginCreator could not find plugin tpat_test_onehot version 1
In node -1 (importFallbackPluginImporter): UNSUPPORTED_NODE: Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
[TensorRT] ERROR: Network must have at least one output
[TensorRT] ERROR: Network validation failed.
[ERROR] engine is None
-------------------------------------------------------------------
PyCUDA ERROR: The context stack was not empty upon module cleanup.
-------------------------------------------------------------------
A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.
-------------------------------------------------------------------
Aborted (core dumped)
root@0390133f0efa:~/examples#
willdla commented 11 months ago

Also, to run TPAT in my Docker environment I changed some code:

diff --git a/Dockerfile b/Dockerfile
index 243698e..f308fd9 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -1,4 +1,4 @@
-FROM nvcr.io/nvidia/tensorflow:20.06-tf1-py3
+FROM nvcr.nju.edu.cn/nvidia/tensorflow:20.06-tf1-py3
 RUN wget -O "llvm-9.0.1.src.tar.xz" https://github.com/llvm/llvm-project/releases/download/llvmorg-9.0.1/llvm-9.0.1.src.tar.xz \
     && tar -xvf llvm-9.0.1.src.tar.xz && mkdir llvm-9.0.1.src/build \
     && cd llvm-9.0.1.src/build && cmake -G "Unix Makefiles" -DLLVM_TARGETS_TO_BUILD=X86 -DCMAKE_BUILD_TYPE="Release" -DCMAKE_INSTALL_PREFIX="/usr/local/llvm" .. && make -j8 && make install PREFIX="/usr/local/llvm"
diff --git a/examples/test_onehot_dynamic_direct.py b/examples/test_onehot_dynamic_direct.py
index 881fc0e..fd564e8 100644
--- a/examples/test_onehot_dynamic_direct.py
+++ b/examples/test_onehot_dynamic_direct.py
@@ -233,6 +233,7 @@ def main():
     #trt_plugin_names = ['tpat_test_onehot_dynamic']
     for trt_plugin_name in trt_plugin_names:
         assert os.path.isfile(f"./trt_plugin/lib/{trt_plugin_name}.so")
+        print("load ./trt_plugin/lib/{}".format(trt_plugin_name))
         ctypes.cdll.LoadLibrary("./trt_plugin/lib/{}.so".format(trt_plugin_name))

     # build trt model by onnx model
diff --git a/python/onnx_modified.py b/python/onnx_modified.py
index 842808f..b55d8a2 100644
--- a/python/onnx_modified.py
+++ b/python/onnx_modified.py
@@ -122,7 +122,7 @@ class OnnxModified(object):
             ][0]
             for inp in inferred_tuning_node.inputs:
                 if inp.__class__ == gs.Constant:
-                    onnx_original_tensor_type[inp.name] = inp.dtype.__name__
+                    onnx_original_tensor_type[inp.name] = inp.dtype.name
                 elif not inp.is_empty():
                     onnx_original_tensor_type[inp.name] = inp.dtype.name
             [
diff --git a/python/plugin_template_params.py b/python/plugin_template_params.py
index a7ba4dd..933b86d 100644
--- a/python/plugin_template_params.py
+++ b/python/plugin_template_params.py
@@ -212,9 +212,9 @@ class PluginTemplateParams(object):
         tuning_node = tuning_nodes[0]
         for inp in tuning_node.inputs:
             if inp.__class__ == gs.Constant:
-                self._onnx_input_python_type.append(tvm_to_c_type_mapping[inp.dtype.__name__])
+                self._onnx_input_python_type.append(tvm_to_c_type_mapping[inp.dtype.name])
                 self._onnx_tensor_type.append(
-                    python_to_trt_type_mapping[inp.dtype.__name__]
+                    python_to_trt_type_mapping[inp.dtype.name]
                 )
             elif not inp.is_empty():
                 self._onnx_input_python_type.append(tvm_to_c_type_mapping[inp.dtype.name])
diff --git a/python/trt_plugin/Makefile b/python/trt_plugin/Makefile
index ead82b4..b9a9591 100644
--- a/python/trt_plugin/Makefile
+++ b/python/trt_plugin/Makefile
@@ -16,7 +16,7 @@

 CUDA_PATH   = /usr/local/cuda-11.0/
 #TRT_LIB_PATH = /root/workspace/download/TensorRT-7.2.2.3/lib
-TRT_LIB_PATH = /root/workspace/download/ft_local/TensorRT-8.0.0.3/lib
+TRT_LIB_PATH = /workspace/TensorRT-8.0.3.4/lib^M
willdla commented 11 months ago

tpat_files.zip

Files created in my environment.

image
willdla commented 11 months ago

Is there any update?

willdla commented 11 months ago

@buptqq

wenqf11 commented 11 months ago

@willdla It looks like your program is not loading the tpat_test_onehot.so plugin correctly. Please make sure tpat_test_onehot.so exists in the directory python/trt_plugin/lib/ and that ctypes.cdll.LoadLibrary("./trt_plugin/lib/{}.so".format(trt_plugin_name)) actually loads it.
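A minimal sketch of that check (assuming the TensorRT 8.x Python bindings inside the container and the paths from your log; this script is a standalone sketch, not part of TPAT), which loads the .so and then looks for the creator in TensorRT's plugin registry:

import ctypes
import os

import tensorrt as trt

plugin_path = "./trt_plugin/lib/tpat_test_onehot.so"
assert os.path.isfile(plugin_path), "missing " + os.path.abspath(plugin_path)
ctypes.CDLL(plugin_path)  # static initializers in the .so register the plugin creator

logger = trt.Logger(trt.Logger.VERBOSE)
trt.init_libnvinfer_plugins(logger, "")  # also registers the builtin creators
names = [c.name for c in trt.get_plugin_registry().plugin_creator_list]
print("tpat_test_onehot registered:", "tpat_test_onehot" in names)

If the file loads but the last line prints False, the creator never registered, which points to a library or version problem rather than a missing file.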

willdla commented 11 months ago

@willdla It looks like your program is not loading the tpat_test_onehot.so plugin correctly. Please make sure tpat_test_onehot.so exists in the directory python/trt_plugin/lib/ and that ctypes.cdll.LoadLibrary("./trt_plugin/lib/{}.so".format(trt_plugin_name)) actually loads it.

The .so is in the expected location:

python/trt_plugin/lib/
└── tpat_test_onehot.so

The default code already contains this statement and I did not modify it: ctypes.cdll.LoadLibrary("./trt_plugin/lib/{}.so".format(trt_plugin_name))

willdla commented 11 months ago
image
wenqf11 commented 11 months ago

@willdla test_onehot_dynamic_direct.py is in the examples directory. Maybe you should run python test_onehot_dynamic_direct.py from the $ROOT/python/ directory, or change ctypes.cdll.LoadLibrary("./trt_plugin/lib/{}.so".format(trt_plugin_name)) to ctypes.cdll.LoadLibrary("../python/trt_plugin/lib/{}.so".format(trt_plugin_name)).
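A minimal sketch of the second option (the load_tpat_plugin helper below is hypothetical, not part of TPAT): resolve the plugin path relative to the script file itself, so the load no longer depends on the directory you launch from:

import ctypes
import os

def load_tpat_plugin(trt_plugin_name, plugin_dir="../python/trt_plugin/lib"):
    # Resolve relative to this file, not the current working directory.
    here = os.path.dirname(os.path.abspath(__file__))
    so_path = os.path.join(here, plugin_dir, "{}.so".format(trt_plugin_name))
    if not os.path.isfile(so_path):
        raise FileNotFoundError(so_path)
    return ctypes.cdll.LoadLibrary(so_path)

load_tpat_plugin("tpat_test_onehot")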

willdla commented 11 months ago

@willdla test_onehot_dynamic_direct.py is in the examples directory. Maybe you should run python test_onehot_dynamic_direct.py from the $ROOT/python/ directory, or change ctypes.cdll.LoadLibrary("./trt_plugin/lib/{}.so".format(trt_plugin_name)) to ctypes.cdll.LoadLibrary("../python/trt_plugin/lib/{}.so".format(trt_plugin_name)).

Neither approach works; the error is the same as at the beginning.

Has this been verified to work locally on your end?

willdla commented 10 months ago

@wenqf11

buptqq commented 10 months ago

@willdla This is usually caused by a mismatch between the TensorRT version the plugin was compiled against and the TensorRT version actually used at runtime. Also check whether the sm_XX in the Makefile matches your GPU model: V100 (sm_70), A100 (sm_80), A10 (sm_86).
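For reference, a minimal sketch (assuming pycuda and the tensorrt Python package inside the container) that prints the runtime TensorRT version and the GPU compute capability, to compare against TRT_LIB_PATH and the -arch=sm_XX flag in python/trt_plugin/Makefile:

import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

print("TensorRT runtime version:", trt.__version__)
dev = cuda.Device(0)
major, minor = dev.compute_capability()
print("GPU:", dev.name(), "-> compute capability sm_{}{}".format(major, minor))

In the log above the GPU is an RTX 2070 (compute capability 7.5) and the plugin is compiled with -arch=sm_75, so the architecture looks consistent; comparing trt.__version__ against the TensorRT-8.0.3.4 directory used for linking would be the next thing to check.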