
TPAT - TensorRT Plugin Autogen Tool

Introduction

  1. Automatically generates high-performance TensorRT plugins for operators that TensorRT does not support or that run on inefficient kernels.
  2. End-to-end command-line tool. No CUDA programming knowledge is required: users only provide the ONNX model and the names or types of the target nodes, and the TensorRT plugin is generated automatically.
  3. The performance of the auto-generated TensorRT plugins has been validated in real cases.

Support Matrix

Runtime Env: Dockerfile

1. Build image

nvidia-docker build .

2. Run container

nvidia-docker run -itd --gpus all -v <TPAT path dir>:/root <Image_ID> /bin/bash

3. Enter the container

nvidia-docker exec -it <Container_ID> /bin/bash

4. Modify CUDA_PATH and TRT_LIB_PATH in python/trt_plugin/Makefile

CUDA_PATH: local CUDA installation path
TRT_LIB_PATH: local TensorRT installation path

5. Auto-generate a plugin

cd examples
python test_onehot_dynamic_direct.py
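For reference, tests like this one typically build a small ONNX model containing the target operator and hand it to TPAT. A minimal sketch of constructing such a model (node and file names are illustrative, not taken from the test itself):

import onnx
from onnx import TensorProto, helper

# Tiny graph with a single OneHot node, an operator that TensorRT
# historically did not support natively.
indices = helper.make_tensor_value_info("indices", TensorProto.INT32, [4])
output = helper.make_tensor_value_info("output", TensorProto.FLOAT, [4, 6])
depth = helper.make_tensor("depth", TensorProto.INT32, [], [6])
values = helper.make_tensor("values", TensorProto.FLOAT, [2], [0.0, 1.0])
node = helper.make_node(
    "OneHot", ["indices", "depth", "values"], ["output"], name="OneHot_0"
)
graph = helper.make_graph(
    [node], "onehot_graph", [indices], [output], initializer=[depth, values]
)
onnx.save(helper.make_model(graph), "onehot.onnx")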

Runtime Env: Build from source

1. Prerequisites

System Packages

PyPI packages

Optional packages

2. Clone the TPAT repository

git clone -b master https://github.com/Tencent/TPAT.git TPAT
cd TPAT
git submodule update --init --recursive

3. Build BlazerML-TVM

mkdir build && cp cmake/config.cmake build
# Edit build/config.cmake to customize the compilation options:
#   set(USE_LLVM /usr/local/llvm/bin/llvm-config)
#   set(USE_CUDA ON)
# gcc must support C++14.
cd build && cmake ..
make -j
# Expose the TVM Python package:
export TVM_HOME=/path/to/tvm
export PYTHONPATH=$TVM_HOME/python:${PYTHONPATH}
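After the exports above, a quick sanity check that the TVM Python package is importable:

# Should print the TVM version without raising ImportError; assumes
# TVM_HOME and PYTHONPATH were exported as shown above.
import tvm
print(tvm.__version__)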

4. Plugin Compiler Env

Modify python/trt_plugin/Makefile according to your environment setup.

CUDA_PATH: local CUDA installation path
TRT_LIB_PATH: local TensorRT installation path

Usage

TPAT can be used either through a Python function or from the command line.

Python function

onnx2plugin(
    input_model_path,       # path to the source ONNX model
    output_model_path,      # path for the ONNX model rewritten to use the plugin
    node_names=None,        # names of the nodes to replace
    node_types=None,        # types of the nodes to replace
    plugin_name_dict=None,  # optional {node_name: plugin_name} mapping
    dynamic_bs=False,       # if True, the generated plugin supports dynamic batch size
    min_bs=1,               # minimum batch size (when dynamic_bs is True)
    max_bs=256,             # maximum batch size
    opt_bs=128              # batch size to optimize for
    )
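For example, generating a plugin for a single node by name might look like this. A minimal sketch: the module name in the import is an assumption (adjust it to wherever onnx2plugin is defined in your checkout), and the file and node names are illustrative.

from onnx_to_plugin import onnx2plugin  # module name is an assumption

onnx2plugin(
    "onehot.onnx",            # input ONNX model
    "onehot_tpat.onnx",       # output model with the node rewired to the plugin
    node_names=["OneHot_0"],  # generate a plugin for this node
)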

Command line

# Separate different ops with spaces
python3 Onnx2Plugin.py -i input.onnx -o output.onnx -n op_name1 op_name2 -dynamic=true -min=1 -max=512 -opt=256
python3 Onnx2Plugin.py -i input.onnx -o output.onnx -t op_type1 op_type2 -dynamic=false
python3 Onnx2Plugin.py -i input.onnx -o output.onnx -p '{"op_name1": "plugin_name1", "op_name2": "plugin_name2"}'

Output

1. Assign nodes and plugin names through plugin_name_dict

2. Assign node names or node types
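Before building or running an engine from the rewritten model, the generated .so must be loaded so the plugin registers with TensorRT. A minimal sketch (the library path and file names are illustrative assumptions):

import ctypes
import tensorrt as trt

# Loading the shared library registers the plugin creator with TensorRT.
ctypes.CDLL("python/trt_plugin/lib/tpat_onehot.so")

logger = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(logger, "")

# The rewritten model can now be parsed like any other ONNX model.
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)
with open("onehot_tpat.onnx", "rb") as f:
    assert parser.parse(f.read())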

Example && UnitTest

Release notes

Changelog

Known issues

TODO