Tencent / TPAT

TensorRT Plugin Autogen Tool
Apache License 2.0
367 stars 42 forks

Does the CUDA kernel code generated from Ansor's search space use shared memory optimization during auto-tuning? #25

Open wugoukanle opened 1 year ago

wugoukanle commented 1 year ago

Is the CUDA kernel code from TVM auto-tuned only over for-loop tiling? What are the tuning parameters for CUDA kernel code in TVM Ansor?

buptqq commented 1 year ago

It is not just loop tiling. Ansor also tunes things such as unrolling, block tiling, vectorization, and the implementation of the kernel itself (when there are many kernels, it decides which of them can be fused). However, Ansor still lacks tensorization support, so it does not perform well on computationally intensive operators such as GEMM and conv. This is only a rough outline. If you want more detailed information, we recommend reading the source code: Tune config in TPAT, TVM Source Code
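To make the idea of "tuning knobs" concrete, here is a minimal pure-Python sketch that enumerates a toy search space over tile splits, unroll steps and vector lengths for a single loop. It does not use TVM and is not Ansor's actual algorithm or API; the function names (`factor_splits`, `candidate_configs`) and the candidate values are assumptions chosen only for illustration.

```python
from itertools import product


def factor_splits(extent, parts):
    """Yield every ordered way to split `extent` into `parts` integer factors."""
    if parts == 1:
        yield (extent,)
        return
    for f in range(1, extent + 1):
        if extent % f == 0:
            for rest in factor_splits(extent // f, parts - 1):
                yield (f,) + rest


def candidate_configs(extent, tile_levels, unroll_steps, vector_lens):
    """Combine tile splits with unroll and vectorize choices (hypothetical knobs)."""
    splits = factor_splits(extent, tile_levels)
    for tiles, unroll, vec in product(splits, unroll_steps, vector_lens):
        # Keep only configs whose innermost tile can be split evenly into vectors.
        if tiles[-1] % vec == 0:
            yield {"tiles": tiles, "unroll": unroll, "vectorize": vec}


# Illustrative values only: loop extent 8, two tile levels,
# two unroll choices and two vector lengths.
configs = list(candidate_configs(8, 2, (0, 16), (1, 2)))
for cfg in configs[:3]:
    print(cfg)
print(len(configs), "candidate configs")
```

A real auto-tuner would not enumerate such a space exhaustively. It samples candidates, measures or predicts their cost, and keeps the best one. The sketch only shows why the space grows quickly once several knobs are combined.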
