wugoukanle opened this issue 1 year ago
Ansor tunes more than loop tiling. It also tunes parameters such as unrolling, block tiling, and vectorization, as well as the kernel implementation itself (when there are several kernels, it decides which of them can be fused). However, Ansor still cannot tensorize, so it does not perform well on compute-intensive operators such as GEMM and convolution. This is only a rough description. For more detail, we recommend reading the source code: Tune config in TPAT Tvm Source Code
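To make the list of knobs above more concrete, here is a minimal, hypothetical sketch in plain Python of how such a search space combines. It is not TVM code. The function names, the unroll candidates, and the number of tiling levels are illustrative assumptions. Multi-level tiling splits one loop into nested loops whose lengths multiply back to the original extent. Each split is then combined with an unroll setting and a vectorize on/off flag. Fusion decisions are not modeled here.

```python
from itertools import product

def factorizations(extent, parts):
    """Return every ordered tuple of `parts` positive ints whose product is `extent`.
    Each tuple is one way to tile a loop of length `extent` into `parts` nested loops."""
    if parts == 1:
        return [(extent,)]
    result = []
    for f in range(1, extent + 1):
        if extent % f == 0:
            for rest in factorizations(extent // f, parts - 1):
                result.append((f,) + rest)
    return result

# Hypothetical knob values, for illustration only (assumed, not TVM's actual config).
UNROLL_CANDIDATES = [0, 16, 64, 512, 1024]  # max unroll step; 0 means no unrolling
VECTORIZE_CANDIDATES = [False, True]        # vectorize the innermost loop or not

def search_space(m, n, k, spatial_levels=4, reduce_levels=2):
    """Lazily enumerate (tile_m, tile_n, tile_k, unroll, vectorize) candidates
    for a GEMM of shape (m x k) @ (k x n). Level counts are assumptions."""
    tiles_m = factorizations(m, spatial_levels)
    tiles_n = factorizations(n, spatial_levels)
    tiles_k = factorizations(k, reduce_levels)
    return product(tiles_m, tiles_n, tiles_k, UNROLL_CANDIDATES, VECTORIZE_CANDIDATES)

if __name__ == "__main__":
    # Even a tiny 4x4x4 GEMM yields thousands of candidate configurations.
    print(sum(1 for _ in search_space(4, 4, 4)))
```

The candidate count grows multiplicatively with each knob, which is why Ansor does not enumerate the space. It samples programs and refines them with evolutionary search guided by a learned cost model.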
Is the CUDA kernel code generated by TVM only auto-tuned over for-loop tiling? What are the CUDA kernel tuning arguments in TVM Ansor?