And reduce tuning time by fixing a bug in the pre-compile step
More details:
Previously, the pre-compile step ran in parallel, but the compiled kernels were not cached. As a result, the tuning step still paid the overhead of kernel compilation.
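To illustrate the bug and the fix, here is a minimal sketch, not the actual implementation. It assumes a hypothetical `compile_kernel` function standing in for the expensive compilation and a hypothetical `Tuner` class. The parallel pre-compile stores its results in a cache, so the tuning step reuses them instead of compiling each kernel a second time.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical record of compile invocations, used to show that tuning
# does not recompile anything once the pre-compile results are cached.
compile_calls = []

def compile_kernel(config):
    """Hypothetical stand-in for an expensive kernel compilation."""
    compile_calls.append(config)
    return f"kernel<{config}>"

class Tuner:
    """Hypothetical tuner; names are illustrative only."""

    def __init__(self, configs):
        self.configs = configs
        self.cache = {}  # config -> compiled kernel

    def precompile(self):
        # Compile all candidate configs in parallel. The bug was equivalent
        # to discarding these results; the fix keeps them in self.cache.
        with ThreadPoolExecutor() as pool:
            kernels = pool.map(compile_kernel, self.configs)
            self.cache = dict(zip(self.configs, kernels))

    def get_kernel(self, config):
        # Fall back to compiling only on a cache miss.
        if config not in self.cache:
            self.cache[config] = compile_kernel(config)
        return self.cache[config]

    def tune(self, benchmark):
        # Pick the config whose kernel gives the lowest benchmark score.
        return min(self.configs, key=lambda c: benchmark(self.get_kernel(c)))

# Usage with a fake benchmark that prefers config 32.
tuner = Tuner([16, 32, 64])
tuner.precompile()
best = tuner.tune(lambda kernel: abs(int(kernel[len("kernel<"):-1]) - 32))
```

With the cache in place, each config is compiled exactly once (during `precompile`), and `tune` only benchmarks.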