Closed Routhleck closed 6 months ago
Although we can now use the flexible Taichi custom operator approach, Taichi on CUDA does not offer fine-grained control or optimization for some scenarios. For those scenarios, we can use CuPy's RawModule (https://docs.cupy.dev/en/stable/user_guide/kernel.html#raw-kernels), which compiles native CUDA source passed as a string, or jit.rawkernel (https://docs.cupy.dev/en/stable/user_guide/kernel.html#jit-kernel-definition), which compiles a kernel written as a Python function. Both compile and run at runtime and give finer-grained control.
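For reference, here is a minimal sketch of the RawModule path, not the code in this PR. The kernel name `vector_add`, the wrapper function, and the block size of 256 are hypothetical choices for illustration; the sketch assumes non-empty, 1-D, equally sized float32 inputs and a machine with CuPy and a CUDA-capable GPU.

```python
import numpy as np

# CUDA C source kept as a plain Python string; CuPy compiles it at runtime.
# The kernel name `vector_add` is a hypothetical example, not part of this PR.
KERNEL_SOURCE = r"""
extern "C" __global__
void vector_add(const float* x, const float* y, float* out, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {  // guard threads that fall past the end of the arrays
        out[i] = x[i] + y[i];
    }
}
"""


def vector_add(x, y, threads_per_block=256):
    """Add two 1-D float32 arrays on the GPU with a hand-written CUDA kernel."""
    # Imported lazily so this file can be imported on machines without a GPU.
    import cupy as cp

    x_gpu = cp.asarray(x, dtype=cp.float32)
    y_gpu = cp.asarray(y, dtype=cp.float32)
    out_gpu = cp.empty_like(x_gpu)
    n = x_gpu.size

    # Compile the source string and look up the kernel by name.
    module = cp.RawModule(code=KERNEL_SOURCE)
    kernel = module.get_function("vector_add")

    # Launch with explicit grid and block sizes (ceil-divide n by block size).
    blocks = (n + threads_per_block - 1) // threads_per_block
    kernel((blocks,), (threads_per_block,), (x_gpu, y_gpu, out_gpu, np.int32(n)))

    return cp.asnumpy(out_gpu)


if __name__ == "__main__":
    a = np.arange(1000, dtype=np.float32)
    b = np.ones(1000, dtype=np.float32)
    assert np.allclose(vector_add(a, b), a + b)
```

The explicit grid and block arguments in the launch call are the kind of control this approach exposes directly.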
https://github.com/cupy/cupy/issues/8232
Before this PR is merged, I will make some modifications.