Closed — mmigdal-nv closed this 1 year ago
In the case of matmuls, this happens to fix the cases where `nvfuser_index_t` is stale, as we don't recompile even if we compute the right size in that case.

As I mentioned to @mmigdal-nv, I think the fix of this PR is sufficient. As long as a fusion is executed through `FusionExecutorCache`, we should not see back-and-forth recompilations due to index mode changes. The only request I have for @mmigdal-nv is to add a simple C++ test that verifies this behavior. https://github.com/csarofeen/pytorch/pull/2522#discussion_r1119341798
Fixed issues:

- Recompilation when `KernelArgumentHolder`'s indexing mode changes.
- `kernelName()` is changed so we can use `KernelDb` with the key `kernel_code_`. Currently `KernelDb` ignores the wrapped code (`#define`s, runtime library, ...) and relies only on the kernel. Without changing the kernel name we would be getting back the wrong cubins.

Improvements:
- `KernelArgumentHolder` is no longer updated retroactively.
- The `-1` in `collectIndexMode` is misleading. In the case of a 1D tensor, having a type that can hold the tensor's index is not enough: we need to be able to hold the bound itself (so we can compare the index to the bound without overflow).

Changes:
- `cparams.index_type` is not set to `DataType::Index`, so the kernel can be lowered once and we update/set `nvfuser_index_t` afterwards, as required.