We have a large set of kernels in ClimaAtmos, and we have to partition them as shown below, because we easily run into this error:
`ERROR: LoadError: Kernel invocation uses too much parameter memory; | 4.586 KiB exceeds the 4.000 KiB limit imposed by sm_60 / PTX v8.2`
Since this limit is device-dependent, we should probably offer a mechanism to split the fused broadcasts into segments, each of which fits within the parameter memory limit.
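
To make the proposal concrete, here is a minimal sketch of what such a splitting mechanism could look like. It assumes the per-broadcast parameter sizes (in bytes) are already known; how to measure them for device-side arguments is left open. `partition_by_size` is a hypothetical name, not an existing API, and the 4096-byte limit is simply the value from the error message above; a real implementation would need to obtain the limit for the target device.

```julia
# Hypothetical helper (not part of any existing package): greedily group
# consecutive items into segments whose summed size stays within `limit` bytes.
function partition_by_size(sizes::AbstractVector{<:Integer}, limit::Integer)
    segments = Vector{UnitRange{Int}}()
    start = firstindex(sizes)
    used = 0
    for i in eachindex(sizes)
        s = sizes[i]
        # A single item larger than the limit can never be launched.
        s <= limit || throw(ArgumentError(
            "item $i needs $s bytes, more than the limit of $limit bytes"))
        if used + s > limit
            # Close the current segment and start a new one at item i.
            push!(segments, start:(i - 1))
            start = i
            used = 0
        end
        used += s
    end
    # Close the final segment, if there were any items at all.
    isempty(sizes) || push!(segments, start:lastindex(sizes))
    return segments
end

# Assumed example sizes in bytes, one per broadcast expression.
sizes = [1500, 1500, 1500, 1000, 3000]
segs = partition_by_size(sizes, 4096)  # [1:2, 3:4, 5:5]
```

Each returned range could then be fused and launched as its own kernel. The greedy pass preserves the original order of the broadcasts, which matters if later expressions depend on earlier ones; it does not try to minimize the number of segments by reordering.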