Closed: carterbox closed this 1 year ago
Purpose
Use CuPy's `cupy.fuse` to merge more function calls into single kernels, which reduces both the number of intermediate memory arrays and the number of kernel launches.
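As a rough illustration of the idea (not the code changed in this PR), the sketch below fuses a small elementwise update with `cupy.fuse`. The function names `update_unfused` and `update_fused` and the toy arrays are hypothetical. The NumPy fallback with a no-op decorator is an assumption added only so the sketch also runs on machines without a GPU.

```python
import numpy as np

try:
    import cupy as xp
    # Make sure a GPU is actually usable before relying on CuPy
    if xp.cuda.runtime.getDeviceCount() < 1:
        raise RuntimeError("no CUDA device available")
    fuse = xp.fuse
except Exception:
    # Assumption for portability: fall back to NumPy and a no-op decorator
    xp = np

    def fuse(func):
        return func


def update_unfused(x, grad, step):
    # On the GPU, each elementwise operation is a separate kernel launch,
    # and `scaled` is an intermediate array of the same size as `grad`
    scaled = step * grad
    return x - scaled


@fuse
def update_fused(x, grad, step):
    # With cupy.fuse, the whole elementwise expression is compiled into
    # one kernel, so no intermediate array is allocated for step * grad
    return x - step * grad


x = xp.arange(6, dtype=xp.float32)
grad = xp.ones(6, dtype=xp.float32)
result_unfused = update_unfused(x, grad, 0.5)
result_fused = update_fused(x, grad, 0.5)
```

Both functions return the same values; only the number of kernels and temporaries differs. Fusion of this kind applies to elementwise operations and simple reductions, so which functions can be merged depends on what they compute.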
Approach
On a benchmarking dataset, the memory profile starts with 15.29 GB of GPU memory acquired. After this PR, only 13.88 GB is acquired, a reduction of 1.41 GB (about 9%).
Pre-Merge Checklists
Submitter
Reviewer