Closed: telegraphic closed this issue 1 year ago
Something to watch as it evolves: "CUDA Python" = CUDA + Numba, supported by NVIDIA. https://developer.nvidia.com/blog/numba-python-cuda-acceleration/
The stepped plan uses much less memory -- there is still optimization to do, but a full 2^20 channels now fits on the GPU with no issue.
The pipeline is using much more GPU RAM than seems reasonable -- figure out why! Am I creating multiple copies of the same array?
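One way accidental copies creep in is through out-of-place array expressions. A minimal sketch using NumPy semantics (which Numba device arrays broadly mirror for slicing and arithmetic); the array shape and names here are illustrative, not taken from the pipeline:

```python
import numpy as np

# Illustrative array sized like 2^20 channels x 4 values.
x = np.zeros((1 << 20, 4), dtype=np.complex64)

# Out-of-place: each expression silently allocates a NEW array as big as x.
y = x * 2            # new buffer
z = (x + 1j) * y     # temporaries plus another new buffer

# In-place equivalents reuse x's existing buffer instead.
x *= 2
np.multiply(x, 2, out=x)

# Quick diagnostic: slices are views (shared buffer), .copy() is not.
view = x[:1024]
assert np.shares_memory(view, x)        # view shares x's memory
dup = x[:1024].copy()
assert not np.shares_memory(dup, x)     # copy owns its own memory
```

Auditing each pipeline stage with `np.shares_memory` (or, on the device, checking allocations before and after a stage) can pinpoint where the duplicates appear.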