bjarthur closed this 2 months ago
Tests pass locally, but I have not battle-tested this yet.
Ready for review. Tests really do pass locally now, and it works well in my application.
Superficially LGTM, but I don't have the time for a thorough review right now.
This should probably also integrate with the reclaim_hooks so the caches get wiped when memory runs out (both here and in the other fat handles).
EDIT: let's move this to a separate issue.
I'm wondering whether CUDA.jl consistently reuses the same handle. I know we can have up to 32 handles, but for efficiency we should reuse the one that stored the buffer from the previous factorization; otherwise we'll end up accumulating a lot of unnecessary workspaces.
We do, as long as you're using the same task, which I assume you are. In that case, calling handle() will always return the same object, and only a single handle will be cached.
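The per-task caching described above is easy to check interactively. A hedged sketch, assuming CUDA.jl's task-local `CUSOLVER.dense_handle()` lookup (which is what the dense solver wrappers use); the exact accessor name may differ across CUDA.jl versions:

```julia
using CUDA

# Within a single task, the CUSOLVER handle is cached in task-local
# storage, so repeated lookups return the identical object -- and with
# it, any workspace buffer that was stashed on that handle by a
# previous factorization.
h1 = CUSOLVER.dense_handle()
h2 = CUSOLVER.dense_handle()
@assert h1 === h2  # same task, same cached handle

# A different task performs its own lookup, so it may be served a
# different handle from the pool (hence the 32-handle observation).
```

So as long as successive factorizations run on the same task, the workspace is reused rather than duplicated.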
Forgot about this; rebased to give it another CI run.
Riffing off of https://github.com/JuliaGPU/CUDA.jl/pull/2279 for getrf, getrs, sytrf, sytrs, and friends. Much cleaner API than https://github.com/JuliaGPU/CUDA.jl/pull/2464.