coreylowman / dfdx

Deep learning in Rust, with shape checked tensors and neural networks

CUDA Graphs #360

Open · ViliamVadocz opened this issue 1 year ago

ViliamVadocz commented 1 year ago

We should consider whether it is possible and desirable to automatically combine kernels into CUDA graphs to reduce the overhead of launching individual kernels.

Here is the relevant documentation:

This issue is probably not relevant to the GPU MVP, but it should be tackled once optimizations become a concern. I am opening it now because support for graphs might influence how the GPU support code is structured.
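
For reference, the capture flow itself is fairly mechanical: kernels launched into a capturing stream are recorded into a graph instead of executed, and the instantiated graph can then be replayed with one launch call per iteration. A minimal sketch against the CUDA runtime API (the two kernels are placeholders, not dfdx kernels):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernels standing in for the many small per-op kernels a
// forward pass would otherwise launch one at a time.
__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}
__global__ void add_bias(float *x, float b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] += b;
}

int main() {
    const int n = 1 << 20;
    float *x;
    cudaMalloc(&x, n * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Record the kernel sequence into a graph instead of running it.
    cudaGraph_t graph;
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    scale<<<(n + 255) / 256, 256, 0, stream>>>(x, 2.0f, n);
    add_bias<<<(n + 255) / 256, 256, 0, stream>>>(x, 1.0f, n);
    cudaStreamEndCapture(stream, &graph);

    // Instantiate once, then replay: one launch call per iteration
    // instead of one per kernel. (CUDA 12 shortens this signature to
    // cudaGraphInstantiate(&exec, graph, 0).)
    cudaGraphExec_t exec;
    cudaGraphInstantiate(&exec, graph, nullptr, nullptr, 0);
    for (int step = 0; step < 1000; ++step) {
        cudaGraphLaunch(exec, stream);
    }
    cudaStreamSynchronize(stream);

    cudaGraphExecDestroy(exec);
    cudaGraphDestroy(graph);
    cudaStreamDestroy(stream);
    cudaFree(x);
    printf("done\n");
    return 0;
}
```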


Other resources:

ViliamVadocz commented 1 year ago

Relevant labels for this issue would be `gpu` and `optimization`.

coreylowman commented 1 year ago

I think we could do this at the device level: CudaGraph would be similar to Cuda, but instead of launching kernels it would add nodes to a graph (if they aren't there already). I could envision the forward/backward/optimizer passes all being part of the graph. Perhaps the graph would only actually be executed when a dev -> host transfer is requested?
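
At the CUDA level, that behaviour would look roughly like the sketch below: work issued by the forward/backward/optimizer passes is only recorded while the stream is capturing, and it is the dev -> host copy that forces the graph to be finished, instantiated, and launched. (Minimal sketch only; `step_kernel` and the single fused training step are placeholders, not dfdx code.)

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder standing in for the kernels that the forward, backward,
// and optimizer passes would enqueue on the graph device.
__global__ void step_kernel(float *p, float d) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    p[i] += d;
}

int main() {
    const int n = 256;
    float *params;
    float host_out = 0.0f;
    cudaMalloc(&params, n * sizeof(float));
    cudaMemset(params, 0, n * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // "CudaGraph device" mode: kernel launches are only recorded.
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    step_kernel<<<1, n, 0, stream>>>(params, 1.0f);  // forward
    step_kernel<<<1, n, 0, stream>>>(params, 2.0f);  // backward
    step_kernel<<<1, n, 0, stream>>>(params, 3.0f);  // optimizer update

    // A dev -> host transfer is requested, so the recorded work must now
    // actually run: finish the capture, instantiate, launch, then copy.
    cudaGraph_t graph;
    cudaGraphExec_t exec;
    cudaStreamEndCapture(stream, &graph);
    cudaGraphInstantiate(&exec, graph, nullptr, nullptr, 0);
    cudaGraphLaunch(exec, stream);
    cudaMemcpyAsync(&host_out, params, sizeof(float),
                    cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);
    printf("params[0] = %f\n", host_out);  // 6.0

    cudaGraphExecDestroy(exec);
    cudaGraphDestroy(graph);
    cudaStreamDestroy(stream);
    cudaFree(params);
    return 0;
}
```

One thing to work out would be what happens when the host needs something every iteration (e.g. reading the loss), since re-capturing and re-instantiating the graph each time would give back part of the launch-overhead savings.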