Closed npuichigo closed 1 year ago
Yeah, this has been brought up before. For now this is not a goal of this project. It would require an async CUDA library (cudarc has no async support), and then pretty much a full rewrite of everything to be async, so it's too much work at this point. It's a cool idea though!
First of all, I am very excited to see such a project.
I'm a researcher in deep learning, but I also have some interest in HPC and async programming in C++. As far as I know, C++26 is pursuing a unified abstraction for async computation. Since CUDA kernels are launched asynchronously, they are perfectly suited to this framework. (FYI: https://github.com/NVIDIA/stdexec/tree/main/examples/nvexec and https://www.hpcwire.com/2022/12/05/new-c-sender-library-enables-portable-asynchrony/)
I'm quite interested in trying this out in a deep learning framework to further improve performance. Rust also has generic, native support for async, so maybe there's something that could be done.
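To make the idea a bit more concrete, here is a minimal sketch using only the Rust standard library. It does not touch CUDA or cudarc: a background thread stands in for the device, and `KernelFuture`, `launch_scale`, and `block_on` are hypothetical names made up for illustration, not part of any existing crate. The point is only that a launch which returns immediately maps naturally onto a `Future` that completes when the work is done.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

// State shared between the simulated "device" thread and the future.
struct Shared {
    result: Option<Vec<f32>>,
    waker: Option<Waker>,
}

// Hypothetical handle to an in-flight kernel (not a cudarc type).
pub struct KernelFuture {
    shared: Arc<Mutex<Shared>>,
}

impl Future for KernelFuture {
    type Output = Vec<f32>;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let mut s = self.shared.lock().unwrap();
        match s.result.take() {
            Some(out) => Poll::Ready(out),
            None => {
                // Not finished yet: remember who to wake on completion.
                s.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}

// Stand-in for an asynchronous kernel launch (assumption: a real version
// would enqueue work on a CUDA stream). Returns immediately.
pub fn launch_scale(input: Vec<f32>, factor: f32) -> KernelFuture {
    let shared = Arc::new(Mutex::new(Shared { result: None, waker: None }));
    let device_side = Arc::clone(&shared);
    thread::spawn(move || {
        let out: Vec<f32> = input.iter().map(|x| x * factor).collect();
        let mut s = device_side.lock().unwrap();
        s.result = Some(out);
        if let Some(w) = s.waker.take() {
            w.wake();
        }
    });
    KernelFuture { shared }
}

// Waker that unparks the thread running `block_on`.
struct ThreadWaker(thread::Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// Tiny single-future executor so the sketch needs no async runtime.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(v) => return v,
            // Spurious wake-ups are fine: we simply poll again.
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    let (a, b) = block_on(async {
        // Both "kernels" are in flight before either one is awaited.
        let a = launch_scale(vec![1.0, 2.0, 3.0], 2.0);
        let b = launch_scale(vec![4.0, 5.0], 0.5);
        (a.await, b.await)
    });
    assert_eq!(a, vec![2.0, 4.0, 6.0]);
    assert_eq!(b, vec![2.0, 2.5]);
}
```

In a real integration the background thread would presumably be replaced by a stream completion callback or event query, which is exactly the part that would need async support in the CUDA bindings.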
Thanks.