Open k21 opened 4 years ago
Related C++ struct: `at::cuda::CUDAStreamGuard`
and method: `c10::cuda::setCurrentCUDAStream`.
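For reference, here is a minimal sketch of how those two pieces are used on the C++ side (assuming libtorch built with CUDA; exact header paths can vary between libtorch versions, and `getStreamFromPool` is the usual way to obtain a non-default stream):

```cpp
#include <ATen/cuda/CUDAContext.h>
#include <c10/cuda/CUDAGuard.h>
#include <c10/cuda/CUDAStream.h>
#include <torch/torch.h>

void worker() {
  // Grab a stream from libtorch's internal pool (device 0 assumed).
  at::cuda::CUDAStream stream = at::cuda::getStreamFromPool();

  // RAII guard: makes `stream` the current stream for this thread and
  // restores the previous one when the guard goes out of scope.
  at::cuda::CUDAStreamGuard guard(stream);

  // Ops launched here are enqueued on `stream` rather than the default
  // stream, so kernels from several worker threads can overlap on one GPU.
  auto x = torch::randn({1024, 1024}, torch::kCUDA);
  auto y = x.matmul(x);

  // Alternatively, set the stream directly (this is what the guard does):
  // c10::cuda::setCurrentCUDAStream(stream);
}
```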
@LaurentMazare Right now I'm using your great rust-bert
crate to run a model. I'd like to have a bunch of Rust threads run operations on the same model, but it seems that I gradually run out of memory. I could wrap the model (and therefore all of torch) in a mutex, but then I can't saturate the GPU's throughput. I suspect that if I put every Rust thread in its own stream it would work. Do you agree with that?
If so, what are the steps for getting this feature merged? I could start by actually testing it!
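To make the bottleneck concrete, here is a small, torch-free sketch of the mutex workaround (the `Model` stub and its `forward` method are hypothetical stand-ins for the loaded rust-bert model): wrapping the model in a `Mutex` keeps the threads safe, but serializes every forward pass, so only one inference runs on the GPU at a time.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stub standing in for a loaded rust-bert model.
struct Model;

impl Model {
    fn forward(&self, x: i64) -> i64 {
        x * 2 // placeholder for real GPU inference
    }
}

fn run_parallel(n_threads: i64) -> i64 {
    // Wrapping the model in a Mutex serializes all "GPU" work:
    // the threads run concurrently, but only one forward pass
    // executes at a time, which is why throughput suffers.
    let model = Arc::new(Mutex::new(Model));
    let mut handles = Vec::new();
    for i in 0..n_threads {
        let m = Arc::clone(&model);
        handles.push(thread::spawn(move || m.lock().unwrap().forward(i)));
    }
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

Per-thread CUDA streams would remove the need for this lock on the launch path, letting each thread enqueue work independently.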
Thank you for creating and maintaining this library, it's great being able to experiment with machine learning in Rust. I am looking for potential performance improvements of some code that uses it.
Based on the CUDA streams section at https://pytorch.org/docs/stable/notes/cuda.html, it is my understanding that using streams is necessary to allow multiple operations to execute concurrently on a single GPU (one workaround could be to run operations from different processes, but that also has overhead).
Here are some sources I found that describe how streams can be used with the C++ API:
Would it be possible to add support for CUDA streams to tch?
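One possible shape for such an API, purely as a hypothetical sketch to discuss (none of these items exist in the tch crate today), would mirror the C++ RAII guard:

```rust
// Hypothetical API sketch: nothing below exists in tch today.
// It mirrors at::cuda::CUDAStreamGuard / c10::cuda::setCurrentCUDAStream.

pub struct CudaStream { /* opaque handle to a c10::cuda::CUDAStream */ }

impl CudaStream {
    /// Would call getStreamFromPool on the C++ side.
    pub fn from_pool(device_index: usize) -> CudaStream { unimplemented!() }
}

pub struct CudaStreamGuard { /* previous stream, restored on Drop */ }

impl CudaStreamGuard {
    /// Would call c10::cuda::setCurrentCUDAStream, restoring the
    /// previous stream when the guard is dropped, like the C++ guard.
    pub fn new(stream: &CudaStream) -> CudaStreamGuard { unimplemented!() }
}
```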