LaurentMazare / tch-rs

Rust bindings for the C++ API of PyTorch.
Apache License 2.0

CUDA Stream Support #234

Open k21 opened 4 years ago

k21 commented 4 years ago

Thank you for creating and maintaining this library; it's great to be able to experiment with machine learning in Rust. I am looking for potential performance improvements in some code that uses it.

Based on the CUDA streams section of https://pytorch.org/docs/stable/notes/cuda.html, my understanding is that streams are necessary for multiple operations to execute concurrently on a single GPU (one workaround is to run the operations from different processes, but that adds its own overhead).

Here are some sources I found that describe how streams can be used with the C++ API:

Would it be possible to add support for CUDA streams to tch?

NOBLES5E commented 4 years ago

Related C++ API: the at::cuda::CUDAStreamGuard struct and the c10::cuda::setCurrentCUDAStream function.

njaard commented 2 years ago

@LaurentMazare Right now I'm using your great rust-bert crate to run a model. I'd like to have several Rust threads run operations on the same model, but it seems that I gradually run out of memory. I could wrap the model (and therefore all of torch) in a mutex, but then I can't saturate the GPU's throughput. I suspect that if I put each Rust thread in its own stream it would work. Do you agree?

If so, what are the steps for getting this feature merged? I could start by actually testing it!