gallantlab / himalaya

Multiple-target linear models - CPU/GPU
https://gallantlab.github.io/himalaya
BSD 3-Clause "New" or "Revised" License

Specifying a Specific CUDA Device (PyTorch backend) #66

Open MShinkle opened 3 weeks ago

MShinkle commented 3 weeks ago

Currently, specifying 'torch_cuda' as the backend appears to select the first CUDA device visible to PyTorch (cuda:0). However, on multi-GPU systems it would be useful to select a specific CUDA device through something like set_backend("torch_cuda:3"), which would tell Himalaya to use CUDA device 3. set_backend("torch_cuda") would still behave as it does now.

Is there any interest in, or plans for, this feature? From glancing through the Himalaya PyTorch backend, I don't think implementing it would be too involved, but I could be mistaken.
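
For illustration, here's a rough sketch of the kind of parsing I have in mind (parse_torch_cuda_backend is a hypothetical helper, not anything in himalaya today):

import torch

def parse_torch_cuda_backend(name):
    # Hypothetical helper: split "torch_cuda:3" into a backend name and a torch device.
    if name.startswith("torch_cuda:"):
        index = int(name.split(":", 1)[1])
        return "torch_cuda", torch.device("cuda", index)
    # Plain "torch_cuda" keeps the current behavior: first visible device.
    return name, torch.device("cuda", 0)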

mvdoc commented 3 weeks ago

With any CUDA-based code, this is usually accomplished by setting the CUDA_VISIBLE_DEVICES environment variable before running the Python script. See https://stackoverflow.com/questions/39649102/how-do-i-select-which-gpu-to-run-a-job-on
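
For example, to restrict a process to physical GPU 3 (a minimal sketch; the variable must be set before CUDA is initialized, so setting it before importing torch is the safe convention):

import os

# Only physical GPU 3 will be visible; inside the process it shows up as cuda:0.
os.environ["CUDA_VISIBLE_DEVICES"] = "3"

# Import torch only after setting the variable, so CUDA picks up the restriction.
import torch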

MShinkle commented 3 weeks ago

PyTorch generally recommends against selecting the CUDA device via this method, in favor of the device('cuda:3') or device('cuda', index=3) syntax. That said, the benefits of the latter (better support for using multiple CUDA devices within the same process) are unlikely to matter for 99% of use cases, so it may not be worth the time to incorporate.
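
For reference, the device-object syntax looks like this (a minimal sketch; the index 3 is arbitrary):

import torch

device = torch.device("cuda", 3)  # equivalent to torch.device("cuda:3")
x = torch.zeros(10, device=device)  # allocated on GPU 3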

Thanks!

mvdoc commented 3 weeks ago

I think that adding this option to himalaya would complicate the code unnecessarily. But what happens if you manually push the features and data to the GPU you want to use, and then pass those tensors to the himalaya solvers? I wonder whether this could be a workaround that avoids the environment variable.
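
An untested sketch of what I mean (shapes and alpha are arbitrary; I'm assuming the solvers accept torch tensors directly):

import torch
from himalaya.ridge import Ridge
from himalaya.backend import set_backend

set_backend("torch_cuda")

# Build the tensors directly on the desired device...
X = torch.randn(1000, 20, device="cuda:1")
Y = torch.randn(1000, 5, device="cuda:1")

# ...and pass them straight to a himalaya estimator.
model = Ridge(alpha=1.0).fit(X, Y)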

MShinkle commented 3 weeks ago

That's an interesting idea. My expectation is that the tensors would get moved to whatever cuda:0 is in the current environment, but I'll give it a test.

MShinkle commented 3 weeks ago

It looks like backend.asarray moves tensors to cuda:0 regardless of the tensor's original device. For example:

import torch
from himalaya.backend import set_backend, get_backend

set_backend('torch_cuda')
backend = get_backend()

# Create a tensor on cuda:1, then pass it through the backend's asarray.
print(backend.asarray(torch.zeros(10, device='cuda:1')).device)

This prints cuda:0, so the tensor was moved off its original device.
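
One more angle that might be worth testing (untested with himalaya itself; this only demonstrates plain PyTorch behavior): PyTorch resolves the bare "cuda" device through the process-wide current device, so if the backend allocates with device="cuda", changing the current device might redirect it:

import torch

# Make cuda:1 the current device for this process.
torch.cuda.set_device(1)

# A bare "cuda" device now resolves to cuda:1.
print(torch.zeros(10, device="cuda").device)  # prints cuda:1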