s769 opened this issue 2 years ago:

Is there a way to train on multiple GPUs across multiple processes (i.e. through torch.nn.parallel.DistributedDataParallel)?
There is support for this via https://github.com/cornellius-gp/gpytorch/blob/master/gpytorch/kernels/multi_device_kernel.py (see the tutorial notebook here).
Note that if your kernel is large enough to use checkpointing, you may be better off using KeOps on a single GPU just due to overhead: https://github.com/cornellius-gp/gpytorch/blob/master/examples/02_Scalable_Exact_GPs/KeOps_GP_Regression.ipynb
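For anyone landing here, the key piece of that tutorial is wrapping the base kernel in gpytorch.kernels.MultiDeviceKernel. A minimal sketch along those lines (the model class and kernel choice are just illustrative, close to what the tutorial uses):

```python
import gpytorch

class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood, n_devices, output_device):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        base_kernel = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
        # Distribute the kernel computation across GPUs 0..n_devices-1 and gather
        # the result on output_device (single process, DataParallel-style).
        self.covar_module = gpytorch.kernels.MultiDeviceKernel(
            base_kernel, device_ids=range(n_devices), output_device=output_device
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)
```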
Oh, I see -- I missed the "multiple processes" bit, my bad! That's not currently supported, but it might be possible to do something similar to MultiDeviceKernel, which extends DataParallel, by extending DistributedDataParallel instead.
@s769 we'd be open to a PR, if you'd be willing to implement this!
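To gauge the scope of such a PR: MultiDeviceKernel piggybacks on torch.nn.DataParallel within a single process, whereas a DistributedDataParallel version would also have to own the multi-process setup. The generic DDP boilerplate it would sit on top of looks roughly like the sketch below (hypothetical; the Linear module is only a stand-in for a GP model, and none of this exists in GPyTorch today):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def run(rank, world_size):
    # One process per GPU; rank doubles as the local device index here.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "12355")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(10, 1).to(rank)    # stand-in for a GP module
    ddp_model = DDP(model, device_ids=[rank])  # gradients synced across processes

    # ... training loop over ddp_model goes here ...

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    torch.multiprocessing.spawn(run, args=(world_size,), nprocs=world_size)
```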
I tried to run the code from the tutorial but got an error. The output covariance matrix is on multiple GPUs instead of output_device.
> I tried to run the code from the tutorial but got an error. The output covariance matrix is on multiple GPUs instead of output_device.
I get the same error. Did you manage to fix it?
> I tried to run the code from the tutorial but got an error. The output covariance matrix is on multiple GPUs instead of output_device.

> I get the same error. Did you manage to fix it?
No, I gave up. The error I get is that a lazy tensor spread across multiple GPUs cannot be regrouped onto the output device. The workaround I used instead was to split the image into tiles, run GP regression on each tile separately, and then stitch the results back together on the output device.
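For readers hitting the same wall, that tile-and-stitch workaround might look roughly like the hypothetical sketch below (TileGP and predict_tiles are made-up names, and per-tile hyperparameter fitting is omitted for brevity):

```python
import torch
import gpytorch

class TileGP(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

def predict_tiles(tiles, output_device):
    """tiles: iterable of (train_x, train_y, test_x) tuples, one per image tile."""
    results = []
    for train_x, train_y, test_x in tiles:
        # Fit an independent exact GP per tile, all on the single output device.
        likelihood = gpytorch.likelihoods.GaussianLikelihood().to(output_device)
        model = TileGP(
            train_x.to(output_device), train_y.to(output_device), likelihood
        ).to(output_device)
        model.eval()
        likelihood.eval()
        with torch.no_grad(), gpytorch.settings.fast_pred_var():
            pred = likelihood(model(test_x.to(output_device)))
            results.append(pred.mean)
    # Every tile's prediction already lives on output_device, so stitching is a cat.
    return torch.cat(results)
```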
Are there any updates on why the tutorial notebook fails?
No, I have not made any progress.
I have just tried to run the CIFAR DKL example on multiple GPUs (AMD300X) and I am getting the same issue as above, though training is pretty fast on a single GPU.