Adds device options for MPS (Apple GPU) and XPU (Intel GPU), similar to the existing addition of GPU support via CUDA.
In theory there are quite a few additional devices we could add (full list here / here), but these two are of most interest from discussions with @jatkinson1000.
I haven't been able to test the XPU device, but basic tests with MPS seem to suggest it's working as expected:
In example 2, resnet_infer_fortran, setting:

```fortran
model = torch_module_load(args(1), device_type=torch_kMPS)
```
without changing the input tensor device throws an error:

```
RuntimeError: slow_conv2d_forward_mps: input(device='cpu') and weight(device='mps:0') must be on the same device
```
Similarly, setting the input tensor device, but not the model, throws an error:

```
RuntimeError: Input type (MPSFloatType) and weight type (CPUFloatType) should be the same
```
Setting both works and the expected output is produced:

```
Samoyed (id= 259 ), : probability = 0.884624064
```
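For reference, the working configuration sets both the model and the input tensor to MPS. A sketch of the two changed lines (variable names such as `in_tensors`, `in_data`, and `tensor_layout` follow the resnet example and the exact argument list may differ from the example as merged):

```fortran
! Load the TorchScript model onto the MPS device
model = torch_module_load(args(1), device_type=torch_kMPS)

! Create the input tensor on MPS as well (the resnet example's existing
! torch_tensor_from_array call, with the device changed from torch_kCPU)
call torch_tensor_from_array(in_tensors(1), in_data, tensor_layout, torch_kMPS)
```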
I also see spikes in activity on my GPU (for the largest spikes, I added a loop around the example inference):
Note that when running 10,000 iterations of the inference, I got an error:

```
RuntimeError: MPS backend out of memory (MPS allocated: 45.89 GB, other allocations: 9.72 MB, max allowed: 45.90 GB). Tried to allocate 784.00 KB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
```
which might suggest a problem with cleanup.
I don't think this is specific to MPS, so it might be worth checking on a CUDA GPU too (you can reduce the available CUDA memory to reproduce the issue more easily, if that helps).
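If it is a cleanup issue, one thing to check is whether the per-iteration tensors are freed inside the loop. A rough sketch of what I mean (the loop structure and the forward-call argument list here are illustrative, not the exact example code):

```fortran
do i = 1, 10000
  ! Recreate the input tensor on the device each iteration
  call torch_tensor_from_array(in_tensors(1), in_data, tensor_layout, torch_kMPS)
  call torch_module_forward(model, in_tensors, out_tensors)
  ! Free per-iteration tensors so device allocations don't accumulate
  call torch_tensor_delete(in_tensors(1))
  call torch_tensor_delete(out_tensors(1))
end do
```

If the allocations still grow with explicit deletes in place, that would point at the binding layer rather than the example.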