Adds device options for MPS (Apple GPU) and XPU (Intel GPU), similar to the existing addition of GPU support via CUDA.
In theory there are quite a few additional devices we could add (full list here / here), but these two are of most interest from discussions with @jatkinson1000.
I haven't been able to test the XPU device, but basic tests with MPS seem to suggest it's working as expected:
In example 2, resnet_infer_fortran, setting:

```fortran
model = torch_module_load(args(1), device_type=torch_kMPS)
```
without changing the input tensor device throws an error:

```
RuntimeError: slow_conv2d_forward_mps: input(device='cpu') and weight(device='mps:0') must be on the same device
```
Similarly, setting the input tensor device, but not the model, throws an error:

```
RuntimeError: Input type (MPSFloatType) and weight type (CPUFloatType) should be the same
```
Setting both works and the expected output is produced:

```
Samoyed (id= 259 ), : probability = 0.884624064
```
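For reference, the working configuration sets both the model and the input tensor to MPS. A sketch of the two changed lines (variable names such as `in_tensors`, `in_data`, and `tensor_layout` follow the resnet example and the exact argument list may differ from the example as merged):

```fortran
! Load the TorchScript model onto the MPS device
model = torch_module_load(args(1), device_type=torch_kMPS)

! Create the input tensor on MPS as well (the resnet example's existing
! torch_tensor_from_array call, with the device changed from torch_kCPU)
call torch_tensor_from_array(in_tensors(1), in_data, tensor_layout, torch_kMPS)
```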
I also see spikes in activity on my GPU (for the largest spikes, I added a loop around the example inference):
Note that when running 10,000 iterations of the inference, I got an error:

```
RuntimeError: MPS backend out of memory (MPS allocated: 45.89 GB, other allocations: 9.72 MB, max allowed: 45.90 GB). Tried to allocate 784.00 KB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
```
which might suggest a problem with cleanup.
I don't think this is specific to MPS, so it might be worth checking on a CUDA GPU too (you can reduce the available CUDA memory to reproduce the issue more easily, if that helps).
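If it is a cleanup issue, one thing to check is whether the per-iteration tensors are freed inside the loop. A rough sketch of what I mean (the loop structure and the forward-call argument list here are illustrative, not the exact example code):

```fortran
do i = 1, 10000
  ! Recreate the input tensor on the device each iteration
  call torch_tensor_from_array(in_tensors(1), in_data, tensor_layout, torch_kMPS)
  call torch_module_forward(model, in_tensors, out_tensors)
  ! Free per-iteration tensors so device allocations don't accumulate
  call torch_tensor_delete(in_tensors(1))
  call torch_tensor_delete(out_tensors(1))
end do
```

If the allocations still grow with explicit deletes in place, that would point at the binding layer rather than the example.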