chaitjo commented 5 months ago

Feature/behavior summary

I'm trying to get PyG to install and work well with Intel XPUs, and was hoping to use this repository as a reference. At present, I see that PyG is not installed by default, nor are any instructions available for setting it up with XPUs.

Request attributes

[ ] Would this be a refactor of existing code?
[ ] Does this proposal require new package dependencies?
[ ] Would this change break backwards compatibility?
[ ] Does this proposal include a new model?
[ ] Does this proposal include a new dataset?
[ ] Does this proposal include a new task/workflow?

Related issues

No response

Solution description

Unknown.

Additional notes

At present, working with a different repository (https://github.com/a-r-j/ProteinWorkshop), I've been trying to integrate your code for the XPU as a new accelerator in PyTorch Lightning: https://github.com/IntelLabs/matsciml/blob/main/matsciml/lightning/xpu.py.

So far, I'm able to get my trainer to identify the XPU as a device, but it seems like some torch_cluster operations are not compatible with tensors stored on XPUs. I would like to perform torch_cluster operations such as knn graph creation on XPU tensors so that I can do data processing in a batched manner or on-the-fly, as opposed to on the CPU.

Here is a minimal example which fails:

import torch
import intel_extension_for_pytorch as ipex
from torch_geometric.nn import knn_graph

device = torch.device('xpu:0' if torch.xpu.is_available() else 'cpu')

x = torch.tensor([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]).to(device)
batch = torch.tensor([0, 0, 0, 0]).to(device)
edge_index = knn_graph(x, k=2, batch=batch, loop=False)

The resulting error is RuntimeError: x.device().is_cpu() INTERNAL ASSERT FAILED at "csrc/cpu/knn_cpu.cpp":12, please report a bug to PyTorch. x must be CPU tensor.

And here's a longer trace from the ProteinWorkshop codebase, which probably won't make any sense to MatSciML maintainers.

File "/home/ckj24/rds/hpc-work/envs/proteinworkshop/lib/python3.10/site-packages/torch_geometric/nn/pool/__init__.py", line 171, in knn_graph
    return torch_cluster.knn_graph(x, k, batch, loop, flow, cosine,
  File "/home/ckj24/rds/hpc-work/envs/proteinworkshop/lib/python3.10/site-packages/torch_cluster/knn.py", line 132, in knn_graph
    edge_index = knn(x, x, k if loop else k + 1, batch, batch, cosine,
  File "/home/ckj24/rds/hpc-work/envs/proteinworkshop/lib/python3.10/site-packages/torch_cluster/knn.py", line 81, in knn
    return torch.ops.torch_cluster.knn(x, y, ptr_x, ptr_y, k, cosine,
  File "/home/ckj24/rds/hpc-work/envs/proteinworkshop/lib/python3.10/site-packages/torch/_ops.py", line 692, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: x.device().is_cpu() INTERNAL ASSERT FAILED at "csrc/cpu/knn_cpu.cpp":12, please report a bug to PyTorch. x must be CPU tensor

laserkelvin commented 5 months ago

Thanks for bringing this up! That's a good point, I think we've been taking a lot of the dependencies for granted and we'll update the documentation.

Nominally, as of a few versions ago, a lot of the PyG core functionality has been upstreamed into PyTorch (e.g. torch_scatter stuff), but not everything. That means that, for the most part, PyG by itself should work out of the box on XPUs; however, functionality that lives in the external packages (torch_scatter, torch_cluster, torch_sparse) isn't supported yet. So the error you're seeing is basically that the low-level implementation for knn_graph only exists for CUDA and for CPUs, and it's expecting a tensor that resides on the latter.
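One possible way around the missing kernel is to build the kNN graph from plain PyTorch ops instead of torch_cluster. The sketch below is not the torch_cluster implementation: `knn_graph_dense` is a hypothetical helper that swaps in a brute-force `torch.cdist` + `topk` search, which needs O(N^2) memory. It assumes the edge layout of the default `flow='source_to_target'` (neighbours in row 0, centre nodes in row 1), and whether every op used here is available on a given XPU build is an assumption that has not been verified in this thread.

```python
import torch


def knn_graph_dense(x, k, batch=None, loop=False):
    """Brute-force kNN graph built only from core PyTorch ops (hypothetical helper).

    Returns an edge_index of shape [2, E]: row 0 holds neighbour indices,
    row 1 holds the centre node each neighbour belongs to.
    """
    n = x.size(0)
    inf = float("inf")
    dist = torch.cdist(x, x)  # [n, n] pairwise Euclidean distances

    if batch is not None:
        # Forbid edges between nodes that belong to different examples.
        different = batch.unsqueeze(0) != batch.unsqueeze(1)
        dist = dist.masked_fill(different, inf)

    if not loop:
        # Exclude self-loops by pushing the diagonal to infinity.
        eye = torch.eye(n, dtype=torch.bool, device=x.device)
        dist = dist.masked_fill(eye, inf)

    # k smallest distances per row; clamp k so topk never exceeds n.
    nbr_dist, nbr_idx = dist.topk(min(k, n), dim=1, largest=False)
    centres = torch.arange(n, device=x.device).unsqueeze(1).expand_as(nbr_idx)

    # Drop masked-out candidates (nodes with fewer than k valid neighbours).
    valid = torch.isfinite(nbr_dist)
    return torch.stack([nbr_idx[valid], centres[valid]], dim=0)
```

Because every step is a regular tensor op, the result stays on whatever device `x` lives on. For large point clouds the dense distance matrix becomes the bottleneck, so this is only a stopgap for small or moderately sized batches.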

I'm not 100% sure what our plans are for supporting those supplementary libraries, and so they might need to be treated on a case-by-case basis. Please reach out to me via email or Slack and we can discuss this further (even if it's not matsciml related). I'll keep this issue up still, since I agree we do need to update our PyG + XPU instructions.

chaitjo commented 5 months ago

Thanks!

What's the current recommended way to install PyG?

I'm currently using:

pip install torch_geometric
pip install torch-scatter torch-cluster

...and this seems fine unless I need some of the functions from torch-cluster to run on tensors located on XPUs. PyG's docs also state, regarding torch-scatter and torch-cluster, that these packages 'come with their own CPU and GPU kernel implementations based on the PyTorch C++/CUDA/hip(ROCm) extension interface.' So I suppose there's no real fix yet for my particular use case apart from shifting my computation to the CPU.
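The CPU fallback mentioned above could be wrapped roughly as follows. This is a minimal sketch, assuming the wrapped function returns a single tensor; `call_on_cpu` is a hypothetical helper name, not part of PyG or torch_cluster.

```python
import torch


def call_on_cpu(fn, *args, **kwargs):
    """Call fn with all tensor arguments copied to the CPU (hypothetical helper).

    A tensor result is moved back to the device of the first tensor argument;
    non-tensor results are returned unchanged.
    """
    def is_tensor(a):
        return isinstance(a, torch.Tensor)

    tensors = [a for a in list(args) + list(kwargs.values()) if is_tensor(a)]
    target = tensors[0].device if tensors else torch.device("cpu")

    cpu_args = [a.cpu() if is_tensor(a) else a for a in args]
    cpu_kwargs = {k: (v.cpu() if is_tensor(v) else v) for k, v in kwargs.items()}

    out = fn(*cpu_args, **cpu_kwargs)
    return out.to(target) if is_tensor(out) else out


# Possible usage with the failing example from this thread (not run here):
#   from torch_geometric.nn import knn_graph
#   edge_index = call_on_cpu(knn_graph, x, k=2, batch=batch, loop=False)
```

Each call pays for a device-to-host copy and a copy back, so this only makes sense as a stopgap for preprocessing steps rather than for ops inside a hot training loop.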

laserkelvin commented 5 months ago

Those pip commands should work. If you are super paranoid, you can tack on --no-cache-dir to make sure you're not using a cached version, and also --no-binary :all: to make sure it's built from source. If you have issues, I'd suggest you step through those :)

I've brought up torch_cluster support internally, along with some things we can potentially do, but it will require some time. I'll send you an email separately.

laserkelvin commented 3 months ago

@chaitjo do you think I can close this issue?

#198 updated the README, and I think it should be pretty complete - within the bounds of the current status of broader framework support

chaitjo commented 3 months ago

Yes please.


IntelLabs / matsciml

[Feature request]: PyG installation instructions (esp. for XPUs) #166