Open khan-yin opened 1 year ago
Hi @khan-yin
This project was developed when DGL only stored graph structure in CPU memory. As a result, it's not compatible with newer features of DGL. Since DGL is a fast-developing library, I can imagine that the latest DGL is very different from the version I used more than 2 years ago, and much of the code might be broken. Ideally, the preprocessing pipeline needs to be reworked, but unfortunately I don't have time to do so.
Essentially, the problem is that the code (when I wrote it) assumes graph-structure-related preprocessing happens on CPU, which violates the current DGL requirement that the graph and its features live on the same device. I think you have two straightforward options to fix the issue:
1) Perform all preprocessing on CPU. Avoid moving the graph or any node/edge feature to GPU during data loading. After preprocessing is done, move the graph and features to GPU and start training. The downside is that you can't use the GPU for preprocessing, so this could take more time.
2) Perform all preprocessing on GPU. This is like what you did: move the graph to GPU at the beginning, and I think this is the right way to go. The line `src = src.numpy()` was there because in DGL 0.4.3 (released 2-3 years ago), DGL couldn't take a framework tensor (namely a PyTorch tensor) directly as input to construct the graph. If DGL can now take a torch GPU tensor directly as input, then the line `src = src.numpy()` is no longer needed.
Dear author, when I cloned the code and ran it with dglcu113, pytorch+cu113, and Python 3.9 on the ogbn-mag dataset, I got a problem loading the embbeding.pt file generated by TransE. Do you have a solution that doesn't require changing my conda env?
If I add the line
g = g.to(device)
before loading, I get another device-related error. I tried to fix the bug, but I ended up with even more errors 😭. Thanks a lot.