alibaba / graphlearn-for-pytorch

A GPU-accelerated graph learning library for PyTorch, facilitating the scaling of GNN training and inference.
Apache License 2.0

AttributeError: module 'graphlearn_torch.py_graphlearn_torch' has no attribute 'SampleQueue' #26

Closed kaixuanliu closed 1 year ago

kaixuanliu commented 1 year ago

🐛 Describe the bug

When I run the igbh example for distributed CPU training with

python dist_train_rgnn.py --num_nodes=2 --node_rank=0 --num_training_procs=2 --master_addr=localhost --model='rgat' --dataset_size='tiny' --num_classes=19

and

python dist_train_rgnn.py --num_nodes=2 --node_rank=1 --num_training_procs=2 --master_addr=localhost --model='rgat' --dataset_size='tiny' --num_classes=19

it fails with the following error:

Traceback (most recent call last):
  File "/mnt/disk1/kaixuan/anaconda3/envs/gltorch-cpu/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/home/kaixuan/ws/pyG-work/graphlearn-for-pytorch/examples/igbh/dist_train_rgnn.py", line 84, in run_training_proc
    train_loader = glt.distributed.DistNeighborLoader(
  File "/mnt/disk1/kaixuan/anaconda3/envs/gltorch-cpu/lib/python3.8/site-packages/graphlearn_torch/distributed/dist_neighbor_loader.py", line 92, in __init__
    super().__init__(
  File "/mnt/disk1/kaixuan/anaconda3/envs/gltorch-cpu/lib/python3.8/site-packages/graphlearn_torch/distributed/dist_loader.py", line 177, in __init__
    self._channel = ShmChannel(self.worker_options.channel_capacity,
  File "/mnt/disk1/kaixuan/anaconda3/envs/gltorch-cpu/lib/python3.8/site-packages/graphlearn_torch/channel/shm_channel.py", line 48, in __init__
    self._queue = pywrap.SampleQueue(capacity, shm_size)
AttributeError: module 'graphlearn_torch.py_graphlearn_torch' has no attribute 'SampleQueue'

Environment

husimplicity commented 1 year ago

It seems that the pybind code for SampleQueue is wrapped in the macro guard #ifdef WITH_CUDA, so the binding is not compiled into the extension module when WITH_CUDA is set to OFF, and the Python call cannot find it. This will be fixed soon.
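
Until a fix lands, one way to confirm that an installed build is affected is to check whether the extension module exposes the attribute at all. The following is a minimal sketch, not part of the library: the helper names `has_binding` and `load_extension` and the two stub modules are hypothetical and exist only for illustration. The module path and the attribute name `SampleQueue` are taken from the traceback above.

```python
import importlib
import types


def has_binding(module, name):
    """Return True if the given module object exposes an attribute called `name`."""
    return hasattr(module, name)


def load_extension(module_name):
    """Import a module by its dotted name; return None if it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# Hypothetical stand-in for a CPU-only build: a module without SampleQueue.
cpu_only_stub = types.ModuleType("py_graphlearn_torch_cpu_stub")

# Hypothetical stand-in for a build in which the binding was compiled in.
full_stub = types.ModuleType("py_graphlearn_torch_full_stub")
full_stub.SampleQueue = object


if __name__ == "__main__":
    # Module path as it appears in the traceback from this issue.
    pywrap = load_extension("graphlearn_torch.py_graphlearn_torch")
    if pywrap is None:
        print("graphlearn_torch extension module is not importable")
    else:
        print("SampleQueue available:", has_binding(pywrap, "SampleQueue"))
```

If the check reports that `SampleQueue` is missing on a CPU-only install, the build matches the situation described in this issue.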