chenzhao / light-dist-gnn


Incorrect positional argument #2

Open LongerZrLong opened 2 years ago

LongerZrLong commented 2 years ago

I tried to run the code and noticed that a positional argument is used incorrectly here. The correct way to invoke the function should be:

Parted_COO_Graph(, i, num_parts, preprocess_for=self.preprocess_for)

Otherwise, self.preprocess_for is passed as the device argument of Parted_COO_Graph.
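To illustrate the failure mode, here is a minimal sketch using a hypothetical stand-in function. The name `make_parted_graph`, its parameter order, and its defaults are assumptions for illustration only; the real `Parted_COO_Graph` signature may differ. The only thing taken from the issue is that a `device` parameter sits in the position where `preprocess_for` was being passed.

```python
# Hypothetical stand-in; NOT the real Parted_COO_Graph signature.
def make_parted_graph(name, rank, num_parts, device="cpu", preprocess_for="GCN"):
    # Return the two arguments of interest so the binding can be inspected.
    return {"device": device, "preprocess_for": preprocess_for}

# Fourth positional argument silently binds to `device`.
wrong = make_parted_graph("demo", 0, 2, "GAT")

# Passing it by keyword binds it to the intended parameter.
right = make_parted_graph("demo", 0, 2, preprocess_for="GAT")

print(wrong)
print(right)
```

In the positional call, `preprocess_for` keeps its default and the model name ends up in `device`, which is the mix-up described above.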

BearBiscuit05 commented 1 year ago

Thank you for raising this issue. I fixed this problem the same way, but then ran into another one: when I run the program on 2 GPUs, it fails with the error below. I don't know how to solve it. If you got the program running successfully, could you give me some advice?

Traceback (most recent call last):
  File "/home/light-dist-gnn/", line 37, in <module>
    torch.multiprocessing.spawn(process_wrapper, process_args, args.nprocs)
  File "/root/miniconda3/envs/gnn/lib/python3.9/site-packages/torch/multiprocessing/", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/root/miniconda3/envs/gnn/lib/python3.9/site-packages/torch/multiprocessing/", line 188, in start_processes
    while not context.join():
  File "/root/miniconda3/envs/gnn/lib/python3.9/site-packages/torch/multiprocessing/", line 150, in join
    raise ProcessRaisedException(msg, error_index,

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/root/miniconda3/envs/gnn/lib/python3.9/site-packages/torch/multiprocessing/", line 59, in _wrap
    fn(i, *args)
  File "/home/light-dist-gnn/", line 24, in process_wrapper
    func(env, args)
  File "/home/light-dist-gnn/", line 71, in main
    train(g, env, total_epoch=args.epoch)
  File "/home/light-dist-gnn/", line 39, in train
    outputs = model(g.features)
  File "/root/miniconda3/envs/gnn/lib/python3.9/site-packages/torch/nn/modules/", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/light-dist-gnn/models/", line 105, in forward
    hidden_features = F.relu(DistGCNLayer.apply(features, self.weight1, self.g.adj_parts, 'L1'))
  File "/home/light-dist-gnn/models/", line 75, in forward
    z_local = cached_broadcast(adj_parts, features, 'Forward'+tag)
  File "/home/light-dist-gnn/models/", line 56, in cached_broadcast
    dist.broadcast(feature_bcast, src=src)
  File "/root/miniconda3/envs/gnn/lib/python3.9/site-packages/torch/distributed/", line 1159, in broadcast
    work = default_pg.broadcast([tensor], opts)
RuntimeError: Tensors must be CUDA and dense

LongerZrLong commented 1 year ago

It has been a while since I last ran the code, and I am not sure whether I ran it on GPUs or only on CPU. I would recommend first running on CPU only to check that the code works, since the error in your stack trace is likely related to CUDA in torch.
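As a possible way to narrow this down, here is a small diagnostic sketch. It assumes the message comes from a backend check that the tensor passed to `dist.broadcast` is on a CUDA device and not sparse (as far as I know, this is what the NCCL backend enforces, but the backend in use is not shown in the trace). The helper name `broadcast_problems` is hypothetical; it only reads the `is_cuda` and `is_sparse` attributes that `torch.Tensor` provides, and the demo uses stand-in objects so it runs without torch installed.

```python
from types import SimpleNamespace

def broadcast_problems(tensor):
    """Return reasons a tensor would fail a 'CUDA and dense' check."""
    problems = []
    if not tensor.is_cuda:
        problems.append("not on a CUDA device")
    if tensor.is_sparse:
        problems.append("sparse layout")
    return problems

# Stand-ins for real tensors (assumption: only these two attributes matter here).
cpu_dense = SimpleNamespace(is_cuda=False, is_sparse=False)
gpu_sparse = SimpleNamespace(is_cuda=True, is_sparse=True)
gpu_dense = SimpleNamespace(is_cuda=True, is_sparse=False)

print(broadcast_problems(cpu_dense))
print(broadcast_problems(gpu_sparse))
print(broadcast_problems(gpu_dense))
```

In the real code, something like this could be called on `feature_bcast` just before `dist.broadcast(feature_bcast, src=src)`. If it reports a CPU tensor, moving it with `.to(device)` may help; if it reports a sparse tensor, `.to_dense()` is one option to try.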