❓ Questions and Help
Hello, I have a question about distributed GNN training with DistDGL, and I would appreciate it if you could tell me whether my understanding is correct.

My understanding of distributed GNN training: after the graph is partitioned, each worker holds its own partition together with that partition's k-hop neighbors (where k is the number of GNN layers), as well as the node/edge embeddings of its own partition. During training, workers then use `DistTensor` to request embeddings from other hosts according to the nodes in their partitions.
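To make the setup concrete, this is roughly how I picture a trainer reading remote features through `DistTensor` indexing; the graph name, config paths, and feature name below are placeholders rather than the example's actual values:

```python
import torch
import dgl

# Connect this trainer to the other machines listed in the IP config file
# (placeholder file names; the real paths come from the launch script).
dgl.distributed.initialize('ip_config.txt')
g = dgl.distributed.DistGraph('mygraph', part_config='data/mygraph.json')

# g.ndata['feat'] is a DistTensor: indexing it with node IDs transparently
# pulls the rows stored on remote partitions over the network.
seed_nodes = torch.tensor([0, 1, 2, 3])
feats = g.ndata['feat'][seed_nodes]
```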
Thus, I wonder why the partition example does not set `num_hops` according to the GNN's `num_layers`, but instead uses the default value of 1, while `num_layers` in the training scripts defaults to 2. Is this correct for training?

Also, if I change `num_layers` in the training scripts, is the training process still correct in this example?
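For reference, this is roughly the partitioning call I am asking about (the dataset, graph name, and output path below are placeholders); the only difference from the default is passing `num_hops` explicitly to match `num_layers`:

```python
import dgl

# Placeholder dataset; the actual example partitions a different graph.
g = dgl.data.CiteseerGraphDataset()[0]

num_layers = 2  # default num_layers in the training scripts

# Build 4 partitions whose halo regions cover num_layers hops instead of
# the default num_hops=1, which is what my question is about.
dgl.distributed.partition_graph(
    g, 'mygraph', num_parts=4, out_path='data',
    num_hops=num_layers, part_method='metis')
```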