Hello, I am reading your OSDI-accepted paper, "MGG: Accelerating Graph Neural Networks with Fine-Grained Intra-Kernel Communication-Computation Pipelining on Multi-GPU Platforms."
I am using the git repository you provided, but I cannot reproduce the performance reported in the paper, e.g., the comparison with DGL on 8xA100 for GCN (Fig. 7a):
dataset                 speedup
Reddit_beg_pos          0.598862
enwiki-2013_beg_pos     0.980894
t-2004_beg_pos          2.319232
paper100M_beg_pos       3.729139
ogbn-products_beg_pos   2.551465
ogbn-proteins_beg_pos   0.655375
com-Orkut_beg_pos       5.647636
Tested on 8x SXM4 A100 (80 GB); point-to-point NVLink bandwidth = 600 GB/s.
How should I adjust the configurations in your repository to achieve the performance shown in the paper?
As mentioned in the "Platforms & Tools" paragraph of our paper's evaluation section, the major evaluation platform is 8×A100 GPUs (40 GB), and we use an AWS P4dn.24xlarge instance for evaluation.
For 8xA100 (80 GB), given the higher GPU global-memory bandwidth (2,039 GB/s on A100-80GB vs. 1,555 GB/s on A100-40GB), we believe additional parameter tuning will be needed on A100-80GB to achieve better performance. Other factors, such as the type and number of CPU cores on DGX-A100-80GB versus DGX-A100-40GB, would also affect DGL's performance, since DGL relies on zero-copy access with CPU involvement to fetch remote data from the host.
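Since the reply above points to parameter tuning as the likely fix, one way to approach it is a small exhaustive sweep over the kernel parameters and keeping the fastest configuration. The knob names below (`warps_per_block`, `chunk_size`) are hypothetical placeholders, not the actual MGG flags; the stand-in cost function would be replaced by a call that launches the real benchmark and returns its runtime:

```python
import itertools

# Hypothetical tuning knobs for illustration only -- substitute the
# actual kernel parameters exposed by the MGG build you are running.
GRID = {
    "warps_per_block": [1, 2, 4, 8],
    "chunk_size": [16, 32, 64],
}

def best_config(measure, grid):
    """Try every combination in the grid and return the fastest one."""
    best_cfg, best_time = None, float("inf")
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        t = measure(cfg)  # run one benchmark with this configuration
        if t < best_time:
            best_cfg, best_time = cfg, t
    return best_cfg, best_time

# Stand-in cost model so the sketch runs as-is; it is minimized at
# warps_per_block=4, chunk_size=32. Replace with a real measurement.
def fake_measure(cfg):
    return abs(cfg["warps_per_block"] - 4) + abs(cfg["chunk_size"] - 32)

cfg, t = best_config(fake_measure, GRID)
print(cfg, t)
```

In practice, `measure` would shell out to the benchmark binary per configuration and parse its reported runtime; the grid stays small enough that an exhaustive sweep is feasible per dataset.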