Closed maqy1995 closed 4 years ago
Yep, it's because of kernel fusion. You can simply understand it as: we compute the result directly on the destination nodes without copying node features to the edges, which saves the GPU memory cost of #edges × D that most scatter-gather based frameworks, such as PyG, have to pay.
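To make the memory difference concrete, here is a minimal NumPy sketch (toy graph, illustrative names; a real fused CUDA kernel streams the source rows rather than looping in Python). The scatter-gather path materializes an `(#edges, D)` intermediate; the fused path only ever touches one source row at a time:

```python
import numpy as np

# Hypothetical toy graph: edges as (src, dst) index arrays, D-dim node features.
num_nodes, num_edges, D = 1000, 10000, 64
rng = np.random.default_rng(0)
src = rng.integers(0, num_nodes, num_edges)
dst = rng.integers(0, num_nodes, num_edges)
feat = rng.standard_normal((num_nodes, D)).astype(np.float32)

# Scatter-gather style: copy one feature row per edge, then segment-sum.
# Peak extra memory is the (num_edges, D) intermediate -- #edges * D floats.
edge_feat = feat[src]                      # (num_edges, D) materialized copy
out_scatter = np.zeros((num_nodes, D), np.float32)
np.add.at(out_scatter, dst, edge_feat)     # sum edge rows into destination nodes

# Fused style: accumulate each source row straight into its destination row,
# never allocating the (num_edges, D) intermediate.
out_fused = np.zeros((num_nodes, D), np.float32)
for e in range(num_edges):
    out_fused[dst[e]] += feat[src[e]]

print("intermediate avoided by fusion: %.1f MB" % (edge_feat.nbytes / 2**20))
```

Both paths produce the same aggregation; only the fused one keeps peak memory proportional to #nodes × D instead of #edges × D, which is why the footprint barely moves as the batch (and hence edge count) grows.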
Thanks for your reply. Is there any more detailed documentation or paper on kernel fusion or scatter-gather?
We will post an updated version of the DGL paper on arXiv in the coming days, describing DGL's new features and system design. Please stay tuned.
Great! Thanks again for your reply. I will close this issue.
❓ Questions and Help
When I run the GraphSAGE example on the Reddit dataset (my GPU is a Tesla T4), I found that DGL can feed the entire training set for training or inference, while PyG goes OOM once the batch size reaches about 9000. In my experiments, when the batch size increased from 1024 to 8192, DGL's memory usage (inference) grew from about 1 GB to 2 GB, while PyG's expanded from 2 GB to 12 GB. So in DGL the batch size increased 8×, yet memory did not grow correspondingly. What is the reason for this phenomenon? What makes DGL use less memory than PyG?
P.S. Is this benefit brought by "kernel fusion"? I found the blog post https://www.dgl.ai/blog/2019/05/04/kernel.html but did not fully understand it.