dmlc / dgl

Python package built to ease deep learning on graph, on top of existing DL frameworks.
http://dgl.ai
Apache License 2.0

Potential bug in neighbor sampling for distributed DGL heterogeneous graphs #7473

Open yfismine opened 1 week ago

yfismine commented 1 week ago

🐛 Bug

To Reproduce

As far as I understand, this may be a bug in distributed DGL. In rowwise_pick.h, the CSRRowWisePerEtypePick function used for neighbor sampling contains the following code to determine the type of an edge.

[image: screenshot of the edge-type lookup code in CSRRowWisePerEtypePick]

This function works correctly when every edge is an inner edge of the partition, but an outer edge can trigger an assertion failure. Here is an example. Suppose local_etype_offset is [0,5,10] and fanout is [1,1]. If the node I sample is an inner node of this partition but the only edge at that node is an outer edge, then the eid of that edge is likely to be greater than 10, beyond the last offset. The heterogenized etype computed for this outer edge is then 2. When execution reaches the assertion that follows, it fails with an error of the form et[et_idx[len - 1]] < num_etypes (2 vs 2), with the message that the etype values exceed the number of fanouts.
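The arithmetic in this example can be reproduced with a small Python sketch. It is not the DGL code: it assumes the edge type is found by an upper-bound search of the eid in the offset array, and etype_from_eid and the eid value 12 are hypothetical names and values chosen only for illustration.

```python
from bisect import bisect_right


def etype_from_eid(eid, etype_offset):
    """Hypothetical helper: map an edge ID to an edge-type index by
    locating it in a sorted offset array (upper-bound search, minus one)."""
    return bisect_right(etype_offset, eid) - 1


local_etype_offset = [0, 5, 10]  # two local edge types: eids 0-4 and eids 5-9
fanout = [1, 1]                  # one fanout entry per edge type
num_etypes = len(fanout)

# Inner edges fall inside the offset ranges and map to type 0 or 1.
inner = [etype_from_eid(e, local_etype_offset) for e in (0, 4, 5, 9)]

# An outer edge (assumed eid 12) lies past the last offset and maps to 2.
outer = etype_from_eid(12, local_etype_offset)

print(inner)
print(outer, outer < num_etypes)
```

Under these assumptions the outer edge gets type index 2, which is not smaller than the two fanout entries, matching the (2 vs 2) in the reported check.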

Environment