As far as I can tell, this is a bug in distributed DGL. The CSRRowWisePerEtypePick function in rowwise_pick.h, which is used for neighbor sampling, contains code that determines the type of each edge from its edge ID.
That code works when every edge of the sampled node is an inner edge of the local partition, but an outer edge can trigger the assertion error described below.
Here is an example. Suppose local_etype_offset is [0,5,10] and fanout is [1,1]. I sample a node that is an inner node of this partition, but its only edge is an outer edge. Because the edge is an outer edge, its eid is likely to be 10 or greater, which is past the last offset. The etype computed for this outer edge is then 2, and the subsequent check et[et_idx[len - 1]] < num_etypes fails with (2 vs. 2) and the message that etype values exceed the number of fanouts.
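The example above can be reproduced outside DGL with a few lines of Python. This is only a sketch of the failure as I understand it, not the DGL source: the helper name etype_from_eid and the sample edge IDs are made up for illustration, and I am assuming the lookup behaves like an upper-bound search over the offsets.

```python
from bisect import bisect_right

def etype_from_eid(eid, etype_offset):
    """Hypothetical helper: map an edge ID to an edge-type index.

    Assumes an upper-bound search over the offsets, so an eid in
    [offset[i], offset[i + 1]) maps to type i.
    """
    return bisect_right(etype_offset, eid) - 1

local_etype_offset = [0, 5, 10]  # two edge types: eids 0..4 and 5..9
fanouts = [1, 1]                 # one fanout per edge type
num_etypes = len(fanouts)

# Inner edges have eids inside the offset range and map to a valid type.
inner_etypes = [etype_from_eid(eid, local_etype_offset) for eid in (0, 4, 5, 9)]

# An outer edge with an eid at or past the last offset (12 is an assumed value)
# maps to index 2, which is not a valid index into the fanouts.
outer_etype = etype_from_eid(12, local_etype_offset)

print(inner_etypes, outer_etype, outer_etype < num_etypes)
```

With these values the outer edge gets type index 2 while num_etypes is 2, which matches the (2 vs. 2) in the assertion message.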
Environment
DGL Version (e.g., 1.0): 2.1
Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3): 2.3.0
How you installed DGL (conda, pip, source): conda