Open VibhuJawa opened 2 years ago
Hi @VibhuJawa , by passing a list of integers, say [15, 10 ,5]
, do you mean the function should return a list of MFGs (or block graphs) with 15, 10, 5 fanouts respectively? If so, we probably should propose a new API of GraphStorage instead of extending the current sample_neighbors
.
cc @BarclayII
Hi @VibhuJawa , by passing a list of integers, say
[15, 10 ,5]
, do you mean the function should return a list of MFGs (or block graphs) with 15, 10, 5 fanouts respectively? If so, we probably should propose a new API of GraphStorage instead of extending the currentsample_neighbors
.
Yup. Thats exactly what i mean. When you say proposing a new API for GraphStorage are you thinking we should have a new API say something like sample_blocks
, similar to sample_neigbors
that we implement in the GraphStorage
class and then call that function if present ?
The other alternative i was considering was to create a wrapper around the current NeighborSampler and overwrite the sample_nighbours
to not make calls one at a time to sample_neighbors
.
Yup. Thats exactly what i mean. When you say proposing a new API for GraphStorage are you thinking we should have a new API say something like
sample_blocks
, similar tosample_neigbors
that we implement in theGraphStorage
class and then call that function if present ?
That's right. Could you create a proposal about the interface? Eventually, we need to add it to the doc of GraphStorage
so users know what to implement if they wish it to work with certain samplers. @BarclayII has created a draft doc here.
That's right. Could you create a proposal about the interface?
Thanks. I will get started on it.
@jermainewang , Raised RFC here: https://github.com/dmlc/dgl/issues/4498
đ Feature
Currently, in
sample_neigbours
we only support a single fanout value (see link), i think it will be useful to look into a way of supporting a list of values with the same call .Motivation
Currently, in sample blocks (see below), we iterate for each value of
fanout.
Though this approach works when the data is present locally when we want to do this through an external service (say something likeGAAS
) this can become a bottleneck as we will need to transfer graphs back and forth from the service for each value of fanout.This will mean we do a bunch of calls like below which can become really slow.
https://github.com/dmlc/dgl/blob/d41d07d0f6cbed17993644b58057e280a9e8f011/python/dgl/dataloading/neighbor_sampler.py#L106-L120
Alternatives
The other alternative will be to write a
NeighborSampler
specific to graph-store objects likeGAASStore
where we don't want to do this back and forth .Additional context
TODO: Add benchmark