dmlc / dgl

Python package built to ease deep learning on graph, on top of existing DL frameworks.
http://dgl.ai
Apache License 2.0

[GraphBolt][Hetero] Utilize single tensor with offsets instead of dict #7244

Open mfbalin opened 4 months ago

mfbalin commented 4 months ago

🔨Work Item

Project tracker: https://github.com/orgs/dmlc/projects/2

Description

Storing seeds or sampled edges in a single tensor with per-etype offsets, instead of a dict keyed by etype, would let us avoid loops over etypes on the Python side and make our code more performant.

This would let us almost entirely remove these two functions:

https://github.com/dmlc/dgl/blob/2f585940a80efd39639388dfc206498b3279e58d/python/dgl/graphbolt/impl/fused_csc_sampling_graph.py#L454-L469

We can use gb.expand_indptr to first broadcast the per-etype operands across the whole tensor and then perform the computation in a single batched call, with no per-etype loop.
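
A rough sketch of the idea, not the actual GraphBolt code: the etype names, seed values, and etype_shift below are made up for illustration, and torch.repeat_interleave stands in for gb.expand_indptr so the snippet runs without DGL. The dict is only used to build the demo input; the computation itself is one vectorized call.

```python
import torch

# Hypothetical per-etype seeds in the current dict layout.
seeds_by_etype = {
    "user:follows:user": torch.tensor([0, 3, 5]),
    "user:buys:item": torch.tensor([1, 2]),
    "item:bought-by:user": torch.tensor([4, 0, 2, 7]),
}
# Hypothetical per-etype operand, one value per etype.
etype_shift = torch.tensor([0, 100, 200])

# Batched layout: one flat tensor plus offsets marking each etype's slice.
etypes = list(seeds_by_etype)
seeds = torch.cat([seeds_by_etype[e] for e in etypes])
counts = torch.tensor([len(seeds_by_etype[e]) for e in etypes])
offsets = torch.cat([torch.zeros(1, dtype=torch.int64), counts.cumsum(0)])

# Broadcast step: the etype index of every element, derived from the offsets.
# This is the role gb.expand_indptr would play (assumption: used here only
# in its repeat_interleave-equivalent form).
etype_ids = torch.repeat_interleave(torch.arange(len(etypes)), offsets.diff())

# Computation step: a single batched op over the whole tensor.
shifted = seeds + etype_shift[etype_ids]
```

Here offsets is [0, 3, 5, 9], so elements 0..2 belong to the first etype, 3..4 to the second, and 5..8 to the third.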

@frozenbugs @peizhou001

mfbalin commented 4 months ago

@frozenbugs Do you think we can perform the separation of different etypes into different sampled csc inside the dgl blocks conversion function, so that our sampling code does not need any loops over etypes when sampling?

We can use a batched representation throughout the sampling code and convert into dictionaries only when absolutely needed.
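
For the conversion at the boundary, something like the following could work. This is a minimal sketch under assumptions: batched_to_dict is a hypothetical helper, not an existing DGL function, and the etype names and values are placeholders.

```python
import torch


def batched_to_dict(values, offsets, etypes):
    """Split a flat tensor into a dict keyed by etype, using the offsets.

    Hypothetical helper: offsets has len(etypes) + 1 entries, and
    values[offsets[i]:offsets[i + 1]] belongs to etypes[i].
    """
    sizes = offsets.diff().tolist()
    # torch.split returns views, so no data is copied here.
    return dict(zip(etypes, torch.split(values, sizes)))


# Made-up batched sampling output; the second etype has no elements.
etypes = ["a:r1:b", "a:r2:a", "b:r3:a"]
values = torch.tensor([10, 11, 12, 20, 21])
offsets = torch.tensor([0, 3, 3, 5])

per_etype = batched_to_dict(values, offsets, etypes)
```

The only per-etype work left is the final zip, which runs once at the point where a dict is actually required (for example, when building the dgl blocks).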