dmlc / dgl

Python package built to ease deep learning on graph, on top of existing DL frameworks.
http://dgl.ai
Apache License 2.0

[GraphBolt] Add more and larger `BuiltinDataset`s #6909

Open mfbalin opened 10 months ago

mfbalin commented 10 months ago

🔨Work Item

Project tracker: https://github.com/orgs/dmlc/projects/2

Description

Let's add more large datasets, such as ogbn-papers100M from the Open Graph Benchmark and the even larger IGB datasets (homogeneous and heterogeneous variants).

ogbn-papers100M is a better benchmark dataset than ogbn-products: ogbn-products is smaller in scale and, with #6861, takes less than 1 minute to train on the GPU. We should do our profiling with examples that use larger-scale datasets. Finally, the IGB datasets are much larger still, and training on them will take much longer, which makes them a good benchmark for testing the largest-scale training scenarios. They cover both homogeneous (`igb` prefix) and heterogeneous (`igbh` prefix) scenarios.

mfbalin commented 10 months ago

@Rhett-Ying @peizhou001 @frozenbugs

mfbalin commented 9 months ago

@caojy1998 do you have any timeline on when we can have ogbn-papers100M as a built-in dataset? I would like to update the multiGPU example to have the papers100M as an option.
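
For illustration, a minimal sketch of how an example script could expose the dataset as an option. The `parse_args` and `load_dataset` helpers and the `--dataset` flag are hypothetical names, not taken from the existing multi-GPU example, and the sketch assumes `ogbn-papers100M` is accepted by `gb.BuiltinDataset` once it has been added:

```python
import argparse


def parse_args(argv=None):
    # Hypothetical CLI for an example script; the flag name is an assumption.
    parser = argparse.ArgumentParser(description="GraphBolt training example")
    parser.add_argument(
        "--dataset",
        default="ogbn-products",
        choices=["ogbn-products", "ogbn-papers100M"],
        help="Name of the GraphBolt built-in dataset to train on.",
    )
    return parser.parse_args(argv)


def load_dataset(name):
    # Imported lazily so that argument parsing works without DGL installed.
    import dgl.graphbolt as gb

    # Fetches the preprocessed dataset if needed, then loads it.
    return gb.BuiltinDataset(name).load()
```

The script would then call `load_dataset(parse_args().dataset)`, keeping ogbn-products as the default so that quick runs stay cheap.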

caojy1998 commented 9 months ago

> @caojy1998 do you have any timeline on when we can have ogbn-papers100M as a built-in dataset? I would like to update the multiGPU example to have the papers100M as an option.

We can probably expect a first version before Feb 9th.

mfbalin commented 9 months ago

@caojy1998 IGB dataset link: https://github.com/IllinoisGraphBenchmark/IGB-Datasets

caojy1998 commented 8 months ago

Papers100M has now been added.

mfbalin commented 8 months ago

@caojy1998 Could we also add IGB-full or IGB260M? To test, we can start with the tiny variant, e.g. IGB-tiny.

mfbalin commented 7 months ago

@frozenbugs is there anyone else who can take over this issue? If we can support the IGB datasets, both homogeneous and heterogeneous, they would be a good benchmark for us going forward to improve our scalability and test every aspect of GraphBolt.

mfbalin commented 2 months ago

#7717, #7708