mfbalin opened 10 months ago
@Rhett-Ying @peizhou001 @frozenbugs
@caojy1998 do you have any timeline on when we can have ogbn-papers100M
as a built-in dataset? I would like to update the multiGPU example to have the papers100M as an option.
Maybe we can expect to have a version before Feb 9th.
@caojy1998 IGB dataset link: https://github.com/IllinoisGraphBenchmark/IGB-Datasets
Papers100M has now been added.
@caojy1998 Could we also add IGB-full or IGB260M? To test, we can start with the tiny variant, e.g. IGB-tiny.
@frozenbugs is there anyone else who can take over this issue? If we can support the IGB datasets, both homogeneous and heterogeneous, they would be a good benchmark going forward for improving our scalability and testing every aspect of GraphBolt.
🔨Work Item
IMPORTANT:
Project tracker: https://github.com/orgs/dmlc/projects/2
Description
Let's add more large datasets such as ogbn-papers100M from Open Graph Benchmark, and the even larger IGB datasets (homogeneous and heterogeneous variants). ogbn-papers100M is a better dataset than ogbn-products because ogbn-products is at a smaller scale and takes less than 1 minute to train on the GPU with #6861. We should do our profiling with examples that use larger-scale datasets. Finally, the IGB datasets are at a much larger scale, and training on them will take much longer, which makes them a really good benchmark for testing the largest-scale training scenarios (covering both the homogeneous igb-prefixed and heterogeneous igbh-prefixed variants).
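The scale gap motivating this issue can be sketched with rough counts. A minimal illustration, assuming approximate node/edge figures reported by OGB and the IGB project (ballpark numbers for illustration only; check the official dataset pages for exact values):

```python
# Approximate (node, edge) counts; treat as ballpark figures, not exact.
DATASETS = {
    "ogbn-products": (2_400_000, 62_000_000),
    "ogbn-papers100M": (111_000_000, 1_600_000_000),
    "igb-full (homogeneous)": (269_000_000, 4_000_000_000),
}

def edge_scale_vs_products(name):
    """How many times more edges a dataset has than ogbn-products."""
    _, edges = DATASETS[name]
    _, products_edges = DATASETS["ogbn-products"]
    return edges / products_edges

for name in DATASETS:
    print(f"{name}: ~{edge_scale_vs_products(name):.0f}x the edges of ogbn-products")
```

Even with rough figures, papers100M is well over an order of magnitude larger than ogbn-products, and IGB-full larger still, which is why profiling on ogbn-products alone says little about scalability.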