mfbalin opened 10 months ago
@Rhett-Ying @peizhou001 @frozenbugs
@caojy1998 do you have any timeline on when we can have ogbn-papers100M
as a built-in dataset? I would like to update the multiGPU example to have the papers100M as an option.
Maybe we can expect to have a version before Feb 9th.
@caojy1998 IGB dataset link: https://github.com/IllinoisGraphBenchmark/IGB-Datasets
Papers100M has now been added.
@caojy1998 Could we also add IGB-full or IGB260M? To test, we can start with the tiny variant, e.g. IGB-tiny.
@frozenbugs is there anyone else who can take over this issue? If we can support the IGB datasets, both homogeneous and heterogeneous, they would be a good benchmark going forward for improving our scalability and testing every aspect of GraphBolt.
🔨Work Item
IMPORTANT:
Project tracker: https://github.com/orgs/dmlc/projects/2
Description
Let's add more large datasets such as ogbn-papers100M from Open Graph Benchmark, and the even larger IGB datasets (homogeneous and heterogeneous variants). ogbn-papers100M is a better dataset than ogbn-products because ogbn-products is at a smaller scale and takes less than 1 minute to train on the GPU with #6861. We should do our profiling with examples that use larger-scale datasets. Finally, the IGB datasets are at a much larger scale, and training on them will take much longer, which makes them a really good benchmark for testing the largest-scale training scenarios (covering both the homogeneous igb-prefixed and heterogeneous igbh-prefixed variants).
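The scale gap motivating this issue can be sketched with rough counts. A minimal illustration, assuming approximate node/edge figures reported by OGB and the IGB project (ballpark numbers for illustration only; check the official dataset pages for exact values):

```python
# Approximate (node, edge) counts; treat as ballpark figures, not exact.
DATASETS = {
    "ogbn-products": (2_400_000, 62_000_000),
    "ogbn-papers100M": (111_000_000, 1_600_000_000),
    "igb-full (homogeneous)": (269_000_000, 4_000_000_000),
}

def edge_scale_vs_products(name):
    """How many times more edges a dataset has than ogbn-products."""
    _, edges = DATASETS[name]
    _, products_edges = DATASETS["ogbn-products"]
    return edges / products_edges

for name in DATASETS:
    print(f"{name}: ~{edge_scale_vs_products(name):.0f}x the edges of ogbn-products")
```

Even with rough figures, papers100M is well over an order of magnitude larger than ogbn-products, and IGB-full larger still, which is why profiling on ogbn-products alone says little about scalability.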