apache / incubator-graphar

An open source, standard data file format for graph data storage and retrieval.
https://graphar.apache.org/
Apache License 2.0

[Discussion] Gathering graph dataset to construct a data hub with GraphAr format #275

Open acezen opened 1 year ago

acezen commented 1 year ago

To improve the capabilities of the GraphAr format, we plan to build a data hub of graph datasets stored in GraphAr format.

This issue is for collecting graph datasets, which should ideally meet the following requirements:

Any comments, questions, and dataset suggestions are welcome!

acezen commented 1 year ago

LDBC SNB/BI dataset

SemyonSinchenko commented 11 months ago

What about the Stanford graph datasets? They include many networks of different types, ranging from KB to GB in size, and they are already split by task, such as community detection and graph classification.

acezen commented 10 months ago

That would be a great data source for GraphAr, thanks for the proposal!

lixueclaire commented 10 months ago

We could consider utilizing the following graph datasets for our proposal:

  1. Property Graphs: The LDBC graphs feature a variety of vertex and edge types, each with associated properties that encompass diverse data types. These graphs can be generated at various scales to accommodate different analysis needs.

  2. Simple Topological Graphs: The SNAP datasets offer a collection of real-world graphs from multiple domains, including social networks, web graphs, and road networks, among others. Additionally, the Laboratory for Web Algorithmics provides a range of large-scale web graphs compressed using LLP + WebGraph. These can be particularly useful for evaluating the storage efficiency of GraphAr.

  3. Labeled Property Graphs: The neo4j-graph-examples repository contains graphs in Neo4j dump format, characterized by the inclusion of vertex labels. Each vertex in these graphs may have multiple associated labels, adding complexity to the graph properties.

  4. GNN Graphs: The OGBN graphs are tailored for node property prediction tasks, with the predicted labels being represented as vertex labels. These graphs are well-suited for representing GNN (Graph Neural Network) graph structures.

Subsequent considerations may encompass the use of RDF (Resource Description Framework) datasets, temporal graphs, and knowledge graphs.
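
For the simple topological graphs in item 2, a first ingestion step might look like the sketch below. It assumes the plain-text edge-list layout commonly used for SNAP downloads (one whitespace-separated `src dst` pair per line, with `#` comment lines). The function name `parse_edge_list` is hypothetical, and the sketch deliberately uses no GraphAr API: it only produces vertex and edge lists that a later conversion step could consume.

```python
import io


def parse_edge_list(lines):
    """Parse edge-list lines into (vertices, edges).

    Assumption: each data line holds two integer vertex ids separated by
    whitespace; blank lines and lines starting with '#' are skipped.
    """
    edges = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, dst = line.split()[:2]
        edges.append((int(src), int(dst)))
    # The vertex set is derived from the edge endpoints, sorted for stable output.
    vertices = sorted({v for edge in edges for v in edge})
    return vertices, edges


# Tiny in-memory sample standing in for a downloaded edge-list file.
sample = io.StringIO("# FromNodeId\tToNodeId\n0\t1\n0\t2\n2\t1\n")
vertices, edges = parse_edge_list(sample)
print(vertices)
print(edges)
```

The same function accepts an open file handle, so a real dataset could be streamed line by line rather than loaded into memory at once.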

lixueclaire commented 10 months ago

@acezen, do you have any more comments on this proposal?

acezen commented 10 months ago

Looks good to me