TuGraph-family / tugraph-db

TuGraph is a high performance graph database.
https://tugraph.org
Apache License 2.0

[Performance] Simple COUNT can not return for large graph #198

Open frank-zsy opened 1 year ago

frank-zsy commented 1 year ago

Describe the bug A simple COUNT over all nodes does not return on a large graph.

To Reproduce Steps to reproduce the behavior:

  1. Import a large graph into the database, e.g. hundreds of millions of nodes and billions of edges
  2. Run MATCH (n) RETURN COUNT(n)
  3. The Cypher query above may never return

Expected behavior There may be metadata about the graph in the database, so that we can easily find out the scale of the graph in my database.

Environment:

Additional context In Neo4j, the node count and edge count are almost always displayed on the sidebar of the current database and are updated while importing data, even for a large graph with billions of nodes and edges.

hjk41 commented 1 year ago

Currently TuGraph does not keep track of the number of vertexes or edges. Instead, it scans the whole graph and counts the vertexes/edges when asked to, which takes a long time. The query should eventually return, but only after a long time. You can estimate the time as follows: {total_db_size}/(100MB/s).
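As a rough illustration of the estimate above, here is a minimal sketch in Python. The function name and the 500 GB database size are made up for the example; only the 100 MB/s scan rate comes from the comment, and it is taken to mean 100 * 10^6 bytes per second.

```python
def estimate_scan_seconds(total_db_size_bytes, throughput_bytes_per_s=100 * 10**6):
    """Estimate how long a full-graph scan takes: total_db_size / (100 MB/s).

    The default throughput is the 100 MB/s figure quoted in this thread,
    interpreted here (an assumption) as 100 * 10**6 bytes per second.
    """
    return total_db_size_bytes / throughput_bytes_per_s


# Hypothetical database size of 500 GB, chosen only for illustration.
size_bytes = 500 * 10**9
seconds = estimate_scan_seconds(size_bytes)
print(f"{seconds:.0f} s (~{seconds / 3600:.1f} h)")  # prints "5000 s (~1.4 h)"
```

So a database of a few hundred gigabytes would, under this estimate, need on the order of an hour or more for a single COUNT.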

Tracking the number of vertexes and edges is a feature that can be implemented. I will see what I can do in the holidays.
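To make the idea concrete, here is a toy sketch in Python of what tracking counts could look like. It is not TuGraph code: the `CountedGraph` class and its methods are hypothetical, and a real implementation would have to update the counters transactionally and persist them. The point is only that counters updated on every write make the count a constant-time lookup instead of a full scan.

```python
class CountedGraph:
    """Toy in-memory graph (hypothetical, not TuGraph's storage layer)
    that maintains vertex and edge counters on every write."""

    def __init__(self):
        self._adj = {}          # vertex id -> set of destination vertex ids
        self.num_vertices = 0   # updated incrementally, never recomputed
        self.num_edges = 0

    def add_vertex(self, vid):
        if vid not in self._adj:
            self._adj[vid] = set()
            self.num_vertices += 1

    def add_edge(self, src, dst):
        self.add_vertex(src)
        self.add_vertex(dst)
        if dst not in self._adj[src]:
            self._adj[src].add(dst)
            self.num_edges += 1

    def remove_edge(self, src, dst):
        if src in self._adj and dst in self._adj[src]:
            self._adj[src].remove(dst)
            self.num_edges -= 1

    def scan_count_edges(self):
        """Full scan, analogous to what a count without metadata has to do."""
        return sum(len(dsts) for dsts in self._adj.values())


g = CountedGraph()
g.add_edge("a", "b")
g.add_edge("b", "c")
g.add_edge("a", "b")  # duplicate edge, counters stay unchanged
print(g.num_vertices, g.num_edges)  # prints "3 2"
```

The counters cost a small amount of extra work per write, in exchange for an answer to the count that does not depend on the size of the graph.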