aws / graph-explorer

React-based web application that enables users to visualize both property graph and RDF data and explore connections between data without having to write graph queries.
https://github.com/aws/graph-explorer
Apache License 2.0
305 stars, 46 forks

Optimize schema sync DB queries #354

Open kmcginnes opened 2 months ago

kmcginnes commented 2 months ago

Some users report timeouts when syncing the schema on larger databases.

We should investigate if there is anything we can do to improve the chances of a successful sync.

Related Issues

Steps today

  1. Fetch summary schema
  2. Fetch attributes for one node of each label
  3. Fetch attributes for one edge of each label
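
The three steps above can be sketched roughly as follows. This is only an illustration of the dependency between the steps, not the actual graph-explorer code: the `SchemaQueries` interface, its method names, and the result shapes are all hypothetical, and the issue does not say whether steps 2 and 3 run one after the other or concurrently (they are sequential here for simplicity).

```typescript
// Hypothetical shapes; the real graph-explorer types are not shown in this issue.
type SchemaSummary = { nodeLabels: string[]; edgeLabels: string[] };
type AttributesByLabel = Record<string, string[]>;
type SyncedSchema = {
  summary: SchemaSummary;
  nodeAttributes: AttributesByLabel;
  edgeAttributes: AttributesByLabel;
};

// Hypothetical abstraction over the three database requests.
interface SchemaQueries {
  fetchSummary(): Promise<SchemaSummary>;
  fetchNodeAttributes(labels: string[]): Promise<AttributesByLabel>;
  fetchEdgeAttributes(labels: string[]): Promise<AttributesByLabel>;
}

async function syncSchema(queries: SchemaQueries): Promise<SyncedSchema> {
  // Step 1: the summary supplies the label lists used by the next two steps.
  const summary = await queries.fetchSummary();
  // Step 2: attributes from one sample node per label.
  const nodeAttributes = await queries.fetchNodeAttributes(summary.nodeLabels);
  // Step 3: attributes from one sample edge per label.
  const edgeAttributes = await queries.fetchEdgeAttributes(summary.edgeLabels);
  return { summary, nodeAttributes, edgeAttributes };
}
```

In this sketch a timeout in any one of the three requests fails the whole sync, since each result is needed to build the final schema.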

Gremlin

This is the query that runs after the summary query in my test environment. It gets the node attributes:

g.V()
  .project(
    "Comment","Organization","vertex","software","Post","Airport2","Region2",
    "Forum","Country2","person","Tag","TagClass","Person","Place"
  )
  .by(V().hasLabel("Comment").limit(1))
  .by(V().hasLabel("Organization").limit(1))
  .by(V().hasLabel("vertex").limit(1))
  .by(V().hasLabel("software").limit(1))
  .by(V().hasLabel("Post").limit(1))
  .by(V().hasLabel("Airport2").limit(1))
  .by(V().hasLabel("Region2").limit(1))
  .by(V().hasLabel("Forum").limit(1))
  .by(V().hasLabel("Country2").limit(1))
  .by(V().hasLabel("person").limit(1))
  .by(V().hasLabel("Tag").limit(1))
  .by(V().hasLabel("TagClass").limit(1))
  .by(V().hasLabel("Person").limit(1))
  .by(V().hasLabel("Place").limit(1))
  .limit(1)

The query that gets the edge attributes:

g.E()
  .project(
    "islocatedIn","hasCreator","studyAt","hasTag","workAt","hasMember",
    "WITHIN","isPartOf","KNOWS","hasModerator","hasInterest","isLocatedIn",
    "isSubClass","containerOf","replyOf","hasType","knows","likes"
  )
  .by(V().bothE("islocatedIn").limit(1))
  .by(V().bothE("hasCreator").limit(1))
  .by(V().bothE("studyAt").limit(1))
  .by(V().bothE("hasTag").limit(1))
  .by(V().bothE("workAt").limit(1))
  .by(V().bothE("hasMember").limit(1))
  .by(V().bothE("WITHIN").limit(1))
  .by(V().bothE("isPartOf").limit(1))
  .by(V().bothE("KNOWS").limit(1))
  .by(V().bothE("hasModerator").limit(1))
  .by(V().bothE("hasInterest").limit(1))
  .by(V().bothE("isLocatedIn").limit(1))
  .by(V().bothE("isSubClass").limit(1))
  .by(V().bothE("containerOf").limit(1))
  .by(V().bothE("replyOf").limit(1))
  .by(V().bothE("hasType").limit(1))
  .by(V().bothE("knows").limit(1))
  .by(V().bothE("likes").limit(1))
  .limit(1)
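
Both queries above follow the same template: one `project` key and one `.by(...)` sub-traversal per label, all sent as a single request. A minimal sketch of how such query strings could be generated from the label lists is shown below. The function names are hypothetical and this is not taken from the graph-explorer source; quoting labels with `JSON.stringify` is an assumption that is only adequate for simple labels like the ones listed here.

```typescript
// Quote a label as a double-quoted string literal.
// Assumption: JSON escaping is sufficient for the simple labels shown above.
const quote = (label: string): string => JSON.stringify(label);

// Builds the vertex query: one sampled vertex per label.
function vertexSampleQuery(labels: string[]): string {
  const keys = labels.map(quote).join(",");
  const samples = labels
    .map((label) => `.by(V().hasLabel(${quote(label)}).limit(1))`)
    .join("");
  return `g.V().project(${keys})${samples}.limit(1)`;
}

// Builds the edge query: one sampled edge per label.
function edgeSampleQuery(labels: string[]): string {
  const keys = labels.map(quote).join(",");
  const samples = labels
    .map((label) => `.by(V().bothE(${quote(label)}).limit(1))`)
    .join("");
  return `g.E().project(${keys})${samples}.limit(1)`;
}

// Example with two of the labels from the test environment.
console.log(vertexSampleQuery(["Comment", "Post"]));
console.log(edgeSampleQuery(["hasCreator", "likes"]));
```

Because every label adds another sub-traversal to the same request, the query grows with the number of labels, which may be one factor worth measuring when investigating the timeouts.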

Improvements

dsaban-lightricks commented 2 months ago

Please see #226 and #225

kmcginnes commented 2 months ago

@dsaban-lightricks Thank you!

We will consider that approach when we start work on this issue.