aws / graph-explorer

React-based web application that enables users to visualize both property graph and RDF data and explore connections between data without having to write graph queries.
https://github.com/aws/graph-explorer
Apache License 2.0

[Bug] RDF: Adding resource to the canvas is very slow or fails with out of memory #263

Open: kmcginnes opened this issue 6 months ago

kmcginnes commented 6 months ago


Describe the bug

On a large RDF database, adding a resource from the search panel to the canvas can take anywhere from 30 seconds to 10 minutes. During this time, the UI gives the user no indication that anything is happening.

To Reproduce

Steps to reproduce the behavior:

  1. Connect to a large RDF database with SPARQL
  2. Search for a resource with many neighbors or relationships
  3. Add that resource to the canvas
  4. Observe that nothing happens in the UI for a long time

You can see the pending request in the browser's network tab.

Slow response

In this example, the request took 1.4 minutes to complete:

[Screenshot (2024-03-08): the pending request in the browser's network tab]

And here you can see that the resource does not have many neighbors:

[Screenshot (2024-03-08): the resource's neighbors]

And here is the query that was executed:

SELECT ?class (COUNT(?class) AS ?count) {
  ?subject a ?class {
    SELECT DISTINCT ?subject ?class {
      ?subject a ?class .
      { ?subject ?p <http://aws.amazon.com/neptune/csv2rdf/resource/270> }
      UNION
      { <http://aws.amazon.com/neptune/csv2rdf/resource/270> ?p ?subject }
    }
    LIMIT 500
  }
}
GROUP BY ?class
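To make the structure of this query easier to follow, here is a minimal Python sketch that mimics its evaluation on an in-memory list of (subject, predicate, object) triples. It is not graph-explorer or Neptune code: `RDF_TYPE`, `neighbour_links`, and `class_counts_reported` are hypothetical names, and the sketch assumes plain SPARQL bag semantics over a single set of triples.

```python
from collections import Counter

# Hypothetical stand-in for the `a` (rdf:type) predicate used in the query.
RDF_TYPE = "rdf:type"


def neighbour_links(triples, resource):
    """Yield (neighbour, predicate) once per triple linking to `resource`.

    Mirrors the UNION: { ?subject ?p <resource> } UNION { <resource> ?p ?subject }.
    """
    for s, p, o in triples:
        if o == resource:
            yield s, p
        if s == resource:
            yield o, p


def class_counts_reported(triples, resource, limit=500):
    """Approximate the reported query's result (hypothetical helper)."""
    types = [(s, o) for s, p, o in triples if p == RDF_TYPE]
    # Inner SELECT DISTINCT ?subject ?class: keep one row per
    # (neighbour, class) pair, however many links reach that neighbour.
    pairs = {}
    for neighbour, _p in neighbour_links(triples, resource):
        for subj, cls in types:
            if subj == neighbour:
                pairs[(subj, cls)] = None  # dict keeps first-seen order
    # LIMIT 500 without ORDER BY: SPARQL leaves the choice of rows
    # unspecified; this sketch simply keeps the first `limit` pairs.
    limited = list(pairs)[:limit]
    # The outer `?subject a ?class` re-joins each pair with its own type
    # triple, which matches exactly once, so the counts do not change.
    return Counter(cls for _subj, cls in limited)


triples = [
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:bob", RDF_TYPE, "ex:Person"),
    ("ex:acme", RDF_TYPE, "ex:Company"),
    ("ex:alice", "ex:worksAt", "ex:r270"),
    ("ex:alice", "ex:founded", "ex:r270"),
    ("ex:r270", "ex:employs", "ex:bob"),
    ("ex:acme", "ex:owns", "ex:r270"),
]
counts = class_counts_reported(triples, "ex:r270")
print(counts)  # alice is counted once even though she has two links
```

In this model, the outer triple pattern contributes nothing to the result; it only re-matches rows the subquery has already produced.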

Out of Memory Error

The executed query was:

SELECT ?class (COUNT(?class) AS ?count) {
  ?subject a ?class {
    SELECT DISTINCT ?subject ?class {
      ?subject a ?class .
      { ?subject ?p <http://aws.amazon.com/neptune/csv2rdf/resource/414> }
      UNION
      { <http://aws.amazon.com/neptune/csv2rdf/resource/414> ?p ?subject }
    }
    LIMIT 500
  }
}
GROUP BY ?class

This resulted in an out-of-memory error:

{
    "error": {
        "status": 500,
        "message": "\n{\n  \"detailedMessage\": \"Operation terminated (out of memory)\",\n  \"requestId\": \"38d41423-0bb8-446a-8d11-4de1ee8cfb24\",\n  \"code\": \"MemoryLimitExceededException\",\n  \"message\": \"Operation terminated (out of memory)\"\n}"
    }
}
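In this response, the `message` field is itself a JSON document serialized as a string, so the Neptune error code (`MemoryLimitExceededException`) and `requestId` only become available after a second decode. The following is a minimal Python sketch using the standard `json` module; `parse_error` is a hypothetical helper, not part of graph-explorer, and it assumes other error responses follow the same shape, falling back to the raw text when they do not.

```python
import json

# The response body from this report, pasted verbatim as a raw string.
response_body = r'''{
    "error": {
        "status": 500,
        "message": "\n{\n  \"detailedMessage\": \"Operation terminated (out of memory)\",\n  \"requestId\": \"38d41423-0bb8-446a-8d11-4de1ee8cfb24\",\n  \"code\": \"MemoryLimitExceededException\",\n  \"message\": \"Operation terminated (out of memory)\"\n}"
    }
}'''


def parse_error(body: str) -> dict:
    """Return the error details, decoding a nested JSON message if present."""
    error = json.loads(body)["error"]
    message = error.get("message", "")
    try:
        inner = json.loads(message)
    except json.JSONDecodeError:
        inner = None
    if not isinstance(inner, dict):
        # Assumption: some errors may carry a plain-text message instead.
        return {"status": error.get("status"), "message": message}
    return {"status": error.get("status"), **inner}


details = parse_error(response_body)
print(details["code"], details["requestId"])
```

Decoding the inner object this way would let a client tell a memory-limit failure apart from other HTTP 500 responses.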

Expected behavior

Adding a single resource to the canvas should not be slow or cause errors.

Cole-Greer commented 6 months ago

I believe this query could be improved significantly. It appears to count the neighbours of the "new resource", grouped by each neighbour's class.

I don't see a need for the subquery that selects all of the neighbours, nor for the DISTINCT. To the best of my knowledge, the only way the subquery could produce duplicate results is if there were two identical statements of the form ?subject a ?class, and I don't believe Neptune allows duplicate statements (this should be verified).

Given this, I would expect a query such as this to perform much better and produce equivalent results:

SELECT ?class (COUNT(?class) AS ?count) {
  ?neighbour a ?class .
  { ?neighbour ?p <http://aws.amazon.com/neptune/csv2rdf/resource/414> }
  UNION
  { <http://aws.amazon.com/neptune/csv2rdf/resource/414> ?p ?neighbour }
}
GROUP BY ?class
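Whether this query returns the same counts depends on what the original DISTINCT removes. Because ?p is not projected, the pattern yields one solution per linking triple, so a neighbour linked to the resource by two predicates, or in both directions, produces the same (neighbour, class) row more than once. The simplified query also drops LIMIT 500, so results differ by design once a resource has more than 500 neighbour/class pairs. The Python sketch below models SPARQL bag semantics on toy triples to make the duplicate case checkable; all names are hypothetical and none of it is graph-explorer or Neptune code.

```python
from collections import Counter

RDF_TYPE = "rdf:type"  # hypothetical stand-in for `a` in the queries


def links(triples, resource):
    """Yield the neighbour once per triple matching either UNION branch."""
    for s, p, o in triples:
        if o == resource:
            yield s
        if s == resource:
            yield o


def class_counts_simplified(triples, resource):
    """COUNT(?class) per class with no DISTINCT: counts links, not neighbours."""
    types = [(s, o) for s, p, o in triples if p == RDF_TYPE]
    return Counter(cls for n in links(triples, resource)
                   for s, cls in types if s == n)


def class_counts_distinct(triples, resource):
    """One row per (neighbour, class) pair, as with the original DISTINCT
    subquery (ignoring its LIMIT); equivalent to COUNT(DISTINCT ?neighbour)."""
    types = [(s, o) for s, p, o in triples if p == RDF_TYPE]
    pairs = {(n, cls) for n in links(triples, resource)
             for s, cls in types if s == n}
    return Counter(cls for _n, cls in pairs)


triples = [
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:acme", RDF_TYPE, "ex:Company"),
    ("ex:alice", "ex:worksAt", "ex:r414"),
    ("ex:alice", "ex:founded", "ex:r414"),   # second link to the same neighbour
    ("ex:r414", "ex:partnerOf", "ex:acme"),
    ("ex:acme", "ex:partnerOf", "ex:r414"),  # link in the opposite direction
]
print(class_counts_simplified(triples, "ex:r414"))
print(class_counts_distinct(triples, "ex:r414"))
```

If such duplicate links exist in the data, replacing COUNT(?class) with COUNT(DISTINCT ?neighbour) in the simplified query is one untested way to keep counting neighbours rather than links.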

kmcginnes commented 6 months ago

Possibly related to