[Bug] RDF: Adding resource to the canvas is very slow or fails with out of memory

kmcginnes commented 6 months ago

Community Note

Please use a 👍 reaction to provide a +1/vote. This helps the community and maintainers prioritize this request.
If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Describe the bug

On a larger RDF database, when I add a resource from the search panel to the canvas it can take anywhere from 30 seconds to 10 minutes to complete. During this time no indication is given to the user that something is happening.

OS: macOS 14.3.1
Browser: Arc (Google Chromium)
Graph Explorer Version: 1.5.1
Graph Database & Version: Amazon Neptune

To Reproduce

Steps to reproduce the behavior:

1. Connect to a large RDF database with SPARQL
2. Search for a resource with many neighbors or relationships
3. Add that resource to the canvas
4. Observe nothing happening in the UI for a long time

You can see the pending request in the browser's network tab.
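To rule out the UI as the source of the delay, the same query can be sent straight to the SPARQL endpoint and timed. The following is a minimal sketch, not part of Graph Explorer: the endpoint URL is a placeholder, the helper names are hypothetical, and it assumes the endpoint accepts unauthenticated SPARQL 1.1 Protocol POST requests (a cluster with IAM authentication enabled would also need request signing, which is not shown).

```python
import time
import urllib.parse
import urllib.request

# Placeholder endpoint (assumption); replace with your cluster's SPARQL URL.
ENDPOINT = "https://your-neptune-endpoint:8182/sparql"


def build_sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol POST request with a form-encoded query."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def timed(fn):
    """Call fn() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def run_query(endpoint: str, query: str, timeout_s: float = 900.0):
    """Send the query and return (raw_response_bytes, elapsed_seconds)."""
    request = build_sparql_request(endpoint, query)

    def send() -> bytes:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return response.read()

    return timed(send)
```

Calling `run_query(ENDPOINT, query)` with the query text copied from the network tab should give a timing that can be compared with what the browser reports.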

Slow response

In this example, the request took 1.4 min to complete:

[Screenshot: CleanShot 2024-03-08 at 16 14 56@2x]

And here you can see there are not that many neighbors:

[Screenshot: CleanShot 2024-03-08 at 16 14 43@2x]

And here is the query that was executed:

SELECT ?class (COUNT(?class) AS ?count) {
  ?subject a ?class {
    SELECT DISTINCT ?subject ?class {
      ?subject a ?class .
      { ?subject ?p <http://aws.amazon.com/neptune/csv2rdf/resource/270> }
      UNION
      { <http://aws.amazon.com/neptune/csv2rdf/resource/270> ?p ?subject }
    }
    LIMIT 500
  }
}
GROUP BY ?class

Out of Memory Error

The query that was executed was:

SELECT ?class (COUNT(?class) AS ?count) {
  ?subject a ?class {
    SELECT DISTINCT ?subject ?class {
      ?subject a ?class .
      { ?subject ?p <http://aws.amazon.com/neptune/csv2rdf/resource/414> }
      UNION
      { <http://aws.amazon.com/neptune/csv2rdf/resource/414> ?p ?subject }
    }
    LIMIT 500
  }
}
GROUP BY ?class

This resulted in an out of memory error:

{
    "error": {
        "status": 500,
        "message": "\n{\n  \"detailedMessage\": \"Operation terminated (out of memory)\",\n  \"requestId\": \"38d41423-0bb8-446a-8d11-4de1ee8cfb24\",\n  \"code\": \"MemoryLimitExceededException\",\n  \"message\": \"Operation terminated (out of memory)\"\n}"
    }
}
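Note that the `message` field is itself a JSON document encoded as a string, so the database's error code is only visible after a second decode. A minimal sketch of unpacking it follows; the function name and the shape of the returned dictionary are hypothetical, and the payload is copied from the response above.

```python
import json

# Response body copied from the report above.
RAW_RESPONSE = r'''{
    "error": {
        "status": 500,
        "message": "\n{\n  \"detailedMessage\": \"Operation terminated (out of memory)\",\n  \"requestId\": \"38d41423-0bb8-446a-8d11-4de1ee8cfb24\",\n  \"code\": \"MemoryLimitExceededException\",\n  \"message\": \"Operation terminated (out of memory)\"\n}"
    }
}'''


def parse_error(body: str) -> dict:
    """Decode the outer error envelope, then the JSON nested in its message."""
    outer = json.loads(body)["error"]
    try:
        inner = json.loads(outer["message"])
    except (TypeError, ValueError):
        inner = {}  # message was plain text rather than nested JSON
    if not isinstance(inner, dict):
        inner = {}
    return {
        "status": outer.get("status"),
        "code": inner.get("code"),
        "detail": inner.get("detailedMessage"),
        "request_id": inner.get("requestId"),
    }


error = parse_error(RAW_RESPONSE)
print(error["status"], error["code"])
```

Surfacing `code` this way would let a client tell an out of memory failure apart from other 500 responses.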

Expected behavior

Adding a single resource to the canvas should not be slow or cause errors.

Cole-Greer commented 6 months ago

I believe that query could be improved significantly. That query appears to be counting the number of neighbours for the "new resource", grouped by the neighbour's class.

I don't see a need for it to have the subquery to select all of the neighbours, nor do I see a need for using DISTINCT here. To the best of my knowledge, the only way that subquery could produce duplicate results would be if there were 2 duplicate statements of the form ?subject a ?class. I don't believe that Neptune allows for duplicate statements (this should be verified).

Given this, I would expect a query such as this to perform much better and produce equivalent results:

SELECT ?class (COUNT(?class) AS ?count) {
  ?neighbour a ?class .
  { ?neighbour ?p <http://aws.amazon.com/neptune/csv2rdf/resource/414> }
  UNION
  { <http://aws.amazon.com/neptune/csv2rdf/resource/414> ?p ?neighbour }
}
GROUP BY ?class
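The equivalence claim can be checked on a toy dataset before testing against Neptune. The sketch below is a plain Python simulation of the two query shapes under standard SPARQL multiset semantics, not a run against a real store; the five triples and the `ex:` names are hypothetical, and the `LIMIT 500` of the original query is left out. It suggests one more source of duplicates besides repeated statements: because `?p` is projected away, a neighbour linked to the resource by more than one predicate (or in both directions) yields one solution per link.

```python
from collections import Counter

RDF_TYPE = "rdf:type"
TARGET = "ex:resource"  # hypothetical stand-in for the resource IRI

# Hypothetical data: a set, so no statement is stored twice.
TRIPLES = {
    ("ex:n1", RDF_TYPE, "ex:A"),
    ("ex:n2", RDF_TYPE, "ex:B"),
    ("ex:n1", "ex:p1", TARGET),  # n1 is linked to the target twice,
    ("ex:n1", "ex:p2", TARGET),  # through two different predicates
    (TARGET, "ex:p1", "ex:n2"),
}


def neighbour_solutions(triples, target):
    """Join '?n a ?class' with the UNION of incoming and outgoing links."""
    types = [(s, o) for (s, p, o) in triples if p == RDF_TYPE]
    solutions = []
    for s, p, o in triples:
        if o == target:  # branch: ?n ?p <target>
            solutions += [(s, cls, p) for subj, cls in types if subj == s]
        if s == target:  # branch: <target> ?p ?n
            solutions += [(o, cls, p) for subj, cls in types if subj == o]
    return solutions


def count_flat(triples, target):
    """Counts as in the flat query: one per solution, ?p included."""
    return Counter(cls for _, cls, _ in neighbour_solutions(triples, target))


def count_distinct(triples, target):
    """Counts as in the DISTINCT subquery: one per (neighbour, class) pair."""
    pairs = {(n, cls) for n, cls, _ in neighbour_solutions(triples, target)}
    return Counter(cls for _, cls in pairs)


print(dict(count_flat(TRIPLES, TARGET)))
print(dict(count_distinct(TRIPLES, TARGET)))
```

On this data the flat form counts `ex:n1` twice for `ex:A`, while the DISTINCT form counts it once. Whether such multi-predicate neighbours occur in the reported database is not known from this issue. If they do, `COUNT(DISTINCT ?neighbour)` would be one way to keep the flat query while counting each neighbour once per class.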

kmcginnes commented 6 months ago

Possibly related to:

#184
#324

aws / graph-explorer

[Bug] RDF: Adding resource to the canvas is very slow or fails with out of memory #263
