bricaud / graphexp

Interactive visualization of the Gremlin graph database with D3.js
Apache License 2.0

Graphexp breaks connection to AWS Neptune #90

Closed valeriazuccoli closed 4 years ago

valeriazuccoli commented 4 years ago

Hi!

I set up an AWS Neptune DB with a big graph inside. I also configured SSH port forwarding through an EC2 instance. At http://localhost:8182/status I receive this response, as expected:

{"status":"healthy","startTime":"Tue Apr 14 16:34:44 UTC 2020","dbEngineVersion":"1.0.2.2.R2","role":"writer","gremlin":{"version":"tinkerpop-3.4.3"},"sparql":{"version":"sparql-1.1"},"labMode":{"ObjectIndex":"disabled","ReadWriteConflictDetection":"enabled"}}

Then, I made a curl request:

curl 'http://localhost:8182/gremlin' -H 'Connection: keep-alive' -H 'Pragma: no-cache' -H 'Cache-Control: no-cache' -H 'Accept: */*' -H 'Sec-Fetch-Dest: empty' -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36' -H 'DNT: 1' -H 'Content-Type: application/x-www-form-urlencoded; charset=UTF-8' -H 'Origin: http://0.0.0.0:9000' -H 'Sec-Fetch-Site: cross-site' -H 'Sec-Fetch-Mode: cors' -H 'Referer: http://0.0.0.0:9000/graphexp.html' -H 'Accept-Language: en-US,en;q=0.9,it;q=0.8' --data '{"gremlin":"g.V().limit(1).valueMap()"}' --compressed

which answered with the query results.

So, where is the problem? I opened graphexp.html and clicked on "Get graph info", and the request got stuck and never completed.

The only way I could unlock this situation was to stop and restart the DB.

Is there a way to cancel graphexp requests after a timeout? The requests seem to keep the DB busy even after graphexp.html shows a timeout message, so I am left with no way to interact with the machine.

Thanks.

bricaud commented 4 years ago

How many nodes and edges do you have? It is probably because the graph is very large. There is a default timeout of 2000s with the REST protocol (you can change it in the graphConf.js file) but not with websocket. The 'Get graph info' query could be modified to adapt to large graphs.
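For reference, the protocol and timeout live in graphConf.js; below is a minimal sketch of the relevant settings, assuming variable names along these lines (check your local graphConf.js for the exact names):

// graphConf.js (sketch) -- variable names are illustrative, not necessarily the exact ones
var HOST = "localhost";
var PORT = "8182";
// 'websocket' keeps a single connection open with no client-side timeout;
// 'REST' sends one HTTP request per query and applies the timeout below.
var COMMUNICATION_PROTOCOL = "REST"; // or "websocket"
var REST_TIMEOUT = 2000;             // the timeout mentioned above, applied only to REST requests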

valeriazuccoli commented 4 years ago

Thanks for your quick answer!

I have 5.8 million nodes and 25 million edges, so yes, my graph is very large. I just found the code behind "Get graph info": it runs several .groupCount() steps over the whole graph, which is probably what causes my connection to get stuck.
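For context, the "Get graph info" queries in graphioGremlin.js are roughly of this shape (a paraphrased sketch, not the exact source lines):

// Sketch of the kind of queries sent by "Get graph info" (graphioGremlin.js, around lines 56-59).
// Each groupCount() scans the whole graph, which is what stalls a ~30M-element graph.
var gremlin_query_nodes_nb   = "g.V().count()";                         // total node count
var gremlin_query_edges_nb   = "g.E().count()";                         // total edge count
var gremlin_query_labels     = "g.V().label().groupCount()";            // nodes per label
var gremlin_query_relations  = "g.E().label().groupCount()";            // edges per label
var gremlin_query_nodes_prop = "g.V().properties().key().groupCount()"; // node property keys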

How can I interrupt the execution of this "Get graph info" query, instead of rebooting my DB?

I used websocket, which is the default in the HTML file. Should I use REST to avoid stuck connections? Where can I configure REST as the default in my project?

bricaud commented 4 years ago

One quick fix I see is to add .limit(10) just before all the .groupCount() steps, to restrict them to the first 10 nodes and edges (in graphioGremlin.js, lines 56-59). The graph info will be incomplete, but at least you are not stuck and you can explore the network.
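Concretely, the quick fix amounts to something like this for each of those queries (a sketch, not the exact edit):

// graphioGremlin.js (sketch): cap each groupCount() so the traversal can stop early.
// Before: scans every vertex label in the graph
var gremlin_query_labels = "g.V().label().groupCount()";
// After: only the labels of the first 10 vertices encountered are counted
var gremlin_query_labels = "g.V().label().limit(10).groupCount()";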

bricaud commented 4 years ago

I have just added a limit to the first 10000 nodes and edges, with a parameter in graphConf.js to change it if needed. Could you tell me if this number is reasonable? How long does it take with this limit?
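The configurable version presumably looks something like this (the parameter name below is hypothetical; check graphConf.js for the actual one):

// graphConf.js (sketch) -- hypothetical parameter name
var graph_info_limit = 10000; // max nodes/edges scanned by "Get graph info"

// graphioGremlin.js (sketch) -- the limit is spliced into each groupCount() query
var gremlin_query_labels = "g.V().label().limit(" + graph_info_limit + ").groupCount()";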

Thanks for pointing this out!

valeriazuccoli commented 4 years ago

I added .limit(10000) to lines 56-59 in graphioGremlin.js and graphexp returned useful information in 1-2 minutes. I tried to re-run "Get graph info", but now it returns this error:

{"requestId":"51379deb-7013-4c7b-8f1f-bbb16cfc8da3","code":"MemoryLimitExceededException","detailedMessage":"Query cannot be completed due to memory limitations."}

Nothing has changed in the infrastructure or config. What do you think has filled up?

Note: I have nodes with two labels, and with this limit the query only returned info about "label1". Since I have 3.5 million "label1" nodes and 2.3 million "label2" nodes, this behaviour didn't surprise me.

I suggest using .sample(10000) instead of .limit(10000), so that the info is computed over a random sample rather than just the first nodes encountered.
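Roughly, the difference between the two, using the label-count query as an example (a sketch):

// limit(): counts the first 10000 label values the traversal encounters,
// so a single large label can fill the whole window.
g.V().label().limit(10000).groupCount()
// sample(): draws 10000 label values at random from the full stream,
// so every label is likely to be represented in the counts.
g.V().label().sample(10000).groupCount()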

I'd also display something like "You're reading info about a random sample of your graph. Some other labels may be available", to underline that the info can be incomplete for large graphs.

Thank you!

valeriazuccoli commented 4 years ago

I realized that .limit(10000) does not return any memory error, while .sample(10000) does. I don't know why, but I think this is a Neptune-related issue and not a graphexp-related one.

I believe limit solves the problem, but displaying a message seems necessary in this case. Something like "You're reading info about the top X nodes and edges of your graph. Some other labels/properties may be available" would be useful.
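A note like that could be shown with a small snippet along these lines (the element id and parameter name are hypothetical, just to illustrate the idea):

// Sketch: display a warning next to the graph info panel
var note = "Graph info computed on the first " + graph_info_limit +
           " nodes and edges; other labels/properties may exist.";
document.getElementById("graph_info_message").textContent = note;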

Thank you very much!

Note: I don't know if there is a cache system, but my "Get graph info" now takes only 10 seconds. Nice improvement!

bricaud commented 4 years ago

Thanks @valeriazuccoli for your feedback!