Closed valeriazuccoli closed 4 years ago
How many nodes and edges do you have? It is probably because the graph is very large. There is a timeout of 2000s with the REST protocol by default (you can change it in the graphConf.js file) but not with websocket. The 'Get graph info' query could be modified to adapt to large graphs.
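For reference, the kind of setting being described might look like the fragment below. The variable name is an assumption for illustration, not graphexp's actual identifier; check your copy of graphConf.js for the real name and unit.

```javascript
// Hypothetical graphConf.js fragment (illustrative name, not verified
// against graphexp's source): timeout applied to REST requests.
var REST_TIMEOUT = 2000;
```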
Thanks for your quick answer!
I have 5.8 million nodes and 25 million edges, so yes, my graph is very large.
I just found the config of "Get graph info": it computes multiple .groupCount() steps, which probably causes my connection to get stuck.
How can I interrupt the execution of this "Get graph info" query, instead of rebooting my DB?
I used websocket, found as default in html file. Should I use REST to avoid stuck connections? Where can I config REST as default in my project?
One quick fix I see is to add .limit(10) just before all the .groupCount() steps, to limit the query to the first 10 nodes and edges (in graphioGremlin.js, lines 56-59). The graph info will be incomplete, but at least you are not stuck and you can explore the network.
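A minimal sketch of what that change could look like. The query strings and variable names below are illustrative assumptions, not graphexp's actual code; the real statements live around lines 56-59 of graphioGremlin.js.

```javascript
// Hypothetical sketch of the 'Get graph info' queries with a limit
// inserted before aggregation, so groupCount() only ever touches the
// first 10 elements instead of the whole graph.
// Names and exact Gremlin strings are assumptions, not graphexp's code.
const nodeLabelQuery = "g.V().limit(10).groupCount().by(label)";
const edgeLabelQuery = "g.E().limit(10).groupCount().by(label)";
```

The point is simply that the limit must come before the aggregating step: `g.V().groupCount().by(label).limit(10)` would still traverse every vertex first.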
I have just added a limit to the first 10000 nodes and edges, with a parameter in graphConf.js to change it if needed. Could you tell me if this number is reasonable? How long does it take with this limit?
Thanks for pointing this out!
I added .limit(10000) to lines 56-59 in graphioGremlin.js, and graphexp returned useful information in 1-2 minutes.
I tried to re-run "Get graph info", but now it returns this error:
{"requestId":"51379deb-7013-4c7b-8f1f-bbb16cfc8da3","code":"MemoryLimitExceededException","detailedMessage":"Query cannot be completed due to memory limitations."}
Nothing has been changed on the infrastructure or config. What do you think has been filled up?
Note: I have nodes with two labels, and this limitation only returned info about "label1". Since I have 3.5 million "label1" nodes and 2.3 million "label2" nodes, this behaviour didn't surprise me.
I suggest using .sample(10000) instead of .limit(10000):

- .limit(X) takes the top X nodes.
- .sample(X) takes a random sample of X nodes from all of them.
- .tail(X) takes the bottom X nodes.

I'd also display something like "You're reading info about a random sample of your graph. Some other labels may be available", to underline that info can be incomplete in case of large graphs.
Thank you!
I realized that .limit(10000) does not return any MemoryError, while .sample(10000) does. I don't know why, but I think this is a Neptune-related issue and not a graphexp-related one.
I believe limit solves the problem, but displaying a message seems necessary in this case. Maybe something like "You're reading info about the top X nodes and edges of your graph. Some other labels/properties may be available" would be useful.
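A sketch of how such a warning could be built from the configurable limit. Both names below are hypothetical: `node_limit_per_request` only mirrors the kind of parameter added to graphConf.js in this thread, and `graphInfoWarning` is not an existing graphexp function.

```javascript
// Hypothetical: build the incomplete-info warning from the configured
// limit. Names are illustrative, not graphexp's actual identifiers.
const node_limit_per_request = 10000;

function graphInfoWarning(limit) {
  return "You're reading info about the top " + limit +
    " nodes and edges of your graph. " +
    "Some other labels/properties may be available.";
}

graphInfoWarning(node_limit_per_request);
```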
Thank you very much!
Note: I don't know if there is a cache system, but my "Get graph info" now takes only 10 seconds. Good point!
Thanks @valeriazuccoli for your feedback!
Hi!
I set up an AWS Neptune DB with a big graph inside. I also configured SSH port forwarding through an EC2 instance. At http://localhost:8182/status I receive this response, as expected:
Then, I made a curl request, which answered with the query results.
So, where is the problem? I opened graphexp.html and clicked on "Get graph info". Now the connection gets stuck: the only way I could unlock this situation was to stop and start the DB.
Is there a way to suppress graphexp requests after a timeout? It seems requests keep the DB busy even if graphexp.html returns a timeout message; this way I have no option to interact with the machine. Thanks.