bricaud / graphexp

Interactive visualization of the Gremlin graph database with D3.js
Apache License 2.0
783 stars, 216 forks

SINGLE_COMMANDS_AND_NO_VARS and limited searches #55

Closed: antonyscerri closed this issue 1 year ago

antonyscerri commented 5 years ago

Hi

With certain graphs (particularly big ones), the current implementation of limited search in SINGLE_COMMANDS_AND_NO_VARS mode has a problem, because the node and edge queries are issued separately. The edge query does not impose any limit on the initial node traversal, so if a lot of nodes match, it tries to return all the edges between them. Depending on your graph database, this may either cause a timeout or return a lot more data than is necessary.

The same limit could be applied to the initial node selection step in the edge query. This would require the graph database to make the same limited selection across different queries in a deterministic manner; otherwise you may get back edges between a different subset of the nodes.
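To make that first option concrete, here is a rough sketch of building both query strings from one shared, limited prefix. The helper name `buildLimitedQueries` and the `searchSteps` argument are hypothetical and are not taken from the graphexp code. The edge query is just one possible construction using an `aggregate` side effect, and it still relies on the database returning the same limited subset for both queries.

```javascript
// Hypothetical helper (not part of graphexp): builds the node and edge query
// strings from the same search steps and limit, so that both queries start
// from the same limited traversal.
function buildLimitedQueries(searchSteps, limit) {
  // Shared prefix, e.g. g.V().has('name','marko').limit(50)
  const base = `g.V()${searchSteps}.limit(${limit})`;
  return {
    nodes: `${base}.toList()`,
    // Edges whose source and target are both in the limited node set.
    edges:
      `${base}.aggregate('node').outE().as('edge')` +
      `.inV().where(within('node')).select('edge').toList()`,
  };
}

const queries = buildLimitedQueries(".has('name','marko')", 50);
console.log(queries.nodes);
console.log(queries.edges);
```

Because both strings come from the same prefix, the search steps and the limit cannot drift apart between the two queries.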

An alternative would be to iterate over the returned nodes and, for each one, get its out edges and filter them on the client to those whose target node also matches the search. Again, depending on the graph structure, this may return a lot of data.
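The client-side filtering could look roughly like the sketch below. The data shapes are an assumption (nodes as `{id}`, edges as `{id, outV, inV}`), and `filterEdgesToMatched` is a hypothetical name. Fetching the out edges for each node is left out here.

```javascript
// Hypothetical helper (not part of graphexp). Assumed shapes:
//   node: { id }, edge: { id, outV, inV }
// Keeps only the out edges whose target is one of the matched nodes.
function filterEdgesToMatched(matchedNodes, outEdges) {
  const matchedIds = new Set(matchedNodes.map((node) => node.id));
  return outEdges.filter((edge) => matchedIds.has(edge.inV));
}

// Example: node 3 was not part of the search result, so edge "e2" is dropped.
const matched = [{ id: 1 }, { id: 2 }];
const fetched = [
  { id: "e1", outV: 1, inV: 2 },
  { id: "e2", outV: 1, inV: 3 },
];
console.log(filterEdgesToMatched(matched, fetched).map((edge) => edge.id));
```

The filter itself is cheap. The cost is in transferring every out edge of every matched node before any of them can be discarded.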

There is one final option I've been experimenting with, which is a different construction of a single query that avoids the use of variables. I'm quite new to Gremlin queries, so I may not have all bases covered. I cannot guarantee it will work on all databases, and I have only done some limited testing of its behaviour. You can try a query that looks something like:

g.V().limit(50).union(aggregate("nodes"),outE().filter(inV().where(within("nodes")))).toList()

Basically, you take the initial node traversal steps, including the search and limit. Inside a union, you then aggregate the nodes into a named set and select the edges whose target is in that same set. I'm not sure whether you are guaranteed to get vertices before edges in the result set; if not, you may want to scan the results twice, first adding the vertices and then the edges.
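The two-pass scan over a mixed result list might be sketched as follows. It assumes each returned item carries a `type` field set to `"vertex"` or `"edge"`; the actual field name and values depend on the serialization your server uses. `partitionResults` is a hypothetical name.

```javascript
// Hypothetical helper (not part of graphexp). Assumes every result item has
// a `type` field of "vertex" or "edge"; adjust to your server's format.
function partitionResults(items) {
  // First pass: vertices, so they can be added to the graph first.
  const nodes = items.filter((item) => item.type === "vertex");
  // Second pass: edges, added once all vertices are known.
  const edges = items.filter((item) => item.type === "edge");
  return { nodes, edges };
}

// Example with an edge arriving before one of its vertices.
const mixed = [
  { type: "vertex", id: 1 },
  { type: "edge", id: "e1", outV: 1, inV: 2 },
  { type: "vertex", id: 2 },
];
const { nodes, edges } = partitionResults(mixed);
console.log(nodes.length, edges.length);
```

This makes the client independent of the order in which the union returns its results.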

I'll have a PR to upload in a bit which replaces the search and click queries with this type of construction, if you'd like to try it out. I've done some limited testing comparing results before and after on a couple of graphs, but I'd be interested if anyone can tell me whether this is a workable alternative.

Thanks

Tony