arangodb / arangodb

🥑 ArangoDB is a native multi-model database with flexible data models for documents, graphs, and key-values. Build high performance applications using a convenient SQL-like query language or JavaScript extensions.
https://www.arangodb.com
Other
13.59k stars 837 forks source link

Graph traversal `vertexCollections` option to allow specifying at which depth to apply #21277

Open rk-1337 opened 2 months ago

rk-1337 commented 2 months ago

Hello Team,

If we could specify at which depth the vertexCollections to traverse, that will help to filter out collections at different level and have the desired path which we have to achieve it using a PRUNE filter later in the query. From my understanding the PRUNE still needs to traverse all the nodes and apply the filter whereas vertexCollections will remove them altogether beforehand reducing the traversal paths.

so instead of vertexCollections: ["VertexB", "VertexC"]

something like this

vertexCollections: [
    {
        "depth": [1],
        "vertexCollections": ["VertexB"]
    },
    {
        "depth": [2],
        "vertexCollections": ["VertexC"]
    }
]

Kindly provide your insights if the query is already optimized to handle such cases from PRUNE.

Thanks

Simran-B commented 3 weeks ago

You are right that PRUNE has to perform filtering after traversing to the vertices in question. However, if you use PRUNE to stop the traversal down specific paths, there won't be further traversing and filtering on these paths. This is unlike FILTER in most cases, where the full traversal is carried out and then vertices are post-filtered (there are a few edge cases where FILTER can stop the traversal similar to PRUNE). Sorted by efficiency, the best but also the most limiting option is vertexCollections, then PRUNE, then FILTER.

Adding support for per-depth vertexCollections would be possible but I suspect that it can't provide a real speedup. Excluding collections entirely from a traversal means the traversal engine doesn't need to perform any setup for these collections - and this is mainly what makes it more efficient - but if it were per-depth, all collections would require setup. The only difference would be the piece of code that applies the filtering I think.

To a limited extent, named graphs allow you to define different subsets of a graph comprised of multiple vertex and edge collections. For each edge collection, there can only be a single set of vertex collections, but this exact relation (edge collection + the set of vertex collections) can be used in another named graph, together with other relations. Depending on your use case, it might be possible to take advantage of this in order to limit what vertex collections are involved by choosing the respective named graph.