Super Node Series One: Introduce proxy node type

li-boxuan commented 4 months ago

This is a first attempt towards solving a famous problem in graph databases world - super node. A brief introduction to super node problem, and the status quo of JanusGraph, is documented at https://github.com/JanusGraph/janusgraph/discussions/2717. In a nutshell, JanusGraph has already partially addressed the traversal performance issue, but not the memory/storage issue.

Without native support of super node in JanusGraph (and many other graph databases), a lot of users end up building their own workarounds. Probably the most popular approach is to create "meta" vertices, or "proxy" vertices, and let the application layer redistributes the traversals & data, which is cumbersome and error-prone.

This PR aims to introduce proxy nodes, which work like the partition node before, except that proxy vertices are created only on-demand. There are two basic requirements:

1) User intervention should be as minimal as possible. 2) This feature shall bring zero or minimal overhead when there's no super node.

Note that the existing (but discouraged) partition node (a.k.a. vertex cut feature), addresses the 1st problem well but performs very bad at the 2nd requirement.

It takes a few steps to fully address 1st requirement, and this PR only addresses it partially. This PR requires users to explicitly create proxy nodes, connect them with the canonical node, and EXPLICITLY connect edges to the proxy nodes. This mostly fulfills the 2nd requirement: when there's no need, don't introduce new overhead. The drawback is that the write path is pretty cumbersome, but the good news is that, the read path offers seamless experience. Users could do normal traversal queries as if proxy nodes don't exist.

The brief design is as follows: let's say A is a super node, and A connects to a number of vertices with different labels. Let's say we have a proxy node for A, Vpa. We store proxies, an array of IDs, as a vertex property in the canonical node Va. Conversely, we store canonicalId, the ID of Va as a vertex property in Vpa. Every time we need to traverse from Va, we always fetch Vpa, and do the traversal from there. Let's say Va -.-.-.-> Vpa -------> Vb, then when we traverse from Vb, we will find Vpa first, and then we retrieve Va because Vpa is just a proxy for Va.

TODOs in this PR:

[ ] Fix test
[ ] Support custom-type vertex ID
[ ] Make it configurable
[ ] Benchmark
[ ] Add doc

TODOs in subsequent PRs:

[ ] Add traversal strategy such that user doesn't need to explicitly connect edges to proxy nodes
[ ] Automatically create proxy nodes (experimental)

C.C. @dxtr-1-0 @rngcntr who expressed interests in this project

Thank you for contributing to JanusGraph!

In order to streamline the review of the contribution we ask you to ensure the following steps have been taken:

For all changes:

[ ] Is there an issue associated with this PR? Is it referenced in the commit message?
[ ] Does your PR body contain #xyz where xyz is the issue number you are trying to resolve?
[ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
[ ] Is your initial contribution a single, squashed commit?

For code changes:

[ ] Have you written and/or updated unit tests to verify your changes?
[ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
[ ] If applicable, have you updated the LICENSE.txt file, including the main LICENSE.txt file in the root of this repository?
[ ] If applicable, have you updated the NOTICE.txt file, including the main NOTICE.txt file found in the root of this repository?

For documentation related changes:

[ ] Have you ensured that format looks appropriate for the output in which it is rendered?

li-boxuan commented 4 months ago

This work was created by me long time ago (before I joined my current company). Looks like to do any non-trivial contribution to open-source projects, I need to register. Thus, I'll close this PR for now and get back to it once I get approval.

li-boxuan commented 3 months ago

Yeah it looks like I can continue contributing to JanusGraph.

JanusGraph / janusgraph