Doom9527 / aristotle-webservice

👑 A web service that enables you to create knowledge graphs.
https://aristotle-ws.com
16 stars 0 forks

Support filtering #32

Closed QubitPi closed 1 month ago

QubitPi commented 1 month ago

We have a very simple filtering need at the moment: being able to filter on node/link attributes on all read-only endpoints. For example, suppose we have 6 nodes in the database with 2 custom properties, "name" and "hometown":

1. {"name": "Peter", "hometown": "China"}
2. {"name": "Jack", "hometown": "China"}
3. {"name": "Paion Data", "hometown": "China"}
4. {"name": "Tom", "hometown": "U.S."}
5. {"name": "Wilhelm", "hometown": "Germany"}
6. {"name": "Greta", "hometown": "Germany"}

We should be able to get everyone coming from China as:

1. {"name": "Peter", "hometown": "China"}
2. {"name": "Jack", "hometown": "China"}
3. {"name": "Paion Data", "hometown": "China"}

We can assume all nodes are in one single graph.
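The expected behavior can be sketched in memory, independent of any database. This is only an illustration of the requirement; the function name `filter_nodes` and the keyword-argument interface are assumptions for the example, not part of Aristotle's API.

```python
# The six example nodes from the description above.
NODES = [
    {"name": "Peter", "hometown": "China"},
    {"name": "Jack", "hometown": "China"},
    {"name": "Paion Data", "hometown": "China"},
    {"name": "Tom", "hometown": "U.S."},
    {"name": "Wilhelm", "hometown": "Germany"},
    {"name": "Greta", "hometown": "Germany"},
]


def filter_nodes(nodes, **criteria):
    """Return the nodes whose attributes equal every given criterion.

    Hypothetical helper: a node lacking one of the attributes never matches.
    """
    return [
        node
        for node in nodes
        if all(key in node and node[key] == value for key, value in criteria.items())
    ]


if __name__ == "__main__":
    # Everyone coming from China.
    for node in filter_nodes(NODES, hometown="China"):
        print(node)
```

Passing several criteria, for example `filter_nodes(NODES, hometown="China", name="Jack")`, combines them with a logical AND.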

QubitPi commented 1 month ago

Implementing filtering is more involved than it seems. It needs a design phase that determines which query language is used at the webservice level. This query language should not be homebrewed if a well-recognized international standard already exists. For example, SPARQL is a good candidate because it is not tied to any database vendor.

I think this is a process I need to step into, not to ask @Doom9527 to implement this in any specific way, but to discuss it with @Doom9527 and come up with a design together. I think my past experience in the graph app industry should help @Doom9527 find the solution more easily.

I'll post more here tomorrow

QubitPi commented 1 month ago

Filtering here is defined as the parametrized filtering of entries in RESTful APIs. It is a syntax that can be written in a human-readable form on the frontend and translated into a filtering query against a graph database.

The filtering language must satisfy 2 criteria:

  1. The filtering language has to be a well-known specification such as the GraphQL spec, JSON API, or RSQL. This has the following advantages:

    • It makes the filtering easy to learn for Aristotle users. Enforcing standards also reduces friction between team members through a common agreement on how a technology should be used
    • There will be mature tools that parse the filtering query, so we don't have to implement and maintain our own parsing code
    • It covers a wide range of filtering scenarios
  2. The filtering language should be able to cover multiple graph databases and be translatable to, at least, the following vendors:

    • Neo4J (Cypher)
    • ArangoDB (AQL)
    • JanusGraph (Gremlin)
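To make criterion 2 concrete, here is a rough sketch of how a single equality filter on a node attribute might be rendered for each of the three vendors. The function `equality_filter`, the node variable `n`, and the ArangoDB collection name `nodes` are assumptions made for this example; values are passed as bind parameters rather than interpolated into the query text.

```python
import re

# Attribute names are interpolated into the query text, so restrict them
# to plain identifiers to avoid injection.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def equality_filter(attribute, value):
    """Render `attribute == value` as a (query, parameters) pair per vendor.

    Hypothetical helper; `nodes` is an assumed ArangoDB collection name.
    """
    if not _IDENTIFIER.match(attribute):
        raise ValueError(f"invalid attribute name: {attribute!r}")
    params = {"value": value}
    return {
        # Neo4j (Cypher): parameters are referenced as $name.
        "cypher": (f"MATCH (n) WHERE n.{attribute} = $value RETURN n", params),
        # ArangoDB (AQL): bind parameters are referenced as @name.
        "aql": (f"FOR n IN nodes FILTER n.{attribute} == @value RETURN n", params),
        # JanusGraph (Gremlin): `value` is supplied as a script binding.
        "gremlin": (f"g.V().has('{attribute}', value)", params),
    }


if __name__ == "__main__":
    for vendor, (query, bound) in equality_filter("hometown", "China").items():
        print(vendor, "->", query, bound)
```

The same abstract filter yields three different query texts, which is why the web-service-level language should stay independent of any one vendor.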

There are 3 major categories of filtering for a knowledge graph:

  1. Filtering nodes

    • Filtering by node attribute
    • Filtering by inbound/outbound links
    • Filtering by neighbors
  2. Filtering links (relationships)

    • Filtering by polarity (directed/undirected)
    • Filtering by link attribute
    • Filtering by linked nodes (1)
  3. Filtering paths

    • Filtering by path segment
    • Filtering by start/end node
    • Filtering by number of hops
    • Filtering by link (2)
    • Filtering by node (1)

In this issue, we are only talking about filtering by node attribute. We will only implement this feature for now so that our PR won't be too large. We will eventually deal with the rest.

3 possibilities come up, listed top down by how strongly each is tied to graph databases:

  1. SPARQL
  2. RSQL
  3. Spring Bean (Query Object)

I would first throw away Spring Bean because it violates criterion 1 above.

RSQL is a general query language for web services (WS). That means it maximizes the benefit for API users because it covers the widest range of business queries. But translating RSQL into a graph query is not guaranteed to be straightforward, for example for the path filtering above.

SPARQL, on the other hand, sits at the opposite end from RSQL: it maps more directly onto graph queries but asks more of API users.

So there is a tradeoff: should we put more burden on the user side or on the WS side? This depends on how we envision Aristotle.

My vision for Aristotle is to make it the Elide for graph databases. Its spirit should be to greatly simplify the development of a graph database webservice. This simplicity goes to the user. In that sense, RSQL is my pick.

Doom9527 commented 1 month ago

I agree with what you said. We need to parse RSQL into Cypher statements in the backend for filtering. This might be complex, but I believe it is achievable.
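As a rough idea of what the RSQL-to-Cypher step could look like for node-attribute filtering, here is a minimal sketch. It assumes only a small subset of RSQL: the `==` and `!=` operators, `;` for AND, and `,` for OR, with no parentheses and no `;` or `,` inside values. The function name `rsql_to_cypher` and the node variable `n` are made up for this example, and the hand-rolled splitting is for illustration only; in line with criterion 1, a real implementation would rely on an existing RSQL parser.

```python
import re

# Assumed subset of RSQL comparison operators and their Cypher equivalents.
_OPERATORS = {"==": "=", "!=": "<>"}
_COMPARISON = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)(==|!=)(.+)$")


def rsql_to_cypher(expression):
    """Translate a flat RSQL filter into a parametrized Cypher query.

    Hypothetical helper. In RSQL, `;` (AND) binds tighter than `,` (OR),
    so the expression is split on `,` first and on `;` second.
    """
    params = {}
    or_clauses = []
    for or_part in expression.split(","):
        and_clauses = []
        for comparison in or_part.split(";"):
            match = _COMPARISON.match(comparison.strip())
            if match is None:
                raise ValueError(f"unsupported RSQL comparison: {comparison!r}")
            selector, operator, argument = match.groups()
            # Values go into parameters, never into the query text.
            name = f"p{len(params)}"
            params[name] = argument.strip().strip("'\"")
            and_clauses.append(f"n.{selector} {_OPERATORS[operator]} ${name}")
        or_clauses.append(" AND ".join(and_clauses))
    where = " OR ".join(f"({clause})" for clause in or_clauses)
    return f"MATCH (n) WHERE {where} RETURN n", params


if __name__ == "__main__":
    query, params = rsql_to_cypher("hometown==China")
    print(query)
    print(params)
```

With the six example nodes from the issue description, `hometown==China` would select Peter, Jack, and Paion Data once the resulting query is run against the database.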