FalkorDB / FalkorDB

A super fast Graph Database uses GraphBLAS under the hood for its sparse adjacency matrix graph representation. Our goal is to provide the best Knowledge Graph for LLM (GraphRAG).
https://www.falkordb.com/
Other
601 stars 21 forks source link

Incorrect results triggered by query partitioning #37

Open Lincyaw opened 1 year ago

Lincyaw commented 1 year ago

Version: Docker image falkordb/falkordb:edge OS: Ubuntu 22.04 API/Driver: Cypher

According to the following query, the 18 should equals to 10+12

lab:6380> GRAPH.QUERY 0 "MATCH (n1:L3)-[]-(n2:L2)-[]-(n3:L1:L2)-[r1:T1]-(n4:L3:L0:L4) RETURN COUNT(*)"
1) 1) "COUNT(*)"
2) 1) 1) (integer) 18
3) 1) "Cached execution: 0"
   2) "Query internal execution time: 1.820675 milliseconds"
lab:6380> GRAPH.QUERY 0 "MATCH (n1:L3)<-[]-(n2:L2)-[]-(n3:L1:L2)-[r1:T1]-(n4:L3:L0:L4) RETURN COUNT(*)"
1) 1) "COUNT(*)"
2) 1) 1) (integer) 10
3) 1) "Cached execution: 0"
   2) "Query internal execution time: 0.738668 milliseconds"
lab:6380> GRAPH.QUERY 0 "MATCH (n1:L3)-[]->(n2:L2)-[]-(n3:L1:L2)-[r1:T1]-(n4:L3:L0:L4) RETURN COUNT(*)"
1) 1) "COUNT(*)"
2) 1) 1) (integer) 12
3) 1) "Cached execution: 0"
   2) "Query internal execution time: 0.759969 milliseconds"
lab:6380> GRAPH.QUERY 0 "MATCH (n)-[r]-(n) RETURN n,r LIMIT 10"
1) 1) "n"
   2) "r"
2) (empty array)
3) 1) "Cached execution: 0"
   2) "Query internal execution time: 0.584519 milliseconds"

The create statement is attached below for your reproduction 1692914926.9295375.log

gkorland commented 1 year ago

@Lincyaw you're right the semantic is not well defined in Cypher, counting different overlapping paths on the first example is considered as a single result for the count.

So in your example, two directed paths that have the same components will be count as a single path.

You can overcome this by counting unique paths.

GRAPH.QUERY 0 "MATCH p=(n1:L3)-[]-(n2:L2)-[]-(n3:L1:L2)-[r1:T1]-(n4:L3:L0:L4) RETURN COUNT(p)"
Lincyaw commented 12 months ago
image

You are right, but for the same queries, the neo4j's results seems different from the FalkerDB. Is that normal?

gkorland commented 12 months ago

Yes, the semantic is not well defined in Cypher so you might find different vendors interpenetration

Lincyaw commented 11 months ago

After searching the Cypher Specification, I found the following use cases.

https://github.com/opencypher/openCypher/blob/master/tck/features/clauses/match/Match8.feature

And in the following test case

 Scenario: [2] Counting rows after MATCH, MERGE, OPTIONAL MATCH
    Given an empty graph
    And having executed:
      """
      CREATE (a:A), (b:B)
      CREATE (a)-[:T1]->(b),
             (b)-[:T2]->(a)
      """
    When executing query:
      """
      MATCH (a)
      MERGE (b)
      WITH *
      OPTIONAL MATCH (a)--(b)
      RETURN count(*)
      """
    Then the result should be, in any order:
      | count(*) |
      | 6        |
    And no side effects

The behavior of Neo4j is exactly the same as the expected result, but in falkerDB, the result is different.

image