aplbrain / grand-cypher

Implementation of the Cypher language for searching NetworkX graphs
Apache License 2.0
79 stars 8 forks source link

Support for Equijoins #38

Open felix-boeseler opened 8 months ago

felix-boeseler commented 8 months ago

Hello, thank you for the great project :)

In how far are equijoins exactly supported?

Given that I have the following NetworkX graph:

G = nx.DiGraph()
G.add_node("x")
G.add_node("y")
G.add_node("z")
G.add_edge("x", "y")
G.add_edge("y", "x")
G.add_edge("x", "x")
G.add_edge("z", "x")

When I execute the following query:

MATCH (n)-->(n)
RETURN n

I get the result:

{Token('CNAME', 'n'): ['x', 'y']}

However, if I execute the same query on the equivalent graph in neo4j, I only get the node x as result - which to my understanding of Cypher would be the correct result.

Therefore, to my understanding, equijoins are currently only supported in the project when they are distributed over multiple match clauses and do not recognizes self cycles on nodes. Is that correct?

For instances, the following query correctly recognizes two loops in the graph:

MATCH (n)-->(m)
MATCH (m)-->(n)
RETURN n, m

While neo4j additionally returns n=x and m=x as a result.

Many regards, Felix

j6k4m8 commented 8 months ago

Uh oh! Looks like a bug to me. Your intuition is totally right, we should only be returning x in your first example. Thank you for reporting!!

j6k4m8 commented 7 months ago

@khoale88 sorry to bug you, wondering if you have thoughts on this one — I think the issue is here:

https://github.com/aplbrain/grand-cypher/blob/31001b7c54d9ac35325042fc59673d2a99271acf/grandcypher/__init__.py#L434-L463

On line 470, self._matches is [{'n': 'x'}, {'n': 'y'}] — so the issue is happening I guess at the pathwise where filtering... wondering if you encountered anything like this while testing. I'm still poking at it and I'll report back here as I learn more!

khoale88 commented 6 months ago

Right, let me spend some time on this!

j6k4m8 commented 6 months ago

No pressure, but it would be super appreciated if you have time!! :)

khoale88 commented 5 months ago

I think there is an unexpected behavior in grandiso. After debugging, it drills down to grandiso.find_motif_iter yielding both x and y, Here is the test snippet.

def test_equijoins(self):
    host = nx.DiGraph()
    host.add_node("x")
    host.add_node("y")
    host.add_node("z")
    host.add_edge("x", "y")
    host.add_edge("y", "x")
    host.add_edge("x", "x")
    host.add_edge("z", "x")

    motif = nx.DiGraph()
    motif.add_edge("n", "n")

    res = find_motifs(motif, host)
    assert res == [{"n": "x"}]

The test fails with result

>       assert res == [{"n": "x"}]
E       AssertionError: assert [{'n': 'x'}, {'n': 'y'}] == [{'n': 'x'}]
E         Left contains one more item: {'n': 'y'}
E         Full diff:
E         - [{'n': 'x'}]
E         + [{'n': 'x'}, {'n': 'y'}]

What do you think?