apache / age

Graph database optimized for fast analysis and real-time data processing. It is provided as an extension to PostgreSQL.
https://age.apache.org
Apache License 2.0
2.87k stars 401 forks source link

Problem with the WITH clause: Vertex assigned to variable <x> was deleted. #1954

Open Stephaneke opened 1 week ago

Stephaneke commented 1 week ago

I am working with Apache AGE and I am trying to create new nodes and relationships based on existing nodes and their relationships. Here is a simplified version of my schema and queries. These can be used to reproduce the problem.

I wanted to look for existing relationships and create new nodes and relationships connected to the found nodes. However, it seems to be able to find the nodes, but when adding the WITH statement, it no longer creates nodes and is unable to find them. This works fine in a neo4j context, so i assume this is a bug with the WITH clause.

Schema Setup

preparation script (if age extension does not exist).

CREATE EXTENSION age;
SET search_path = ag_catalog;

pastable setup script:

-- Create the graph
SELECT create_graph('my_graph');

-- Add NodeType1 nodes
SELECT * FROM cypher('my_graph', $$
CREATE (:NodeType1 {name: 'A', project: 'PROJECT_ID'})
CREATE (:NodeType1 {name: 'B', project: 'PROJECT_ID'})
CREATE (c:NodeType1 {name: 'C', project: 'PROJECT_ID'})
return c 
$$) AS (c agtype);

-- Add NodeType2 nodes
SELECT * FROM cypher('my_graph', $$
CREATE (:NodeType2 {name: '1', project: 'PROJECT_ID'})
CREATE (b:NodeType2 {name: '2', project: 'PROJECT_ID'})
return b
$$) AS (b agtype);

-- Add relationships between NodeType1 and NodeType2
SELECT * FROM cypher('my_graph', $$
MATCH (a:NodeType1 {name: 'A', project: 'PROJECT_ID'}), (b:NodeType2 {name: '1', project: 'PROJECT_ID'})
CREATE (a)-[:RelType]->(b)
return a, b
$$) AS (a agtype, b agtype);
SELECT * FROM cypher('my_graph', $$
MATCH (a:NodeType1 {name: 'B', project: 'PROJECT_ID'}), (b:NodeType2 {name: '1', project: 'PROJECT_ID'})
CREATE (a)-[:RelType]->(b)
return a, b
$$) AS (a agtype, b agtype);
SELECT * FROM cypher('my_graph', $$
MATCH (a:NodeType1 {name: 'B', project: 'PROJECT_ID'}), (b:NodeType2 {name: '2', project: 'PROJECT_ID'})
CREATE (a)-[:RelType]->(b)
return a, b
$$) AS (a agtype, b agtype);
SELECT * FROM cypher('my_graph', $$
MATCH (a:NodeType1 {name: 'C', project: 'PROJECT_ID'}), (b:NodeType2 {name: '2', project: 'PROJECT_ID'})
CREATE (a)-[:RelType]->(b)
return a, b
$$) AS (a agtype, b agtype);

Working Query (creates nodes but not relationships)

SELECT * FROM cypher('my_graph', $$
MATCH (node1:NodeType1)-[rel:RelType]->(node2:NodeType2)
WHERE NOT exists((node1)-[:RelType2]-(:NodeType3)-[:RelType2]-(node2)) 
AND node1.project = 'PROJECT_ID' 
AND node2.project = 'PROJECT_ID'
CREATE (newNode:NodeType3 {
    name: node1.name + '|' + node2.name, 
    uri: 'https://example.com/ns#NodeType3', 
    project: 'PROJECT_ID', 
    executionType: 'AUTO'
}) 
RETURN newNode
$$) AS (newNode agtype);

Failing Query (tries to create relationships, but fails)

SELECT * FROM cypher('my_graph', $$
MATCH (node1:NodeType1)-[rel:RelType]->(node2:NodeType2)
WHERE NOT exists((node1)-[:RelType2]-(:NodeType3)-[:RelType2]-(node2)) 
AND node1.project = 'PROJECT_ID' 
AND node2.project = 'PROJECT_ID'
CREATE (newNode:NodeType3 {
    name: node1.name + '|' + node2.name, 
    uri: 'https://example.com/ns#NodeType3', 
    project: 'PROJECT_ID', 
    executionType: 'AUTO'
})
WITH node1 AS n1, node2 AS n2, newNode AS n3
MERGE (n1)-[:RelType2 {uri: 'https://example.com/ns#RelType2'}]->(n3)
MERGE (n2)-[:RelType2 {uri: 'https://example.com/ns#RelType2'}]->(n3)
RETURN n1, n2, n3
$$) AS (n1 agtype, n2 agtype, n3 agtype);

This results in the error:

Vertex assigned to variable n3 was deleted

Alternative Query (without WITH clause, also fails)

SELECT * FROM cypher('my_graph', $$
MATCH (node1:NodeType1)-[rel:RelType]->(node2:NodeType2)
WHERE NOT EXISTS((node1)-[:RelType2]-(:NodeType3)-[:RelType2]-(node2)) 
AND node1.project = 'PROJECT_ID' 
AND node2.project = 'PROJECT_ID'
CREATE (newNode:NodeType3 {
    name: node1.name + '|' + node2.name, 
    uri: 'https://example.com/ns#NodeType3', 
    project: 'PROJECT_ID', 
    executionType: 'AUTO'
})
MERGE (node1)-[:RelType2 {uri: 'https://example.com/ns#RelType2'}]->(newNode)
MERGE (node2)-[:RelType2 {uri: 'https://example.com/ns#RelType2'}]->(newNode)
RETURN node1, node2, newNode
$$) AS (node1 agtype, node2 agtype, newNode agtype);

This also results in the error:

Vertex assigned to variable newNode was deleted

What We Have Tried

None of these approaches have resolved the issue, and we consistently encounter the vertex deletion error.

Stephaneke commented 1 week ago

Update: the following query below does work (so you can see the result). It is generating all the steps in one create. But the WITH statement should just work as intended? So the bug remains.

SELECT * FROM cypher('my_graph', $$
MATCH (node1:NodeType1)-[rel:RelType]->(node2:NodeType2)
WHERE NOT exists((node1)-[:RelType2]-(:NodeType3)-[:RelType2]-(node2)) 
AND node1.project = 'PROJECT_ID' 
AND node2.project = 'PROJECT_ID'
CREATE (node1)-[:RelType2 {uri: 'https://example.com/ns#RelType2'}]->(newNode:NodeType3 {
    name: node1.name + '|' + node2.name, 
    uri: 'https://example.com/ns#NodeType3', 
    project: 'PROJECT_ID', 
    executionType: 'AUTO'
})<-[:RelType2 {uri: 'https://example.com/ns#RelType2'}]-(node2)
RETURN node1, node2, newNode
$$) AS (n1 agtype, n2 agtype, n3 agtype);
MironAtHome commented 1 week ago

A few questions 1 Just reading example queries, thought to ask, are the -[:<rel_type>]- legitimate. I am learning cypher, and, at least in the past, it had to have directed expression where at least one of the sides has -> or <- relationships ( looking at Exists ). I don't think it has any bearing on the question asked, though, just curious regarding cypher expression, hoping on kindness of people to help along the way in more detailed questions on cypher language. 2 It makes sense to check if this graph expression (node1:NodeType1)-[rel:RelType]->(node2:NodeType2) is unique, linking one vertice NodeType1 to one vertice NodeType2 If it is, than the WHERE clause below, save for existance check, is not really needed ( that's the beauty of graph ) and if it is not unique and requires further filtering on PROJECT_ID on both sides of node link, than it makes sense to check node design. 3 I noted you were using node1.name + '|' +node2.name in the property design, but for some reason kept on using 'PROJECT_ID' constant, where node1.project seems most natural. Unless in real life this 'PROJECT_ID' really something more complicated. Just trying to make best sense and see how query itself can be worked with, to make it as good and as efficient as it could be ( again, not the question asked, but improving query design seems like of interest, I hope it is positive :) ) 4 If I understand intent of using MERGE correctly ( with its general motto "when exists, keep, when missing, add" ) to mimik behavior of statements with MERGE this query needs two MATCH clauses independently checking for relationship

(NodeType1)-[]->(NodeType3) (NodeType2)-[]->(NodeType3)

else it has a risk of creating duplicate link, if one of the links is missing, prior to query execution. Also, looking at "MERGE" in more holistic sense of word, it makes sense, when applied to properties of node instance variable. Since than it has an object instance to really MERGE. However, in the context of ()-[]->() relationship it might or might not be semantical concept ( if it makes sense? ), just first thoughts looking at the code. I will check further and if I am wrong somewhere in my thoughts, I am just trailing the path of "unaware", for the sake of seeing input from your side, if these thoughts make sense. Though, as you mentioned, it works in Neo4j, so, something regarding Merge I don't know/understand at the moment.