Different behavior between Tinkergraph, Neo4j (embedded) and neo4j-gremlin-bolt

dlutz2 commented 7 years ago

In trying to transition from using Neo4j as a local store to a remote store using neo4j-gremlin-bolt, we have noticed that neo4j-gremlin-bolt behaves differently from Tinkergraph and Neo4j (embedded) when creating vertices and these vertices becoming visible in the (in memory) graph. The attached code illustrates the difference. NeoBoltTest.txt

The expected behavior is that at the end of the loop, the in-memory graph contains 10 vertices, which is what is seen with Tinkergraph and Neo4j (embedded). With neo4j-gremlin-bolt, the vertex added does not seem to be immediately visible in the graph. The vertices do appear in the db and a second run of the code pulls them back but otherwise the behavior is the same.,
I have tried various combinations of try-with-resource, commit() and close(). The only combination I have found that produces the expected behavior is to commit() and then close() the graph after every addVertex(), which seems odd (although there is an issue with Tinkerpop regarding the vagueness of the semantics of close(): https://issues.apache.org/jira/browse/TINKERPOP-789 )

Perhaps I have missed something?

rjbaucells commented 7 years ago

The neo4j-gremlin-bolt driver does not send any CYPHER statements to the server (CREATE, MERGE) while the transaction is open, it does a flush on commit updating the server with the in-memory changes.

dlutz2 commented 7 years ago

I think the difference is in the in-memory behavior e.g. adding a vertex and then immediately checking to see if that vertex is there before doing a commit. Does that mean that any in-memory changes are not reflected in the in-memory graph until they have been committed and somehow "brought back" in memory?

rjbaucells commented 7 years ago

If you add a Vertex it will be in memory only until you commit. There is no need to fetch the vertices after commit, they will stay in memory until the Graph is closed.

dlutz2 commented 7 years ago

That is the behavior I expected but what I see with this code:

for (int i = 0; i < 10; i++) {
  System.out.println("Before " + + g.V().toList().size());
  Vertex v = graph.addVertex(label);
  System.out.println("After " +  + g.V().toList().size() + "\n");
}

Is that the size of the in memory graph is always 1 (after the first addVertex). If I commit after the loop, 10 instances show up in Neo. If I run it a second time, the loop is always 11 (after the first addVertex), 20 instance show up in Neo. The behavior with commits is what I expect, its the behavior of the in memory graph between instantiation and commits which is confusing me and which differs from the Tinkergraph and Neo4j embedded. Am I doing something embarrassingly wrong here?

dlutz2 commented 7 years ago

Forgot to mention I was using the Neo4JNativeElementIdProvider. Poking around in the code, I see that Neo4JSession.addVertex() creates the vertex (new Neo4JVertex(...) ) and then registers the vertex in Neo4JSession.vertices:

vertices.put(vertex.id(), vertex);

but using a Neo4JNativeElementIdProvider that vertex id will always be null at this point (since it hasn't been committed yet), so every new vertex overwrites the preceding ones in the vertices Map. I also see that the vertex is add to transientVertices but I can't tell if transient vertices participate in any use of Traversals. These transient vertices are then committed. This is consistent with the behavior I see where only the last vertex added is seen in the in-memory graph but all vertices get committed. Also consistent with the observed behavior that previously committed vertices (which do have Ids) are seen in the graph on subsequent runs.

rjbaucells commented 7 years ago

Yes, I was aware of that problem and I was working on a solution for the native id provider. I have a local branch with the fix but I still need to work on the unit tests, some of the Tinkerpop tests are failing after the fix.

dlutz2 commented 7 years ago

Thanks very much.

fenallen commented 7 years ago

Just found this after much head scratching. I am falling foul of exactly the same issue. Looking forward to the fix.

rjbaucells commented 7 years ago

I have just released v0.2.23 with the fix for this bug. Wait for the release to be available on Maven Central

dlutz2 commented 7 years ago

Excellent. Thanks for the quick response. I'll try it out as soon as it appears

dlutz2 commented 7 years ago

Tried this out. Although it fixed the issue with adding vertices to the graph, newly created (uncommitted) vertices still have null ids, which causes null pointer exceptions when used in a traversal, e.g.

 Vertex v = graph.addVertex(label);
  g.V(v.id()).toList(); // should find newly created vertex

I think the underlying issue is that ids have to be assigned at vertex creation. Would you like me to open a new issue for this?

rjbaucells commented 7 years ago

This is the behavior for the native id provider, I do not think there is much we can do in this case. IDs will be generated on the database after a CREATE statement is executed, something we can only do once the API consumer explicitly commit the transaction. Trying to execute CREATE statements without knowing all required fields have been set is not possible for the moment.

dlutz2 commented 7 years ago

Is this due to a difference in behavior/capability between the Bolt driver API and the embedded Neo4J API? The embedded implementation of Tinkerpop handles creates and transactions independently i.e. it creates a Neo4j vertex on graph.AddVertex() which may or may not get committed later.

rjbaucells commented 7 years ago

Embedded driver will create an actual Element on addVertex() so it will be able to assign an identifier at that time, BOLT driver requires a CREATE statement execution (database is on the other side of the network) and an explicit RETURN clause to retrieve the identifier. We cannot issue the CREATE statement until all required properties are populated, action that is indicated by an explicit commit call from the API consumer. Change the ID provider to one that allocates the identifier on node creation by implementing Neo4JElementIdProvider.generate()

SteelBridgeLabs / neo4j-gremlin-bolt

Different behavior between Tinkergraph, Neo4j (embedded) and neo4j-gremlin-bolt #52